JP2009075747A

JP2009075747A - Similar sentence retrieval system and program

Info

Publication number: JP2009075747A
Application number: JP2007242641A
Authority: JP
Inventors: Akira Sasaki (佐々木 晶); Yumiko Yoshimura (吉村 裕美子)
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2007-09-19
Filing date: 2007-09-19
Publication date: 2009-04-09
Anticipated expiration: 2027-09-19
Also published as: JP4602388B2

Abstract

PROBLEM TO BE SOLVED: To enable efficient retrieval, from a database, of examples containing a specific expression when the search target sentence contains a specific expression the user is focusing on.
SOLUTION: A keyword designation unit 24 accepts designation of keywords in the search target sentence accepted by a search target input unit 23. A language analysis unit 25 analyzes the syntactic/semantic role information of each word in the search target sentence. A search expression generation unit 27 generates a search expression containing two things: the keywords, and the keywords' own syntactic/semantic role information taken from the role information analyzed by the language analysis unit 25. A database search unit 28 uses the search expression to retrieve examples containing the keywords from an example database 21. From among those, it then retrieves the examples whose syntactic/semantic role information matches. An output unit 29 outputs the retrieved examples to a display device 17.
COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a similar sentence search system and program. The system searches an example database for sentences similar to an input search target sentence and returns them as examples.

A similar sentence search system searches an example database for sentences similar to an input search target sentence. Such example searches for similar sentences are used in information retrieval and when looking up reference examples during translation.

In similar sentence search, the system calculates the similarity between the search target sentence and each sentence registered in the example database. Sentences whose similarity is equal to or greater than a user-set threshold are detected as search results.

There are various ways to calculate similarity. The basic approach matches the character strings of the words that make up the sentences, word by word.

Some systems go further and register synonyms in a dictionary in advance. Words in a synonym relationship are then treated as matching, so sentences close in topic can be detected even when they share no identical word strings (see, for example, Patent Document 1). Other systems take agreement in sentence syntax and semantic relationships into account when calculating similarity, so that sentences closer in meaning can be detected (see, for example, Patent Document 2).
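The basic word-matching approach, optionally extended with a synonym dictionary, can be sketched as below. This is a minimal illustration under stated assumptions, not the method of Patent Document 1:

- sentences are assumed to be pre-tokenized word lists;
- the score is a Dice coefficient over word sets;
- the function names and the synonym entry are hypothetical.

```python
# Hypothetical sketch: word-overlap similarity with an optional synonym
# dictionary, filtered by a user-set threshold.

def normalize(words, synonyms):
    """Map each word to its canonical form so that synonyms compare equal."""
    return {synonyms.get(w, w) for w in words}


def dice_similarity(a, b, synonyms=None):
    """Dice coefficient between the word sets of two tokenized sentences."""
    synonyms = synonyms or {}
    sa, sb = normalize(a, synonyms), normalize(b, synonyms)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def search(query, examples, threshold, synonyms=None):
    """Return (score, example) pairs at or above the threshold, best first."""
    scored = [(dice_similarity(query, ex, synonyms), ex) for ex in examples]
    hits = [(s, ex) for s, ex in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    synonyms = {"medicine": "drug"}  # hypothetical dictionary entry
    query = ["patients", "assigned", "drug", "group"]
    examples = [["patients", "assigned", "medicine", "group"],
                ["factories", "assigned"]]
    print(search(query, examples, threshold=0.5, synonyms=synonyms))
```

With the synonym entry, the first example scores 1.0 and passes the threshold. The second scores 1/3 and is filtered out.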

Such systems can disappoint the user in one situation: the user cares about a specific phrase in the search target sentence rather than about whole-sentence similarity, and wants to find similar sentences containing that phrase. The reason is that conventional search systems weight every component of a sentence equally when calculating similarity. An example may contain the specific phrase, but if the phrase makes up only a small part of that sentence, the sentence as a whole still scores low.
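A little arithmetic shows the dilution effect. Assume, purely for illustration, an equal-weight score defined as the share of an example's words that also occur in the query. This is a hypothetical measure, not one named in the source. Two examples that both contain the phrase “randomly assigned” then score very differently depending on their length:

```python
def equal_weight_overlap(query, example):
    """Share of the example's words that also occur in the query; every word counts 1."""
    q = set(query)
    return sum(1 for w in example if w in q) / len(example)


query = ["patients", "randomly", "assigned", "to", "groups"]
short_example = ["randomly", "assigned", "to", "groups"]
long_example = ["randomly", "assigned", "after", "three", "cycles",
                "of", "preoperative", "XX", "drug", "surgery"]

print(equal_weight_overlap(query, short_example))  # 1.0
print(equal_weight_overlap(query, long_example))   # 0.2
```

The phrase is the same in both examples. Yet the longer sentence scores one fifth of the shorter one, because the eight unrelated words carry the same weight as the phrase.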

To address this problem, three search methods have been proposed that vary the weight according to the part of the sentence when calculating similarity. They are described below.

(1) First search method (see, for example, Patent Document 3). For both the examples and the search target sentence, information is attached to the places in a sentence that can be grammatically and semantically replaced, deleted, or added. Similarity is then calculated by comparing replaceable words with each other, deleting unnecessary portions, and adding missing portions. This lets similarity focus more on the gist of the sentence.

The information must be attached in advance to both the examples and the search target sentence, but the replacement, deletion, and addition themselves are carried out semi-automatically by comparing sentences. For example, to designate deletable parts, the following two examples A1 and A2 are compared. The modifier “valuable” (貴重な) is then automatically judged to be deletable.
Example A1: Nakayama scored a goal with a free kick.
Example A2: Then, 30 minutes later, Nakata scored a valuable goal on a penalty kick (PK).
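The semi-automatic judgement of deletable parts can be illustrated with a toy comparison. This sketch simplifies the idea under assumptions and is not the procedure of Patent Document 3:

- each word carries a hypothetical part-of-speech tag;
- only the tag `ADJ` counts as a modifier;
- a modifier that appears in one sentence but not in the other is marked deletable.

```python
# Hypothetical sketch of detecting deletable modifiers by comparing two
# examples; the POS tags and the "ADJ means modifier" rule are assumptions.
MODIFIER_TAGS = {"ADJ"}


def deletable_modifiers(sent_a, sent_b):
    """Return modifiers that occur in one sentence but not in the other.

    Each sentence is a list of (word, pos) pairs.
    """
    words_a = {w for w, _ in sent_a}
    words_b = {w for w, _ in sent_b}
    found = []
    for sentence, other_words in ((sent_a, words_b), (sent_b, words_a)):
        for word, pos in sentence:
            if pos in MODIFIER_TAGS and word not in other_words:
                found.append(word)
    return found


a1 = [("Nakayama", "PROPN"), ("free kick", "NOUN"), ("scored", "VERB")]
a2 = [("Nakata", "PROPN"), ("PK", "NOUN"), ("valuable", "ADJ"),
      ("goal", "NOUN"), ("scored", "VERB")]
print(deletable_modifiers(a1, a2))  # ['valuable']
```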

(2) Second search method (see, for example, Patent Document 4). For both the examples and the search target sentence, subject-verb and verb-object relations are extracted to create “sentence fragments”. Similarity is judged separately for each fragment. Because a structurally complex sentence is judged fragment by fragment, an example can be detected when it contains a part highly similar to the search target sentence, even if that part is only a portion of the sentence.

For example, consider the search target sentence X1 and examples A3 and A4 below. Neither A3 nor A4 is very similar to the search sentence as a whole. However, A3 is highly similar to the first half of the search target sentence and A4 to the second half, so both are detected as similar sentences.
Search target sentence X1: “The total number of entries was a record 4.15 million, and nearly 30,000 schools took part.”
Example A3: “The total number of entries reached a record high of about 4.15 million.”
Example A4: “The number of participating schools in Japan and overseas has reached nearly 30,000.”
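Fragment-by-fragment comparison can be sketched as follows. This is a hypothetical illustration, not the procedure of Patent Document 4:

- fragments are assumed to be already extracted by a parser and are given as word lists;
- each fragment pair is scored with a Dice coefficient;
- an example's score is the best pair score.

```python
# Hypothetical sketch: compare sentences fragment by fragment and keep the
# best-matching pair.

def dice(a, b):
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0


def fragment_similarity(query_fragments, example_fragments):
    """Highest Dice score over all (query fragment, example fragment) pairs."""
    return max(
        (dice(q, e) for q in query_fragments for e in example_fragments),
        default=0.0,
    )


x1 = [["total", "applications", "record", "4.15 million"],
      ["participating", "schools", "nearly 30,000"]]
a3 = [["total", "applications", "reached", "record", "4.15 million"]]
a4 = [["participating", "schools", "nearly 30,000", "overseas"]]

whole_x1 = [w for frag in x1 for w in frag]
for name, ex in (("A3", a3), ("A4", a4)):
    # whole-sentence score, then best fragment score
    print(name, round(dice(whole_x1, ex[0]), 3), round(fragment_similarity(x1, ex), 3))
# A3 0.667 0.889
# A4 0.545 0.857
```

In both cases the best fragment scores higher than the whole sentence. This is what lets A3 and A4 clear a threshold that whole-sentence comparison would miss.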

(3) Third search method (see, for example, Patent Document 5). The components of a sentence are grouped into chunks of meaning. Chunks carrying case information that forms the syntactic skeleton, such as the nominative and objective cases, receive larger weights. The head word of each chunk receives additional weight when similarity is calculated.

For example, suppose the search target sentence is “Australian star Ferriman won the gold medal in the women's 400-meter final.” Suppose also that the example collection contains the sentence “On September 25, 2000, at 4 p.m. Beijing time, in the women's 400-meter final, Australian star Ferriman won the gold medal.” Components important for grasping the meaning then receive larger weights: “women's 400-meter final”, “Australian star Ferriman”, “gold medal”, and “won”. This lets similarity focus more on the gist of the sentence.
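Chunk weighting can be sketched as a weighted overlap score. This is a hypothetical illustration, and the weights are assumptions, not values from Patent Document 5:

- skeleton cases (nominative, objective) get base weight 2 and other chunks 1;
- a chunk's head word gets a bonus of 1;
- the score is the weighted share of the query's words that appear in the example.

```python
# Hypothetical sketch of chunk-weighted similarity; weights are illustrative.
CASE_WEIGHT = {"nom": 2.0, "obj": 2.0}   # skeleton cases weigh more
HEAD_BONUS = 1.0                         # extra weight for a chunk's head word


def word_weights(chunks):
    """chunks: list of (case, head, words). Returns {word: weight}."""
    weights = {}
    for case, head, words in chunks:
        base = CASE_WEIGHT.get(case, 1.0)
        for w in words:
            weights[w] = base + (HEAD_BONUS if w == head else 0.0)
    return weights


def weighted_similarity(query_chunks, example_words):
    """Weighted share of the query's words that also occur in the example."""
    weights = word_weights(query_chunks)
    present = set(example_words)
    total = sum(weights.values())
    matched = sum(wt for w, wt in weights.items() if w in present)
    return matched / total if total else 0.0


query = [
    ("nom", "Ferriman", ["Australian", "star", "Ferriman"]),
    ("obj", "medal", ["gold", "medal"]),
    ("pred", "won", ["won"]),
    ("mod", "final", ["women's", "400m", "final"]),
]
# Missing only the modifier chunk vs. missing the nominative chunk:
print(round(weighted_similarity(query, ["Australian", "star", "Ferriman", "gold", "medal", "won"]), 3))  # 0.778
print(round(weighted_similarity(query, ["gold", "medal", "won", "women's", "400m", "final"]), 3))        # 0.611
```

Both examples drop three of the query's nine words. The one that keeps the skeleton (subject, object, predicate) scores higher.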

These three methods can detect two kinds of sentences: those whose gist is similar to the search target sentence, and those that are not very similar as a whole but contain a part highly similar to it.

Patent Document 1: JP-A-11-110395
Patent Document 2: JP-A-2000-242650
Patent Document 3: JP-A-2001-357065
Patent Document 4: JP-A-2005-190185
Patent Document 5: JP-A-2006-65387

However, the sentence components these three methods emphasize do not necessarily coincide with the part the user is focusing on. Even with these methods, the results for similar sentences containing the user's phrase of interest cannot satisfy the user's needs. As an illustration, suppose the search target sentence is X2 below and the example database stores examples A5, A6, and A7.
Search target sentence X2: Forty patients and twenty healthy adults were each randomly assigned to a group using the ○○ drug and a group not using it.
Example A5: One hundred operable patients with △△ disease were randomly assigned to a group that underwent surgery after three preoperative cycles of the XX drug and a surgery-alone group.
Example A6: For example, half of 60 factories are randomly assigned to the treatment group and half to the control group.
Example A7: Children were randomized to a chemotherapy group and a standard follow-up care group.

The first search method has a problem: even when the user is focusing on a modifier, the modifier is judged deletable and ignored in the similarity calculation. For example, search target sentence X2 concerns medical statistics, and its key point is that the assignment is random. Yet “random” (ランダムな) is a modifier (a na-adjective), so the first method gives no difference in similarity between sentences that contain this important keyword and sentences that do not. As a result, sentences containing “random” are not ranked first. Even when they are detected, separating the desired sentences from the rest takes extra effort.

Similarly, the second search method does not consider “random” in the similarity calculation. A further problem arises with the indirect objects of “assign” in X2, “the group using the ○○ drug” and “the group not using it”. When these are compared with “the group that underwent surgery after three preoperative cycles of the XX drug” and “the surgery-alone group” in example A5, words other than “group” make up a large share of each phrase, so their similarity is low.

The cause is that the second method compares similarity in finer units but still gives no weight to the part the user is focusing on. What the user wants to find are sentences in which two words, each containing “group” (群) as part of a compound, stand in a parallel relationship, as in “~ group and ~ group”. The “~” parts are arbitrary. Unless this relationship itself is given weight, the purpose of the search cannot be met. Consequently, examples A6 and A7 may not be retrieved either.
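The relationship described here can be checked as a structural pattern rather than as word overlap. The sketch below is a hypothetical simplification:

- each example is reduced to the dependents of “assign”, given as (phrase, relation) pairs;
- phrases are written head-final, as in the Japanese originals;
- the check only requires two ni-case dependents whose head noun is “group”, whatever precedes it.

```python
def has_parallel_groups(dependents, head_noun="group", relation="ni-case"):
    """True if at least two dependents with `relation` end in `head_noun`.

    The words before the head noun (the '~' in '~ group and ~ group')
    are ignored, so long modifiers do not dilute the match.
    """
    heads = [phrase for phrase, rel in dependents
             if rel == relation and phrase.split()[-1:] == [head_noun]]
    return len(heads) >= 2


a5 = [("operable patients", "wo-case"),
      ("surgery after three preoperative cycles of XX drug group", "ni-case"),
      ("surgery alone group", "ni-case")]
a6 = [("half of 60 factories", "wo-case"),
      ("treatment group", "ni-case"), ("control group", "ni-case")]
a7 = [("children", "wo-case"),
      ("chemotherapy group", "ni-case"), ("standard follow-up care group", "ni-case")]
no_match = [("patients", "wo-case"), ("hospital", "ni-case")]

print([has_parallel_groups(d) for d in (a5, a6, a7, no_match)])  # [True, True, True, False]
```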

The third search method, by contrast, adds weight to the head word of each meaning chunk, so in the example above “group” does receive extra weight. However, detection is still based on whole-sentence similarity, so the similarity between example A5 and the search target sentence remains low. Moreover, the modifier “random” does not form a case element, so it still receives no similarity weight.

An object of the present invention is to provide a similar sentence search system and program for the case where the search target sentence contains a specific phrase the user is focusing on. The system and program can efficiently retrieve examples containing that phrase from a database.

A similar sentence search system according to the present invention comprises:
- a storage device that stores a similar sentence search program and an example database;
- an input device for entering the search target sentence for an example search of similar sentences, together with the information needed for operation;
- a display device that displays the search target sentence and the examples retrieved from the example database;
- a CPU that executes the similar sentence search program.

The system is characterized by further comprising:
- a search target input unit that accepts the search target sentence entered from the input device;
- a keyword designation unit that accepts designation of keywords in the search target sentence accepted by the search target input unit;
- a language analysis unit that analyzes the syntactic/semantic role information of each word in the search target sentence accepted by the search target input unit;
- a search expression generation unit that generates a search expression containing the keywords and the keywords' syntactic/semantic role information, taken from the role information of the words analyzed by the language analysis unit;
- a database search unit that uses this search expression to retrieve examples containing the keywords from the example database, and then retrieves from among them the examples whose syntactic/semantic role information matches;
- an output unit that outputs the examples retrieved by the database search unit to the display device.

According to the present invention, when the search target sentence contains a specific phrase the user is focusing on, examples containing that phrase can be retrieved efficiently from a database.

FIG. 1 is a configuration diagram of a similar sentence search system according to an embodiment of the present invention. The similar sentence search system 11 searches an example database for sentences similar to an input search target sentence and outputs them as examples. It can be realized, for example, by installing software such as a similar sentence search program on a general-purpose computer and executing it on the CPU 12. In the similar sentence search system 11, the CPU 12, a ROM (Read Only Memory) 13, and a RAM (Random Access Memory) 14 are connected via a bus 15. An input device 16, a display device 17, and a storage device 19 are also connected to the bus 15.

The storage device 19 stores a similar sentence search program 20, an example database 21, and a synonym dictionary 22. The similar sentence search program 20 consists of the following units:
- a search target sentence input unit 23;
- a keyword designation unit 24;
- a language analysis unit 25;
- a search control unit 26;
- a search expression generation unit 27;
- a database search unit 28;
- an output unit 29.

Based on an input signal from the input device 16, the CPU 12 reads a boot program for starting the similar sentence search system 11 from the ROM 13 and executes it. It then reads an operating system (not shown) stored in the storage device 19.

Based on input signals from the input device 16, the CPU 12 controls each device. It loads the similar sentence search program 20, the example database 21, and the synonym dictionary 22 from the storage device 19 or elsewhere into the RAM 14. It then carries out the similar sentence search processing described later, following the program commands read from the RAM 14.

The input device 16 is the means for entering commands and data, such as the character data or file data of the search target sentence. It is typically realized by one or more of the following:
- a keyboard;
- a pointing device such as a mouse or touch panel;
- a speech recognition or character recognition function;
- an external storage medium reader such as a CD drive;
- a network input device.

The display device 17 outputs data entered from the input device 16, example search results, and the like. A CRT display or a liquid crystal display is used.

FIG. 2 is a functional block diagram of the CPU 12 of the similar sentence search system 11 according to the embodiment. The functional blocks in the CPU 12 shown in FIG. 2 correspond to the units of the similar sentence search program 20: the search target sentence input unit 23, keyword designation unit 24, language analysis unit 25, search control unit 26, search expression generation unit 27, database search unit 28, and output unit 29.

The search target sentence input unit 23 accepts input of the sentence for which similar sentences are to be searched (the search target sentence). For example, the user types the sentence directly on a keyboard serving as the input device 16, or the sentence is read from a storage medium in a disk drive. The search target sentence input unit 23 accepts the search target sentence entered from the input device 16.

The keyword designation unit 24 accepts designation of the words the user is focusing on in the search target sentence (keywords). For example, keywords are designated by dragging with a mouse or similar device serving as the input device 16. As described in detail later, keywords are used to generate the search expression.
- If only one keyword is designated, it must always appear in the detected similar sentences.
- If several keywords are designated, the user specifies their priority order and whether each one must appear in the detected similar sentences.
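The keyword designation rules can be captured in a small data structure. This is a sketch under assumptions: the `Keyword` class, the convention that a smaller `priority` number means higher priority, and the helper names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    word: str
    priority: int = 1        # smaller number = higher priority (assumed convention)
    required: bool = True    # must the keyword appear in every result?


def normalize_keywords(keywords):
    """Apply the designation rules: a single keyword is always required;
    several keywords are ordered by the priority the user assigned."""
    if len(keywords) == 1:
        only = keywords[0]
        return [Keyword(only.word, only.priority, required=True)]
    return sorted(keywords, key=lambda k: k.priority)


def satisfies_required(example_words, keywords):
    """True if every required keyword occurs in the example."""
    present = set(example_words)
    return all(k.word in present for k in keywords if k.required)


kws = normalize_keywords([Keyword("random", 2), Keyword("assign", 1),
                          Keyword("group", 3, required=False)])
print([k.word for k in kws])                          # ['assign', 'random', 'group']
print(satisfies_required(["assign", "random"], kws))  # True: 'group' is optional here
```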

The language analysis unit 25 analyzes a sentence linguistically to obtain the syntactic/semantic role information of each word in it. This role information includes, for example, each word's part of speech, case, and participation in parallel (coordinate) expressions. It is obtained through morphological analysis, syntactic analysis, dependency analysis, and semantic analysis, which are standard techniques in natural language processing. The language analysis unit 25 analyzes both the input search target sentence and the examples stored in the example database 21.

The example database 21 holds multiple examples. Each example consists of the words of a sentence and the syntactic/semantic role information of each word in context. The examples are the sentences searched as candidate similar sentences for the search target sentence.

Examples are registered in the example database 21, for instance, from a storage medium inserted into a disk drive serving as the input device 16. It is assumed here that, in each example, the role information obtained with the language analysis unit 25 has already been associated with every word of the sentence.

The search control unit 26 controls the search expression generation unit 27, the database search unit 28, and the output unit 29, which are described below.

The search expression generation unit 27 operates under the control of the search control unit 26. It generates a search expression from three inputs: the keywords designated via the keyword designation unit 24, the words of the search target sentence, and their syntactic/semantic role information. Specifically, it takes the keywords' own role information from the role information analyzed by the language analysis unit 25 and builds a search expression containing the keywords together with that role information. The resulting search expression therefore captures not only the keywords designated by the user but also their syntactic/semantic roles. The search expression is described in detail later.
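One way to picture this step is to filter the analyzed sentence down to the keyword nodes and the links between them. The sketch below makes these assumptions, all hypothetical:

- the analysis result is encoded as node ids mapped to words, plus (head id, relation, dependent id) links;
- the relation labels and the function name are invented for illustration;
- this is not the patent's actual data format.

```python
def build_search_expression(nodes, links, keywords, required=()):
    """Keep the keyword nodes and the links between keyword nodes.

    Returns {"nodes": {id: (word, is_required)}, "links": [(h, rel, d), ...]}.
    """
    kw_ids = {i for i, w in nodes.items() if w in keywords}
    return {
        "nodes": {i: (nodes[i], nodes[i] in required) for i in sorted(kw_ids)},
        "links": [(h, r, d) for h, r, d in links if h in kw_ids and d in kw_ids],
    }


# Hypothetical analysis of X2: two separate "group" nodes, linked in parallel.
nodes = {1: "assign", 2: "random", 3: "group", 4: "group", 5: "patients", 6: "drug"}
links = [(1, "adverb", 2), (1, "ni-case", 3), (1, "ni-case", 4),
         (3, "parallel", 4), (1, "wo-case", 5), (3, "modifier", 6)]
expr = build_search_expression(nodes, links, {"assign", "random", "group"},
                               required={"assign", "random", "group"})
print(expr["links"])  # [(1, 'adverb', 2), (1, 'ni-case', 3), (1, 'ni-case', 4), (3, 'parallel', 4)]
```

Using node ids rather than words keeps the two occurrences of “group” distinct, which the parallel relationship needs.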

The database search unit 28, under the control of the search control unit 26, searches the examples registered in the example database 21. For each registered example, it compares the keywords and their associated syntactic/semantic role information against the search expression. It retrieves the examples that match or are similar.

The synonym dictionary 22 records, for given words, the words that are their synonyms. It is consulted when the search expression is compared with the examples in the example database 21, and words in a synonym relationship are treated as matching. The user can choose whether the search consults the synonym dictionary 22.
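Synonym-aware matching with a user toggle can be sketched as follows. The dictionary entries, the canonical-form approach, and the function names are hypothetical. Relations are compared exactly and only the words go through the synonym lookup, which is an assumption.

```python
# Hypothetical synonym dictionary: each entry maps a word to a canonical form.
SYNONYMS = {"allocate": "assign", "arm": "group", "cohort": "group"}


def canonical(word, use_synonyms):
    """Return the canonical form of `word` when synonym lookup is enabled."""
    return SYNONYMS.get(word, word) if use_synonyms else word


def words_match(a, b, use_synonyms=True):
    """Two words match if they are equal, or if both map to the same
    canonical form and the user has enabled synonym lookup."""
    return canonical(a, use_synonyms) == canonical(b, use_synonyms)


def triples_match(t1, t2, use_synonyms=True):
    """Compare (head, relation, dependent) triples; relations must be identical."""
    (h1, r1, d1), (h2, r2, d2) = t1, t2
    return (r1 == r2
            and words_match(h1, h2, use_synonyms)
            and words_match(d1, d2, use_synonyms))


print(triples_match(("allocate", "ni-case", "arm"), ("assign", "ni-case", "group")))         # True
print(triples_match(("allocate", "ni-case", "arm"), ("assign", "ni-case", "group"), False))  # False
```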

The output unit 29, under the control of the search control unit 26, outputs the results of searching the examples registered in the example database 21, for example by displaying them on the display device 17. Output is not limited to the display device 17. Instead of or in addition to the display, the search results may be output by voice or by a printing device.

Next, the search processing performed by the similar sentence search system 11 according to the embodiment is described. FIG. 3 is a flowchart of this search processing.

When a search is performed with the similar sentence search system 11, the user first enters a search target sentence through the input device 16. The search target sentence input unit 23 accepts the character string of the search target sentence (S11). The user then designates keywords in the sentence, and the keyword designation unit 24 accepts the keyword designation (S12).

Once the keywords have been accepted, the language analysis unit 25 analyzes the search target sentence linguistically (S13). This yields the syntactic/semantic role information of each word in the sentence, which is attached to the word. The search expression generation unit 27 then generates a search expression from the designated keywords and the role information associated with each keyword (S14).

Once the search expression has been generated, processing continues as follows:
- S15: The database search unit 28 uses the keywords and their associated role information to search the example database 21 for examples that match or are similar to the search expression.
- S16: It checks whether any example has a similarity equal to or greater than the user-set threshold, that is, whether a search result was obtained.
- S17: If such examples exist, they are extracted from the example database 21 as the search results. Multiple results are sorted in descending order of similarity.
- S18: If no example reaches the threshold, the search result is a notice that no matching example exists.
- S19: When the database search unit 28 has finished, the output unit 29 outputs the search results, and the similar sentence search processing ends.
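The control flow of steps S13 to S18 can be sketched with the stages passed in as functions. This is a sketch under assumptions: the toy stages below (whitespace tokenizing, keyword intersection, and a keyword-coverage score) are hypothetical placeholders, not the patent's analysis or matching. The message text is invented too, and S19 appears only as a `print`.

```python
# Hypothetical sketch of steps S13-S18; the stage functions are injected.
NO_RESULT = "No matching example was found."   # hypothetical message text


def run_search(sentence, keywords, examples, threshold,
               analyze, build_expression, score):
    analysis = analyze(sentence)                               # S13
    expression = build_expression(keywords, analysis)          # S14
    scored = [(score(expression, ex), ex) for ex in examples]  # S15
    hits = [(s, ex) for s, ex in scored if s >= threshold]     # S16
    if not hits:
        return NO_RESULT                                       # S18
    return sorted(hits, key=lambda p: p[0], reverse=True)      # S17: best first


# Toy stages for illustration only.
def toy_analyze(sentence):
    return sentence.split()


def toy_build(keywords, words):
    return set(keywords) & set(words)


def toy_score(expression, example):
    return len(expression & set(example.split())) / len(expression) if expression else 0.0


examples = ["patients assigned to groups", "factories closed",
            "children randomly assigned to arms"]
print(run_search("patients randomly assigned to groups", ["randomly", "assigned"],
                 examples, 0.5, toy_analyze, toy_build, toy_score))   # S19: output
# [(1.0, 'children randomly assigned to arms'), (0.5, 'patients assigned to groups')]
```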

Suppose now that the user takes the following three actions:
- enters the search target sentence “Forty patients and twenty healthy adults were each randomly assigned to a group using the ○○ drug and a group not using the ○○ drug” (search target sentence X2);
- designates “assign”, “group” (two occurrences), and “random” as keywords;
- specifies that all of these keywords must appear in the search results (required).

The search target sentence input unit 23 and the keyword designation unit 24 accept the input of X2 and the keyword designation (S11, S12). The language analysis unit 25 then analyzes X2 linguistically (S13). FIG. 4 is an explanatory diagram showing the result of this analysis.

As shown in FIG. 4, the analysis produces a tree structure over the words of X2. The tree makes explicit the content words of X2 (nodes) and the syntactic/semantic relations between them (links).

In this case the keyword “assign” is modified adverbially by the keyword “random”. Two noun phrases containing the keyword “group” attach to it in the ni-case (indirect object). In this way, the syntactic/semantic role information of the keywords becomes explicit. For brevity, FIG. 4 omits some links involving non-keyword words.
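The roles made explicit by FIG. 4 can be read off a simple node-and-link encoding of the tree. The encoding below is a hypothetical reconstruction from the description above: ids, relation labels, and the trimmed non-keyword links are all assumptions. “assign” is the root, has no head, and so does not appear in the output.

```python
# Hypothetical encoding of the FIG. 4 tree for X2 (non-keyword links trimmed).
NODES = {1: "assign", 2: "random", 3: "group", 4: "group",
         5: "patients", 6: "adults"}
LINKS = [  # (head id, relation, dependent id)
    (1, "adverb", 2),
    (1, "ni-case", 3),
    (1, "ni-case", 4),
    (1, "wo-case", 5),
    (1, "wo-case", 6),
]


def keyword_roles(nodes, links, keywords):
    """List (keyword, relation, head word) for each keyword node that depends on another node."""
    return [(nodes[d], rel, nodes[h]) for h, rel, d in links if nodes[d] in keywords]


for role in keyword_roles(NODES, LINKS, {"assign", "random", "group"}):
    print(role)
# ('random', 'adverb', 'assign')
# ('group', 'ni-case', 'assign')
# ('group', 'ni-case', 'assign')
```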

Once the language analysis of search target sentence X2 is complete, the search expression generation unit 27 generates a search expression from the keywords "assign", "group", and "random" designated in step S12 and the syntactic/semantic role information obtained in step S13 (S14).

FIG. 5 is an explanatory diagram of a search expression that contains the keywords together with their syntactic/semantic role information. FIG. 5(a) represents the search expression as a tree structure, and FIG. 5(b) represents it as a character string. The role information attached to each keyword in FIG. 5(a), and the mark "(必)" ("required") in FIG. 5(b), indicate that the designated keyword is required.
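One plausible way to derive the FIG. 5 search expression from the full analysis tree is to keep the keyword nodes, plus any node on the path to one, and drop everything else. The sketch below is an assumption rather than the patent's stated procedure: `prune`, `serialize`, and the abridged X2 tree are illustrative, and sibling nodes are separated by a space as in the sentence representation used by the database search unit.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    word: str
    role: Optional[str] = None              # link label to the head; None at the root
    children: List["Node"] = field(default_factory=list)

def prune(node: Node, keywords: set) -> Optional[Node]:
    """Keep keyword nodes and nodes that lead to a kept keyword; drop the rest."""
    kept = [c for c in (prune(ch, keywords) for ch in node.children) if c is not None]
    if node.word in keywords or kept:
        return Node(node.word, node.role, kept)
    return None

def serialize(node: Node, required: set) -> str:
    """Render as role_word, append '@@' to required keywords, children in parentheses."""
    label = f"{node.role}_{node.word}" if node.role else node.word
    if node.word in required:
        label += "@@"
    if node.children:
        label += "(" + " ".join(serialize(c, required) for c in node.children) + ")"
    return label

# Abridged X2 analysis tree (some non-keyword nodes omitted).
x2 = Node("割り付け", children=[
    Node("ランダム", "fukushi"),
    Node("群", "kan-moku", [
        Node("使用", "rentai-shu", [Node("薬", "choku-moku")]),
        Node("群", "heiretu"),
    ]),
    Node("患者", "choku-moku"),
    Node("それぞれ", "fukushi"),
])

keywords = {"割り付け", "ランダム", "群"}
print(serialize(prune(x2, keywords), required=keywords))
# 割り付け@@(fukushi_ランダム@@ kan-moku_群@@(heiretu_群@@))
```

Because every keyword here is required, each kept keyword gets the "@@" mark; an optional keyword would simply be left unmarked.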

Once the search expression has been generated, the database search unit 28 uses it to search the example data registered in the example database 21 (S15). Specifically, it first retrieves, from the examples registered in the example database 21, the sentences that contain the required keywords. It then narrows down these results (the primary search results) by matching the nodes and links of their tree structures against the search expression. This tree-based matching can use, for example, the technique disclosed by the present applicant in Japanese Patent Laid-Open No. 2005-208825.
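The two-stage search (primary keyword filter, then structural narrowing) can be sketched as follows. As a deliberate simplification, the structural check compares sets of (head, role, dependent) links instead of implementing the tree-matching method of JP 2005-208825; trees are nested (word, role, children) tuples, and the three toy examples and their parses are hypothetical.

```python
from typing import List, Set, Tuple

Tree = tuple  # (word, role, [child trees]); role is None at the root

def links(tree: Tree) -> Set[Tuple[str, str, str]]:
    """Flatten a tree into its set of (head word, role, dependent word) links."""
    word, _role, children = tree
    out = set()
    for child in children:
        out.add((word, child[1], child[0]))
        out |= links(child)
    return out

def two_stage_search(query: Tree, required: List[str], examples) -> List[str]:
    """examples: iterable of (id, sentence text, analysis tree)."""
    # Stage 1 (primary search): every required keyword must occur in the text.
    primary = [ex for ex in examples if all(k in ex[1] for k in required)]
    # Stage 2 (narrowing): every link of the query must also be a link of the example.
    wanted = links(query)
    return [ex_id for ex_id, _text, tree in primary if wanted <= links(tree)]

query = ("割り付け", None, [("ランダム", "fukushi", []),
                          ("群", "kan-moku", [("群", "heiretu", [])])])

# Hypothetical toy examples with hand-made parses.
examples = [
    ("toy1", "患者をA群とB群にランダムに割り付けた",
     ("割り付け", None, [("患者", "choku-moku", []), ("ランダム", "fukushi", []),
                       ("群", "kan-moku", [("群", "heiretu", [])])])),
    ("toy2", "ランダム化によりA群とB群に割り付ける",
     ("割り付け", None, [("ランダム化", "other", []),
                       ("群", "kan-moku", [("群", "heiretu", [])])])),
    ("toy3", "患者をA群とB群に割り付けた",
     ("割り付け", None, [("群", "kan-moku", [("群", "heiretu", [])])])),
]

print(two_stage_search(query, ["割り付け", "ランダム", "群"], examples))  # ['toy1']
```

Here toy3 fails the keyword stage, while toy2 passes it (ランダム occurs inside ランダム化) but lacks the adverbial link and is dropped in the second stage. A link-set test ignores repeated links and the exact tree shape, which a real tree matcher would respect.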

FIG. 6 is an explanatory diagram of one example retrieved by the database search unit 28. It shows example A5: "100 operable patients with disease △△ were randomly assigned to a group that underwent surgery after three cycles of preoperative drug XX and a surgery-only group."

That is, as shown in FIG. 6, example A5 has the same tree structure as the search expression for search target sentence X2 shown in FIG. 5, so A5 is detected. When the search by the database search unit 28 is complete, the output unit 29 outputs the results (S17). Because all keywords are required in this example, only examples whose tree structures match completely are detected. When some keywords are designated as not required, or when the synonym dictionary 22 is used, example data whose similarity is equal to or greater than a specified threshold are detected instead. Concrete cases are described below.

Internally, the database search unit 28 handles the search target sentence and the search expression as character-string data. For example, search target sentence X2 ("40 patients and 20 healthy adults were each randomly assigned to a group that uses drug ○○ and a group that does not") and the search expression of FIG. 5 are represented in the following form.

Search target sentence: 割り付け(fukushi_ランダム kan-moku_群(rentai-shu_使用(choku-moku_薬) heiretu_群(hitei_syouryaku使用)) choku-moku_患者(shu_４０人 heiretu_成人(shu_健常(shu_２０名))) fukushi_それぞれ)
Search expression: 割り付け@@(fukushi_ランダム@@kan-moku_群@@(heiretu_群@@))

Here the words are 割り付け (assign), ランダム (random), 群 (group), 使用 (use), 薬 (drug), 患者 (patients), 成人 (adults), 健常 (healthy), それぞれ (each), and ４０人 / ２０名 (40 / 20 persons). Parentheses express the connections between words, and the romanized strings carry the syntactic/semantic role information; for example, fukushi marks the adverbial modifier and kan-moku the ni-case (indirect object) link. The symbol "@@" indicates that a keyword is required. The sentence and the search expression are judged to match because the part "割り付け(fukushi_ランダム kan-moku_群(heiretu_群))" matches.
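The bracketed string format lends itself to a small recursive-descent parser plus a containment check. This sketch assumes the grammar `label ['(' node* ')']` with an optional "@@" suffix, and treats a query as matching when its root label equals some node's label and each query child matches some child of that node (extra target children are ignored, and two query children may match the same target child). It is illustrative only, not the patent's matching algorithm; `parse`, `embeds`, and `occurs` are hypothetical names.

```python
import re

# A token is '(' or ')' or a label, optionally followed by the '@@' required mark.
TOKEN = re.compile(r"\(|\)|[^\s()@]+(?:@@)?")

def parse(text):
    """Parse 'label(child child ...)' into nested (label, required, children) tuples."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos]
        pos += 1
        required = label.endswith("@@")
        if required:
            label = label[:-2]
        children = []
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            while tokens[pos] != ")":
                children.append(node())
            pos += 1  # skip ')'
        return (label, required, children)

    return node()

def embeds(q, t):
    """q matches at t: same label, and each query child matches some child of t."""
    return q[0] == t[0] and all(any(embeds(qc, tc) for tc in t[2]) for qc in q[2])

def occurs(q, t):
    """q matches at t or anywhere below it."""
    return embeds(q, t) or any(occurs(q, c) for c in t[2])

x2 = parse("割り付け(fukushi_ランダム kan-moku_群(rentai-shu_使用(choku-moku_薬) "
           "heiretu_群(hitei_syouryaku使用)) choku-moku_患者(shu_４０人 "
           "heiretu_成人(shu_健常(shu_２０名))) fukushi_それぞれ)")
expr = parse("割り付け@@(fukushi_ランダム@@kan-moku_群@@(heiretu_群@@))")

print(occurs(expr, x2))                                # True
print(occurs(parse("割り付け(kan-moku_ランダム)"), x2))  # False
```

The tokenizer treats "@@" as the end of a label, so the search expression parses the same whether or not a space follows each "@@".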

The search described above is now explained more concretely. Suppose the following five examples A5 to A9 are registered in the example database 21, and search target sentence X2 is searched with each of the three search methods 1, 2, and 3 described below.

(Search target sentence)
Search target sentence X2: 40 patients and 20 healthy adults were each randomly assigned to a group that uses drug ○○ and a group that does not.
(Example database)
Example A5: 100 operable patients with disease △△ were randomly (ランダムに) assigned to a group that underwent surgery after three cycles of preoperative drug XX and a surgery-only group.
Example A6: This is the case where, for example, half of 60 factories are randomly (ランダムに) assigned to the treatment group and the other half to the control group.
Example A7: Children are assigned by randomization (ランダム化) to a chemotherapy group and a standard follow-up care group.
Example A8: This bias can arise at any point, starting with the selection of subjects and continuing through assignment to the subject group and the intervention group, assessment of exposure status and outcomes, and data analysis.
Example A9: It is important to assign patients at random (無作為に) to two groups, a treatment group and a control group, for example by the envelope method, and to conduct a comparative trial with an appropriate sample size.

(Search method 1)
All keywords are designated as required. That is, "assign" (割り付け), "group" (群), and "random" (ランダム) are designated as keywords with no difference in priority, and all of them must be included in the search results. The search expression is then:
"assign (required) + (adverb_random (required)) + (indirect object_(group (required)_parallel_group (required)))"
In the search, examples A5, A6, and A7, which contain all the required keywords, are first detected among the five examples A5 to A9 in the example database 21. The primary search results are then narrowed down to examples A5 and A6, which have the same tree structure as the search expression (A7 presumably drops out because ランダム occurs there only inside ランダム化, "randomization", rather than as an adverb modifying 割り付け). By considering not only keyword matches but also each keyword's syntactic/semantic role in the sentence, the search is narrowed to examples in which the keywords are used in the same way as in the search target sentence, which makes the search efficient.
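The primary stage of search method 1 is a plain substring test, so it can be checked directly against the Japanese originals of A5 to A9 as given in the patent. The sketch below assumes that 群, designated at two positions, must occur at least twice; `primary_search` is a hypothetical name, and the structural narrowing that removes A7 is not reproduced here.

```python
from typing import Dict, List

EXAMPLES: Dict[str, str] = {
    "A5": "手術可能な△△疾患患者１００人を、術前にＸＸ薬を３サイクル投与後に手術を実施した群と、手術単独群にランダムに割り付けた。",
    "A6": "たとえば、６０の工場の半数をランダムに治療群に、半数を対照群に割り付けるような場合である。",
    "A7": "小児をランダム化により化学療法実施群と標準的な追跡ケア群とに割り付ける。",
    "A8": "このバイアスは、対象の選択から始まって、対象群と介入群への割り付け、曝露状態や結果の評価、データの解析などあらゆる時点で起こり得ます。",
    "A9": "患者を治療群と対照群の２つの群に封筒法などで無作為に割り付け、適切なサンプルサイズで比較試験を行うことが大切といえる。",
}

# Required keyword -> minimum number of occurrences (assumption: 群 twice).
REQUIRED: Dict[str, int] = {"割り付け": 1, "ランダム": 1, "群": 2}

def primary_search(examples: Dict[str, str], required: Dict[str, int]) -> List[str]:
    """Keep examples that contain every required keyword often enough."""
    return [ex_id for ex_id, text in examples.items()
            if all(text.count(word) >= n for word, n in required.items())]

print(primary_search(EXAMPLES, REQUIRED))  # ['A5', 'A6', 'A7']
```

A8 and A9 fall out because neither contains ランダム; A7 survives this stage only because ランダム is a substring of ランダム化.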

(Search method 2)
Here the priority of each keyword, and whether it must be included in the detected similar sentences, are specified. Suppose that, in addition to the keywords above, "patient" (患者) is designated as a keyword with a lower priority than the other four keywords and with the condition that it need not be included in the results. Under this condition, example A6 does not contain "patient", but because "patient" is not a required keyword, A6 is still detected, with a lower similarity than example A5, which does contain "patient". Specifying conditions along with the keywords in this way allows a search that keeps coverage high while focusing on the more important parts of the phrasing of interest.
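The patent does not give a similarity formula for search method 2. As one hypothetical scheme, each keyword can carry a priority weight and a required flag: an example missing any required keyword is rejected, and otherwise its score is the weighted share of keywords it contains. The weights (2 for the original keywords, 1 for 患者), the 0.8 threshold, and all names below are assumptions, and structural matching is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Keyword:
    word: str
    weight: float      # priority: higher weight = more important (hypothetical scale)
    required: bool     # must the keyword appear in a hit?
    count: int = 1     # occurrences needed (群 was designated at two positions)

def similarity(text: str, keywords: List[Keyword]) -> Optional[float]:
    """None if a required keyword is missing; otherwise the weighted share of keywords found."""
    total = sum(k.weight for k in keywords)
    score = 0.0
    for k in keywords:
        found = text.count(k.word) >= k.count
        if k.required and not found:
            return None
        if found:
            score += k.weight
    return score / total

def search(examples: Dict[str, str], keywords: List[Keyword],
           threshold: float) -> List[Tuple[str, float]]:
    """Return (id, score) pairs at or above the threshold, best first."""
    scored = [(ex_id, similarity(text, keywords)) for ex_id, text in examples.items()]
    hits = [(ex_id, s) for ex_id, s in scored if s is not None and s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

KEYWORDS = [
    Keyword("割り付け", 2.0, True),
    Keyword("ランダム", 2.0, True),
    Keyword("群", 2.0, True, count=2),
    Keyword("患者", 1.0, False),   # lower priority, optional
]
EXAMPLES = {
    "A5": "手術可能な△△疾患患者１００人を、術前にＸＸ薬を３サイクル投与後に手術を実施した群と、手術単独群にランダムに割り付けた。",
    "A6": "たとえば、６０の工場の半数をランダムに治療群に、半数を対照群に割り付けるような場合である。",
    "A8": "このバイアスは、対象の選択から始まって、対象群と介入群への割り付け、曝露状態や結果の評価、データの解析などあらゆる時点で起こり得ます。",
}

for ex_id, s in search(EXAMPLES, KEYWORDS, threshold=0.8):
    print(ex_id, round(s, 3))
# A5 1.0
# A6 0.857
```

This reproduces the ordering described above (A5 ranks above A6 because only A5 contains 患者), while A8 is rejected for lacking the required ランダム.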

(Search method 3)
Here use of the synonym dictionary 22 is specified. When the synonym dictionary 22 is used, words in a synonym relation are treated as matching. If the synonym dictionary 22 associates ランダム ("random") and 無作為 ("at random") as synonyms, then 無作為 in example A9 is regarded as equivalent to ランダム, and A9 is also detected in the primary search. Using the synonym dictionary 22 in this way makes it possible to detect similar phrasings even when they do not match the keywords exactly as character strings.
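Search method 3 only changes what counts as a keyword hit. A minimal sketch, assuming the synonym dictionary 22 can be modeled as groups of interchangeable words (the single group is the ランダム/無作為 pair from the text; the names are hypothetical), expands each keyword to its synonyms before the substring test of the primary search. The sentences are the patent's A5, A8, and A9.

```python
from typing import Dict, List, Set

# Hypothetical model of synonym dictionary 22: each set holds mutually synonymous words.
SYNONYM_GROUPS: List[Set[str]] = [{"ランダム", "無作為"}]

def variants(word: str, use_synonyms: bool) -> Set[str]:
    """The keyword itself, plus its synonyms when the dictionary is enabled."""
    out = {word}
    if use_synonyms:
        for group in SYNONYM_GROUPS:
            if word in group:
                out |= group
    return out

def primary_search(examples: Dict[str, str], keywords: List[str],
                   use_synonyms: bool = False) -> List[str]:
    """Keep examples in which every keyword, or one of its synonyms, occurs as a substring."""
    return [ex_id for ex_id, text in examples.items()
            if all(any(v in text for v in variants(k, use_synonyms)) for k in keywords)]

EXAMPLES = {
    "A5": "手術可能な△△疾患患者１００人を、術前にＸＸ薬を３サイクル投与後に手術を実施した群と、手術単独群にランダムに割り付けた。",
    "A8": "このバイアスは、対象の選択から始まって、対象群と介入群への割り付け、曝露状態や結果の評価、データの解析などあらゆる時点で起こり得ます。",
    "A9": "患者を治療群と対照群の２つの群に封筒法などで無作為に割り付け、適切なサンプルサイズで比較試験を行うことが大切といえる。",
}
KEYWORDS = ["割り付け", "ランダム", "群"]

print(primary_search(EXAMPLES, KEYWORDS))                     # ['A5']
print(primary_search(EXAMPLES, KEYWORDS, use_synonyms=True))  # ['A5', 'A9']
```

Grouping synonyms as sets keeps the relation symmetric, so designating 無作為 as the keyword would also find sentences containing ランダム.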

As described above, the similar sentence search system according to the embodiment of the present invention searches using not only the keywords but also their syntactic/semantic role information. Therefore, when the search target sentence contains a particular phrasing that the user is interested in, examples containing that phrasing can be retrieved efficiently from the example database 21. In addition, specifying conditions together with the keywords allows a search that keeps coverage high while focusing on the more important parts of the phrasing of interest. Furthermore, using the synonym dictionary 22 makes it possible to detect similar phrasings even when they do not match the keywords exactly as character strings.

FIG. 1 is a block diagram of the similar sentence search system according to the embodiment of the present invention.
FIG. 2 is a functional block diagram of the CPU of the similar sentence search system according to the embodiment of the present invention.
FIG. 3 is a flowchart showing the search processing in the similar sentence search system according to the embodiment of the present invention.
FIG. 4 is an explanatory diagram showing an example of a language analysis result produced by the language analysis unit in the embodiment of the present invention.
FIG. 5 is an explanatory diagram of an example of a search expression generated by the search expression generation unit in the embodiment of the present invention.
FIG. 6 is an explanatory diagram of an example retrieved by the database search unit in the embodiment of the present invention.

Explanation of symbols

11 ... similar sentence search system, 12 ... CPU, 13 ... ROM, 14 ... RAM, 15 ... bus, 16 ... input device, 17 ... display device, 19 ... storage device, 20 ... similar sentence search program, 21 ... example database, 22 ... synonym dictionary, 23 ... search target sentence input unit, 24 ... keyword designation unit, 25 ... language analysis unit, 26 ... search control unit, 27 ... search expression generation unit, 28 ... database search unit, 29 ... output unit

Claims

A similar sentence search system including a storage device that stores a similar sentence search program and an example database, an input device for inputting a search target sentence for an example search of similar sentences and information necessary for operation, a display device that displays the search target sentence and examples retrieved from the example database, and a CPU that executes the similar sentence search program, the similar sentence search system further comprising: a search target input unit that receives a search target sentence input from the input device; a keyword designation unit that receives designation of a keyword in the search target sentence received by the search target input unit; a language analysis unit that analyzes syntactic/semantic role information of each word constituting the search target sentence received by the search target input unit; a search expression generation unit that generates a search expression including the keyword and, among the syntactic/semantic role information of the words analyzed by the language analysis unit, the syntactic/semantic role information of the keyword; a database search unit that uses the search expression generated by the search expression generation unit to search the example database for examples including the keyword and, among them, for examples whose syntactic/semantic role information matches; and an output unit that outputs the examples retrieved by the database search unit to the display device.

The similar sentence search system according to claim 1, wherein, upon receiving designation of a plurality of keywords from the input device, the keyword designation unit also receives a priority order of the received keywords and a condition as to whether each keyword must be included in the search results.

The similar sentence search system according to claim 1 or 2, further comprising a synonym dictionary in which words in a synonym relation are registered in association with specific words, wherein, when there is a request from the input device to use the synonym dictionary for the keyword, the database search unit also searches for examples that include a word in a synonym relation with the keyword.

A program for causing a computer of a similar sentence search system, which includes a storage device that stores a similar sentence search program and an example database, an input device for inputting a search target sentence for an example search of similar sentences and information necessary for operation, a display device that displays the search target sentence and examples retrieved from the example database, and a CPU that executes the similar sentence search program, to implement: a function of receiving a search target sentence input from the input device; a function of receiving designation of a keyword in the received search target sentence; a function of analyzing syntactic/semantic role information of each word constituting the received search target sentence; a function of generating a search expression including the designated keyword and, among the analyzed syntactic/semantic role information of the words, the syntactic/semantic role information of the keyword; a function of using the generated search expression to search the example database for examples including the keyword and, among them, for examples whose syntactic/semantic role information matches; and a function of outputting the retrieved examples to the display device.