JP6934621B2

JP6934621B2 - Methods, equipment, and programs

Info

Publication number: JP6934621B2
Application number: JP2017102876A
Authority: JP
Inventors: 今出　昌宏; 山内　真樹; 菜々美藤原
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2016-09-27
Filing date: 2017-05-24
Publication date: 2021-09-15
Anticipated expiration: 2037-05-24
Also published as: JP2018055672A

Description

The present disclosure relates to a technique for translating input sentences.

In recent years, there has been active research on presenting the user with multifaceted translation results when translating an input sentence, rather than simply presenting a machine translation of that sentence.

For example, Patent Document 1 discloses a technique that generates a plurality of paraphrase sentences expressing the input text in other words with the same content, machine-translates the generated paraphrase sentences, extracts candidate paraphrase sentences for translation from among them based on translation reliability, and identifies the paraphrase sentence to be translated from among the extracted candidates.

Patent Document 2 discloses a technique that, in order to compensate for the uncertainty of machine translation, searches for example sentences whose expressions are close to the input source sentence, acquires the target-language parallel text corresponding to each retrieved example sentence, and displays the acquired parallel text together with the machine translation result of the input source sentence.

Patent Document 1: Japanese Unexamined Patent Publication No. 2012-159969
Patent Document 2: Japanese Patent No. 5103718

However, the conventional techniques described above cannot be expected to improve translation reliability unless the knowledge space of the translator is augmented, and therefore need further improvement.

A method according to one aspect of the present disclosure is a method of providing a translated sentence, the method including:
acquiring, via a user's terminal, a first sentence that is written in a first language and is to be translated;
determining whether the first sentence is included in a database containing a plurality of pairs, each consisting of a sentence written in the first language and a parallel-translation sentence written in a second language;
when it is determined that the first sentence is not included in the database, generating a plurality of second sentences by replacing one or more words constituting the first sentence based on a predetermined rule;
calculating the degree of syntactic matching between each of the plurality of second sentences and each of a plurality of sentences written in the first language that are included in the database;
extracting one or more third sentences, written in the first language and included in the database, whose calculated degree of matching is equal to or greater than a threshold; and
displaying on the user's terminal, as translation references for the first sentence, one or more fourth sentences written in the second language that are the parallel translations, in the database, of the one or more third sentences.
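
The steps of this method can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in this disclosure: the function names, the toy English/French database, the substitution table, the threshold of 0.8, and the use of `difflib.SequenceMatcher` on word lists as a stand-in for the degree of syntactic matching are all hypothetical.

```python
from difflib import SequenceMatcher


def generate_second_sentences(first_sentence, substitutions):
    """Replace one word at a time using a substitution table (assumed rule)."""
    words = first_sentence.split()
    second_sentences = []
    for i, word in enumerate(words):
        for alternative in substitutions.get(word, []):
            replaced = words[:i] + [alternative] + words[i + 1:]
            second_sentences.append(" ".join(replaced))
    return second_sentences


def matching_degree(sentence_a, sentence_b):
    """Stand-in for the degree of syntactic matching: word-sequence ratio in [0, 1]."""
    return SequenceMatcher(None, sentence_a.split(), sentence_b.split()).ratio()


def translation_references(first_sentence, database, substitutions, threshold=0.8):
    """Return the fourth sentences (translations) to display as references."""
    # If the first sentence is itself in the database, its translation is available directly.
    if first_sentence in database:
        return [database[first_sentence]]
    second_sentences = generate_second_sentences(first_sentence, substitutions)
    references = []
    for third_candidate, translation in database.items():
        # A database sentence becomes a third sentence if any second sentence
        # reaches the threshold against it.
        if any(matching_degree(s, third_candidate) >= threshold for s in second_sentences):
            references.append(translation)
    return references


# Toy data (hypothetical): first-language sentence -> second-language translation.
database = {
    "I want to go to Kyoto": "Je veux aller à Kyoto",
    "Where is the station": "Où est la gare",
}
substitutions = {"Tokyo": ["Kyoto", "Osaka"]}
references = translation_references("I want to go to Tokyo", database, substitutions)
print(references)
```

In this sketch the input sentence itself is absent from the database, but a second sentence produced by substitution matches a stored sentence, so that stored sentence's translation is returned as a reference.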

According to the present disclosure, translation results useful to the user can be presented without augmenting the knowledge space in order to generate a highly reliable translation of the input sentence or of a sentence similar to it.

FIG. 1 is a block diagram showing the configuration of a translation support device, which is an example of the device according to the embodiment of the present disclosure.
FIG. 2 is a diagram showing an example of the translation support device configured as a portable information terminal and as a stationary computer.
FIG. 3 is a diagram showing an example of the translation support device 1 configured as a cloud system.
FIG. 4 is a block diagram showing the detailed configuration of the paraphrase generation unit.
FIG. 5 is a diagram showing an example of input paraphrase sentences and example sentences.
FIG. 6 is a table summarizing specific examples in the present embodiment.
FIG. 7 is a diagram showing an example of an output image displayed by the output unit.
FIG. 8 is a flowchart showing an example of the processing of the translation support device according to the embodiment of the present disclosure.
FIG. 9 is a flowchart showing an example of the details of the processing of S5 in FIG. 8.
FIG. 10 is a flowchart showing an example of the details of the processing of S6 in FIG. 8.

(Background to one aspect of the present disclosure)
To improve the translation quality of a translator, a technique has been proposed that paraphrases an input sentence to generate a plurality of input paraphrase sentences, presents translations of the generated input paraphrase sentences, and lets the user select the most suitable translation from among them (Patent Document 1).

A technique has also been proposed that presents, together with the machine translation result, example translations of sentences that are similar to or partially match the input sentence (Patent Document 2).

However, the technique of Patent Document 1 cannot present a highly accurate translation when the knowledge space used by the translator to generate translations contains no knowledge data in the vicinity of the input sentence and the input paraphrase sentences.

Likewise, the technique of Patent Document 2 cannot present an example translation capable of compensating for the uncertainty of the machine translation of the input sentence unless the example translation database held by the translator contains example sentences that are similar to or partially match the input sentence.

Thus, the techniques of Patent Documents 1 and 2 cannot be expected to improve translation reliability unless the knowledge space is augmented. Moreover, even if the knowledge space is augmented, no improvement in translation reliability can be expected when an input sentence contains an expression outside the range of the augmented knowledge space. Augmenting the knowledge space also poses a cost-effectiveness problem.

The present disclosure provides a technique for presenting translations useful to the user without augmenting the knowledge space in order to generate a highly reliable translation of the input sentence or of a sentence similar to it.

This aspect does not simply present a translation of a paraphrase of the first sentence to be translated. Instead, from among the plurality of first-language sentences stored in the database, one or more sentences whose degree of syntactic matching with the plurality of second sentences (obtained by replacing words of the first sentence according to the predetermined rule) is equal to or greater than the threshold are extracted as third sentences. The fourth sentences, which are the parallel translations of the extracted third sentences, are then displayed as translation references.

Presenting a fourth sentence, that is, the example translation of a third sentence whose content differs from the first sentence but whose sentence structure matches or resembles it, is in fact more likely to give the user a useful translation result than presenting a translation of a paraphrase of the first sentence.

That is, presenting such fourth sentences makes wider use of the knowledge space used in generating translations and yields translation results useful to the user. Because this aspect focuses on this point, it can present translation results useful to the user.

Furthermore, because this aspect does not require a highly reliable translation of the first sentence or of a sentence similar to it, there is no need to use a knowledge space rich enough in knowledge data to meet such a requirement. This aspect can therefore present translation results useful to the user without augmenting the knowledge space.

In the above aspect, one or more of the plurality of second sentences may be machine-translated into the second language to generate one or more fifth sentences, and at least one of the one or more fourth sentences and the one or more fifth sentences may be displayed on the user's terminal.

In this aspect, one or more fifth sentences are generated as translations of one or more of the plurality of second sentences obtained by paraphrasing the first sentence according to the predetermined rule, and at least one of the fourth sentences and the fifth sentences is presented. Presenting the fifth sentences provides a variety of translation results and increases the likelihood of presenting a translation result useful to the user.

In the above aspect, the degree of matching may be calculated based on a first index indicating the text similarity between the plurality of second sentences and the plurality of sentences included in the database.

According to this aspect, sentences whose text matches or is similar to the plurality of second sentences are extracted as third sentences from among the plurality of sentences stored in the database. This prevents sentences unrelated to the second sentences from being extracted as third sentences while still making wide use of the knowledge space.

In the above aspect, the degree of matching may be calculated based on a second index that, for sentences in the database whose sentence structure matches or is similar to that of the plurality of second sentences, takes a larger value the smaller the text similarity to the first sentence.

According to this aspect, example sentences whose sentence structure matches or is similar to that of the second sentences but whose content is remote from the first sentence are extracted from among the sentences in the database. A variety of third sentences can thus be extracted, and the knowledge space can be used widely.

In the above aspect, the degree of matching may be calculated based on a third index indicating the similarity in sentence structure between the plurality of second sentences and the plurality of sentences included in the database.

According to this aspect, sentences whose sentence structure matches or is similar to that of the plurality of second sentences are extracted as third sentences. This prevents third sentences with low relevance to the second sentences from being extracted while still making wide use of the knowledge space.

In the above aspect, the degree of matching may be calculated based on a fourth index that takes a larger value the greater the number of matching parts of speech between the plurality of second sentences and the plurality of sentences included in the database.

According to this aspect, sentences with a large number of parts of speech matching those of the plurality of second sentences are extracted as third sentences. This prevents sentences with low relevance to the second sentences from being extracted while still making wide use of the knowledge space.

In this aspect, the evaluation value may also be calculated by combining at least two of the first to fourth indices. This allows a wider variety of third sentences to be extracted and a variety of translation results to be presented. As a result, it is possible to extract third sentences that have low similarity to the first sentence in the first language but whose translations serve as hints for translating the first sentence.
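
One way to combine indices into a single evaluation value is a weighted average. The sketch below is illustrative only: the disclosure gives no formulas at this point, so the text-similarity measure (`difflib.SequenceMatcher`), the multiset-overlap count of part-of-speech tags, the equal weights, and all function names are assumptions, and the part-of-speech tags are supplied by hand rather than produced by a tagger.

```python
from collections import Counter
from difflib import SequenceMatcher


def text_similarity_index(tokens_a, tokens_b):
    """Assumed stand-in for a text-similarity index: token-sequence ratio in [0, 1]."""
    return SequenceMatcher(None, tokens_a, tokens_b).ratio()


def pos_match_index(pos_a, pos_b):
    """Assumed stand-in for a part-of-speech index.

    Counts part-of-speech tags shared by both sentences (as multisets) and
    normalizes by the longer sentence, so more matches give a larger value.
    """
    if not pos_a or not pos_b:
        return 0.0
    shared = sum((Counter(pos_a) & Counter(pos_b)).values())
    return shared / max(len(pos_a), len(pos_b))


def evaluation_value(indices, weights):
    """Weighted average of two or more index values (weights are hypothetical)."""
    return sum(w * v for w, v in zip(weights, indices)) / sum(weights)


# Hand-tagged toy sentences (hypothetical data).
second_tokens = ["I", "eat", "rice"]
second_pos = ["PRON", "VERB", "NOUN"]
candidate_tokens = ["I", "eat", "fresh", "bread"]
candidate_pos = ["PRON", "VERB", "ADJ", "NOUN"]

similarity = text_similarity_index(second_tokens, candidate_tokens)
pos_score = pos_match_index(second_pos, candidate_pos)
combined = evaluation_value([similarity, pos_score], weights=[1.0, 1.0])
print(similarity, pos_score, combined)
```

Here the candidate shares only two words with the second sentence but three of its four part-of-speech tags, so the part-of-speech index pulls the combined value above the text similarity alone.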

In the above aspect, the first index may take a larger value for a second sentence with more replaced portions.

According to this aspect, the first index becomes larger for second sentences with more replaced portions, so third sentences similar to heavily replaced second sentences can be extracted and a wider variety of translation results can be presented.

In the above aspect, the one or more second sentences may be extracted from among the plurality of second sentences based on the text similarity between the plurality of second sentences and the one or more third sentences.

According to this aspect, second sentences similar to the third sentences extracted based on the degree of matching are selected from among the plurality of second sentences. This prevents second sentences unrelated to the first sentence from being selected while still making wide use of the knowledge space.

In the above aspect, the predetermined rule may be a first paraphrase rule that paraphrases a first word contained in an element constituting the first sentence with a second word that is in a context-similar relationship with it.

According to this aspect, a second sentence is generated in which a first word constituting the first sentence is paraphrased with a second word in a context-similar relationship. A wider variety of second sentences can therefore be generated than when second sentences are generated merely to have the same meaning as the first sentence. As a result, it is possible to generate second sentences that have low similarity to the first sentence in the first language but whose translations serve as hints for translating the first sentence.

In the above aspect, the predetermined rule may be a second paraphrase rule that paraphrases a first word contained in an element constituting the first sentence with a second word that is in a co-occurrence relationship with it.

According to this aspect, a second sentence is generated in which a first word constituting the first sentence is paraphrased with a second word in a co-occurrence relationship. A wider variety of second sentences can therefore be generated than when second sentences are generated merely to have the same meaning as the first sentence. As a result, it is possible to generate second sentences that have low similarity to the first sentence in the first language but whose translations serve as hints for translating the first sentence.

In the above aspect, the predetermined rule may be a third paraphrase rule that paraphrases a first word contained in an element constituting the first sentence with a second word that is in an entailment relationship with it.

According to this aspect, a second sentence is generated in which a first word constituting the first sentence is paraphrased with a second word in an entailment relationship. A wider variety of second sentences can therefore be generated than when second sentences are generated merely to have the same meaning as the first sentence. As a result, it is possible to generate second sentences that have low similarity to the first sentence in the first language but whose translations serve as hints for translating the input sentence.

In the above aspect, the predetermined rule may be a fourth paraphrase rule that paraphrases a first word contained in an element constituting the first sentence with a second word that is in a hypernym-hyponym relationship with it.

According to this aspect, a second sentence is generated in which a first word constituting the first sentence is paraphrased with a second word in a hypernym-hyponym relationship. A wider variety of second sentences can therefore be generated than when second sentences are generated merely to have the same semantic content as the first sentence. As a result, it is possible to generate second sentences that have low similarity to the first sentence in the first language but whose translations and the like serve as hints for translating the first sentence.

In the above aspect, the one or more fourth sentences may be presented with the portions paraphrased relative to the first sentence distinguished from the other portions.

According to this aspect, the user can easily recognize the portions that were paraphrased relative to the first sentence.

(Embodiment)
FIG. 1 is a block diagram showing the configuration of a translation support device 1, which is an example of the device according to the embodiment of the present disclosure. The translation support device 1 translates an input sentence written in a first language into a second language. The first language may be, for example, Japanese, English, French, or German. The second language may be any language different from the first language. In the following description, Japanese is used as the first language and English as the second language, but this is only an example.

The translation support device 1 includes an input unit 2, an example match determination unit 3, an example parallel translation DB (database) 4 (an example of the database), a paraphrase generation unit 5, an extraction unit 6, a machine translation unit 7, a reliability imparting unit 8, and an output unit 9 (an example of a presentation unit). In FIG. 1, the translation support device 1 is configured as a computer including, for example, a CPU, a ROM, and a RAM. The input unit 2 is configured as an input device such as a touch panel, or as input devices such as a keyboard and a mouse. The example match determination unit 3, the paraphrase generation unit 5, the extraction unit 6, the machine translation unit 7, and the reliability imparting unit 8 may be realized, for example, by the CPU executing a program that causes the computer to function as the translation support device 1, or by dedicated hardware circuits. The program may be provided by download over a network, or may be provided recorded on a computer-readable non-transitory recording medium. The example parallel translation DB 4 may be configured as a storage device (memory). The output unit 9 may be configured as a display device or a speaker.

The translation support device 1 may be configured as a portable information terminal such as a smartphone or a tablet terminal, or as a stationary computer.

FIG. 2 is a diagram showing an example of the translation support device 1 configured as a portable information terminal and an example of the translation support device 1 configured as a stationary computer. In the left part of FIG. 2, the translation support device 1 is a portable information terminal such as a smartphone or a tablet terminal. In the right part of FIG. 2, the translation support device 1 is a stationary computer. In these cases, all the components shown in FIG. 1 are integrated in the portable information terminal or the stationary computer.

Alternatively, the translation support device 1 may be configured as a cloud system. FIG. 3 is a diagram showing an example of the translation support device 1 configured as a cloud system. The cloud system consists of a server SV1 and one or more terminals TE1. The server SV1 and the terminals TE1 are communicably connected via a network NT such as the Internet. The server SV1 is a cloud server consisting of one or more computers. Each terminal TE1 may be a portable information terminal such as a smartphone or a tablet terminal, or a stationary computer.

In this case, the input unit 2 and the output unit 9 shown in FIG. 1 are implemented in the terminal TE1 held by the user. The example match determination unit 3, the example parallel translation DB 4, the paraphrase generation unit 5, the extraction unit 6, the machine translation unit 7, and the reliability imparting unit 8 shown in FIG. 1 are implemented in the server SV1. In other words, the translation support function is implemented in the server SV1, and the terminal TE1 provides the user interface.

Returning to FIG. 1, the input unit 2 acquires an input sentence (an example of the first sentence) that is written in the first language and is to be translated. The input sentence is a sentence entered by the user and written in the first language.

The example match determination unit 3 determines whether an example sentence matching the input sentence acquired by the input unit 2 is stored in the example parallel translation DB 4. If such an example sentence is stored in the example parallel translation DB 4, the example match determination unit 3 outputs the matching example sentence and the example translation pair containing it to the output unit 9. Here, the example match determination unit 3 may, for example, determine that an example sentence matches the input sentence when the two are completely identical. If, on the other hand, the input sentence acquired by the input unit 2 is not stored in the example parallel translation DB 4, the example match determination unit 3 outputs the input sentence to the paraphrase generation unit 5 and the machine translation unit 7.

The example parallel translation DB 4 is a database that stores one or more example translation pairs, each associating an example sentence written in the first language with its example translation written in the second language. The example parallel translation DB 4 is an example of a database containing a plurality of pairs of a sentence written in the first language and a parallel-translation sentence written in the second language. Specifically, the example parallel translation DB 4 is, for example, a database in which one record is assigned to each example translation pair and which has an example sentence field and an example translation field. The example sentence field stores the example sentence, and the example translation field stores the translation corresponding to that example sentence. An example sentence is a sentence that has actually been used, and an example translation is a translation of an example sentence that has actually been produced. The translation reliability of an example sentence and its example translation is, for example, 100%.
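
The record layout and the exact-match check described above can be sketched as follows. This is a minimal in-memory illustration, not the storage design of the embodiment: the class and field names and the sample Japanese/English pair are hypothetical, and a dictionary keyed by the example sentence stands in for the database.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ExamplePair:
    """One record: an example sentence and its example translation."""
    example_sentence: str  # first language
    translation: str       # second language


class ExampleTranslationDB:
    """Hypothetical in-memory stand-in for the example parallel translation DB."""

    def __init__(self, pairs: Iterable[ExamplePair]):
        # Index records by the example sentence for exact-match lookup.
        self._by_sentence = {pair.example_sentence: pair for pair in pairs}

    def find_exact(self, input_sentence: str) -> Optional[ExamplePair]:
        """Return the record whose example sentence is identical to the input, if any."""
        return self._by_sentence.get(input_sentence)


db = ExampleTranslationDB([
    ExamplePair("駅はどこですか", "Where is the station?"),
])

hit = db.find_exact("駅はどこですか")
miss = db.find_exact("空港はどこですか")
# A hit would go straight to the output unit; a miss would be passed on
# to paraphrase generation and machine translation.
print(hit, miss)
```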

When the example match determination unit 3 determines that the input sentence is not stored in the example parallel translation DB 4, the paraphrase generation unit 5 divides the input sentence acquired by the input unit 2 into a plurality of elements and paraphrases (replaces) one or more of those elements with other expressions in the first language using a predetermined paraphrase rule (an example of the predetermined rule), thereby generating a plurality of input paraphrase sentences (an example of the plurality of second sentences).

The input sentence may be divided into elements by, for example, splitting it word by word. The present embodiment is not limited to this, however: the input sentence may instead be split by part of speech, by a fixed number of characters (for example, two or three characters), by phrase, by semantic class, or by morpheme.
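
Two of the splitting methods mentioned above, and the replacement of one or more elements to produce input paraphrase sentences, can be sketched as follows. This is an illustration under assumptions: whitespace splitting only works for space-delimited text (Japanese would need a morphological analyzer), and the rule table and function names are hypothetical.

```python
from itertools import product


def split_into_words(sentence):
    """Word-by-word splitting for space-delimited text (assumption)."""
    return sentence.split()


def split_by_characters(sentence, size):
    """Split into chunks of a fixed number of characters; the last chunk may be shorter."""
    return [sentence[i:i + size] for i in range(0, len(sentence), size)]


def generate_input_paraphrases(elements, rules):
    """Replace one or more elements using a rule table mapping element -> alternatives.

    Every combination of original and alternative elements is produced,
    except the combination that leaves the sentence unchanged.
    """
    options = [[element] + list(rules.get(element, [])) for element in elements]
    paraphrases = []
    for combination in product(*options):
        if list(combination) != list(elements):
            paraphrases.append(list(combination))
    return paraphrases


chunks = split_by_characters("海外旅行に行く", 2)
elements = split_into_words("I like Tchaikovsky")
rules = {"like": ["love"], "Tchaikovsky": ["Brahms", "Schumann"]}  # hypothetical rules
paraphrases = generate_input_paraphrases(elements, rules)
for paraphrase in paraphrases:
    print(" ".join(paraphrase))
```

Enumerating all combinations grows quickly with the number of replaceable elements, so a practical system would presumably cap or rank the candidates; the disclosure does not specify how at this point.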

The following first to fourth paraphrase rules can be used as the paraphrase rule.

The first paraphrase rule paraphrases a first word contained in an element constituting the input sentence with a second word that is in a context-similar relationship with it. A context-similar relationship is a relationship between words that are similar in context; for example, the relationships between words registered in the context-similar word database of the ALAGIN language resources can be used. Words in a context-similar relationship with "ルパン三世" (Lupin III) include "名探偵コナン" (Detective Conan) and "宇宙戦艦ヤマト" (Space Battleship Yamato). Words in a context-similar relationship with "チャイコフスキー" (Tchaikovsky) include "ブラームス" (Brahms), "シューマン" (Schumann), and "メンデルスゾーン" (Mendelssohn). The context-similar word database of the ALAGIN language resources treats "ルパン三世" and "ルパン３世" (the same title written with a kanji numeral and with an Arabic numeral) as context-similar, but because the two are too close in content, they may be excluded from the context-similar relationship in the present embodiment.

第２換言ルールは、入力文を構成する素片に含まれる第１単語を、共起関係にある第２単語に換言するルールである。ここで、共起関係とは、同一文書内で出現する頻度が高い単語同士の関係を指し、例えば、ＡＬＡＧＩＮ言語資源の単語共起頻度データベースに登録された単語同士の関係が該当する。例えば、「海外旅行」と共起関係にある単語としては、ＤＩＣＥ係数が高い順に「国内旅行」、「格安航空券」、「ツアー」、「航空券」、「旅行」が該当する。また、「クリスマス」の共起関係にある単語として、ＤＩＣＥ係数が高い順に「お正月」、「誕生日」、「サンタ」、「冬」、「年末」が該当する。なお、ＤＩＣＥ係数は単語同士の類似性や共起性を数値化した指標である。 The second paraphrase rule is a rule for paraphrasing the first word contained in the element piece constituting the input sentence into the second word having a co-occurrence relationship. Here, the co-occurrence relationship refers to the relationship between words that frequently appear in the same document, and corresponds to, for example, the relationship between words registered in the word co-occurrence frequency database of the ALAGIN language resource. For example, as words co-occurring with "overseas travel", "domestic travel", "cheap airline ticket", "tour", "airline ticket", and "travel" correspond in descending order of DICE coefficient. In addition, as words having a co-occurrence relationship of "Christmas", "New Year", "Birthday", "Santa", "Winter", and "Year-end" correspond in descending order of DICE coefficient. The DICE coefficient is an index that quantifies the similarity and co-occurrence between words.
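The text does not state how the DICE coefficient in the co-occurrence database is computed. As a hedged illustration, the sketch below uses the standard Dice coefficient over document counts, 2·|X∩Y| / (|X| + |Y|); the function name and the counts are hypothetical, and the actual database may normalise differently.

```python
def dice_coefficient(docs_with_x: int, docs_with_y: int, docs_with_both: int) -> float:
    """Standard Dice coefficient computed from document frequencies.

    docs_with_x / docs_with_y: number of documents containing each word.
    docs_with_both: number of documents containing both words.
    """
    denominator = docs_with_x + docs_with_y
    if denominator == 0:
        return 0.0  # neither word occurs anywhere
    return 2 * docs_with_both / denominator


# Hypothetical counts: word X in 10 documents, word Y in 30, both in 8.
print(dice_coefficient(10, 30, 8))  # 0.4
```

Ranking candidate words by this value in descending order gives an ordering of the kind described for "海外旅行" and "クリスマス".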

第３換言ルールは、入力文を構成する素片に含まれる第１単語を、含意関係にある第２単語に換言するルールである。ここで、含意関係とは、第１単語が第２単語を含意する関係を指し、例えば、ＡＬＡＧＩＮ言語資源の含意関係データベースに登録された単語同士の関係が該当する。第１単語が第２単語を含意するとは、第１単語の表す事態が成立するならば、同時かそれ以前に第２単語の表す事態も成立することを意味する。例えば、「チンする」に対して「加熱する」、「デトックスする」に対して「解毒する」、「銀ブラする」に対して「うろつく」、「アポトーシスする」に対して「死ぬ」、「壊れる」に対して「イカれる」、「酔っぱらう」に対して「飲む」が該当する。なお、含意関係は、上位下位関係が成立する場合もあるが、「チンする」と「加熱する」というように上位下位関係が成立しない場合もある。 The third paraphrase rule replaces a first word included in an elementary piece of the input sentence with a second word that is in an implication relationship with it. Here, an implication relationship is a relationship in which the first word implies the second word; for example, the relationship between words registered in the implication relation database of the ALAGIN language resources applies. That the first word implies the second word means that if the situation expressed by the first word holds, the situation expressed by the second word also holds at the same time or earlier. Examples are "加熱する" (heat) for "チンする" (microwave), "解毒する" (detoxify) for "デトックスする" (detox), "うろつく" (wander around) for "銀ブラする" (stroll around Ginza), "死ぬ" (die) for "アポトーシスする" (undergo apoptosis), "イカれる" (go haywire) for "壊れる" (break), and "飲む" (drink) for "酔っぱらう" (get drunk). An implication relationship may coincide with an upper-lower (hypernym-hyponym) relationship, but in other cases, such as "チンする" and "加熱する", no upper-lower relationship holds.

第４換言ルールは、入力文を構成する素片に含まれる第１単語を、上位下位関係にある第２単語に換言するルールである。ここで、上位下位関係とは、例えば、ＡＬＡＧＩＮ言語資源の上位語階層データベースに登録された単語同士の関係を指す。第１単語が第２単語を含む、より一般的、より総称的、より抽象的なものを指す場合、第１単語は第２単語に対して上位関係にある。 The fourth paraphrase rule is a rule for paraphrasing the first word contained in the element piece constituting the input sentence into the second word having a higher-lower relationship. Here, the hypernym-lower relationship refers to, for example, the relationship between words registered in the hypernym hierarchy database of the ALAGIN language resource. If the first word refers to something more general, more generic, and more abstract, including the second word, then the first word is superior to the second word.

図４は、換言文生成部５の詳細な構成を示すブロック図である。換言文生成部５は、換言ＤＢ（データベース）を記憶する換言ＤＢ記憶部５１、換言候補生成部５２、及び換言文識別部５３を備える。換言ＤＢは、第１言語の単語と、第１単語を第１言語の他の表現で表現した第２単語とを互いに対応付けたデータベースである。 FIG. 4 is a block diagram showing a detailed configuration of the paraphrase generation unit 5. The paraphrase sentence generation unit 5 includes a paraphrase DB storage unit 51 that stores a paraphrase DB (database), a paraphrase candidate generation unit 52, and a paraphrase sentence identification unit 53. The paraphrase DB is a database in which a word in the first language and a second word in which the first word is expressed by another expression in the first language are associated with each other.

本実施の形態では、換言ＤＢ記憶部５１は、文脈類似語ＤＢ５１１、共起関係ＤＢ５１２、含意関係ＤＢ５１３、及び上位下位関係ＤＢ５１４を記憶する。以下、文脈類似語ＤＢ５１１、共起関係ＤＢ５１２、含意関係ＤＢ５１３、及び上位下位関係ＤＢ５１４を特に区別しない場合、換言ＤＢと記載する。文脈類似語ＤＢ５１１は、入力文を第１換言ルールで換言するためのデータベースであり、文脈類似関係にある単語同士が予め対応付けて記憶するデータベースである。ここで、文脈類似語ＤＢ５１１としては、例えば、ＡＬＡＧＩＮ言語資源の文脈類似語データベースが採用できる。 In the present embodiment, the paraphrase DB storage unit 51 stores the context-like word DB511, the co-occurrence relation DB512, the implication relation DB513, and the superordinate relation DB514. Hereinafter, when the context-like word DB511, the co-occurrence relation DB512, the implication relation DB513, and the upper-lower relation DB514 are not particularly distinguished, they are described as paraphrase DB. The context-similar word DB511 is a database for paraphrasing an input sentence according to the first paraphrase rule, and is a database in which words having a context-similar relationship are stored in association with each other in advance. Here, as the context-like word DB511, for example, a context-like word database of ALAGIN language resources can be adopted.

共起関係ＤＢ５１２は、入力文を第２換言ルールで換言するためのデータベースであり、共起関係にある単語同士を予め対応付けて記憶するデータベースである。ここで、共起関係ＤＢ５１２としては、例えば、ＡＬＡＧＩＮ言語資源の単語共起頻度データベースが採用できる。 The co-occurrence relationship DB512 is a database for paraphrasing an input sentence according to the second paraphrase rule, and is a database for storing words having a co-occurrence relationship in advance in association with each other. Here, as the co-occurrence relationship DB512, for example, a word co-occurrence frequency database of ALAGIN language resources can be adopted.

含意関係ＤＢ５１３は、入力文を第３換言ルールで換言するためのデータベースであり、含意関係にある単語同士を予め対応付けて記憶するデータベースである。ここで、含意関係ＤＢ５１３としては、例えば、ＡＬＡＧＩＮ言語資源の含意関係データベースが採用できる。 The implication relationship DB 513 is a database for paraphrasing an input sentence according to the third paraphrase rule, and is a database for storing words having an implication relationship in advance in association with each other. Here, as the implication relation DB 513, for example, an implication relation database of the ALAGIN language resource can be adopted.

上位下位関係ＤＢ５１４は、入力文を第４換言ルールで換言するためのデータベースであり、上位下位関係にある単語同士を予め対応付けて記憶するデータベースである。ここで、上位下位関係ＤＢ５１４としては、例えば、ＡＬＡＧＩＮ言語資源の上位語階層データベースが採用できる。 The upper-lower relationship DB 514 is a database for paraphrasing an input sentence according to the fourth paraphrase rule, and is a database for storing words having a higher-lower relationship in advance in association with each other. Here, as the superordinate / subordinate relationship DB514, for example, a hypernym hierarchical database of ALAGIN language resources can be adopted.

換言候補生成部５２は、換言ＤＢを参照することで入力文を第１〜第４換言ルールのそれぞれで換言し、入力換言文を生成する。ここで、換言候補生成部５２は、例えば、入力文Ｂ１「門真までタクシーにしたい」が入力されたとすると、「門真／まで／タクシー／に／したい」というように入力文Ｂ１を単語単位で区分する。そして、換言候補生成部５２は、文脈類似語ＤＢ５１１、共起関係ＤＢ５１２、含意関係ＤＢ５１３、及び上位下位関係ＤＢ５１４のそれぞれを参照することで、第１〜第４の換言ルールのそれぞれで入力文を換言し、少なくとも４つの入力換言文を生成する。 The paraphrase candidate generation unit 52 paraphrases the input sentence under each of the first to fourth paraphrase rules by referring to the paraphrase DB, and generates input paraphrase sentences. For example, when input sentence B1 "門真までタクシーにしたい" ("I'd like a taxi to Kadoma") is input, the paraphrase candidate generation unit 52 splits input sentence B1 into words as "門真／まで／タクシー／に／したい". The paraphrase candidate generation unit 52 then refers to each of the context-similar word DB511, the co-occurrence relation DB512, the implication relation DB513, and the upper-lower relation DB514, paraphrases the input sentence under each of the first to fourth paraphrase rules, and generates at least four input paraphrase sentences.

ここで、換言候補生成部５２は、第１〜第４換言ルールのうち第ｉ（ｉ＝１〜４）換言ルールを用いて入力換言文を生成するに際して、１つの単語を換言して１つの入力換言文を生成してもよいし、複数の箇所の単語を換言して１つの入力換言文を生成してもよい。また、換言候補生成部５２は、第ｉ換言ルールを用いて入力文を換言するに際して、単語の換言数が異なる複数の入力換言文を生成してもよい。 Here, the paraphrase candidate generation unit 52 paraphrases one word to one when generating an input paraphrase sentence using the i (i = 1 to 4) paraphrase rule among the first to fourth paraphrase rules. An input paraphrase may be generated, or a word at a plurality of places may be paraphrased to generate one input paraphrase. Further, the paraphrase candidate generation unit 52 may generate a plurality of input paraphrase sentences having different numbers of word paraphrases when paraphrasing the input sentence using the i-paraphrase rule.

例えば、換言候補生成部５２は、区分した入力文から１の単語をランダムに特定し、特定した１の単語と同一の単語が換言ＤＢに登録されていれば、その１の単語を換言ＤＢに登録された換言可能な別の単語で換言すればよい。一方、換言候補生成部５２は、特定した１の単語と同一の単語が換言ＤＢに登録されていなければ、その１の単語以外の別の１の単語を入力文からランダムに特定し、特定した別の１の単語と同一の単語が換言ＤＢに登録されていれば、その別の１の単語を換言ＤＢに登録された換言可能な別の単語で換言すればよい。換言候補生成部５２は、このような処理を繰り返して、第ｉ換言ルールにより換言された１又は複数の入力換言文を生成すればよい。 For example, the paraphrase candidate generation unit 52 randomly selects one word from the segmented input sentence and, if the same word is registered in the paraphrase DB, replaces that word with another word registered in the paraphrase DB as a possible paraphrase. If the selected word is not registered in the paraphrase DB, the paraphrase candidate generation unit 52 randomly selects a different word from the input sentence and, if that word is registered in the paraphrase DB, replaces it in the same way. The paraphrase candidate generation unit 52 may repeat this process to generate one or more input paraphrase sentences paraphrased under the i-th paraphrase rule.

例えば、文脈類似語ＤＢ５１１において、「タクシー」と文脈類似関係にある単語として、「バス」、「トラック」が登録されていたとすると、「バス」、「トラック」の中からランダムに１の単語を決定し、その１の単語で「タクシー」を換言してもよいし、「タクシー」に対して最も類似する単語で、「タクシー」を換言してもよい。 For example, in the context-similar word DB511, if "bus" and "truck" are registered as words having a context-similar relationship with "taxi", one word is randomly selected from "bus" and "truck". You may decide and paraphrase "taxi" with the one word, or you may paraphrase "taxi" with the word most similar to "taxi".

なお、換言候補生成部５２は、生成した入力換言文において、換言箇所を示す付加データを加えて換言文識別部５３に出力すればよい。 The paraphrase candidate generation unit 52 may add additional data indicating the paraphrase location to the generated input paraphrase sentence and output it to the paraphrase sentence identification unit 53.
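The random selection procedure described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the paraphrase DB is modelled as a plain dictionary, and the function name `paraphrase_once` and its return shape are assumptions made for this example. The returned position plays the role of the additional data that marks the paraphrased location.

```python
import random


def paraphrase_once(tokens, paraphrase_db, rng):
    """Replace one randomly chosen token that has an entry in paraphrase_db.

    tokens: the segmented input sentence (list of words).
    paraphrase_db: dict mapping a word to its registered paraphrase words.
    rng: a random.Random instance, passed in so runs can be reproduced.
    Returns (new_tokens, replaced_position), or None if no token is registered.
    """
    positions = list(range(len(tokens)))
    rng.shuffle(positions)          # try the words in random order
    for pos in positions:
        candidates = paraphrase_db.get(tokens[pos])
        if candidates:              # word is registered: paraphrase it
            new_tokens = list(tokens)
            new_tokens[pos] = rng.choice(candidates)
            return new_tokens, pos
    return None                     # no word of the sentence is registered


tokens = ["門真", "まで", "タクシー", "に", "したい"]
context_similar_db = {"タクシー": ["バス", "トラック"]}  # toy stand-in for DB511
print(paraphrase_once(tokens, context_similar_db, random.Random(0)))
```

Calling the function once per paraphrase DB would yield one candidate per rule; replacing `rng.choice` with a lookup of the most similar word corresponds to the non-random alternative mentioned in the text.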

換言文識別部５３は、入力換言文の中から、言葉らしい文を抽出し、抽出部６に出力する。ここで、換言文識別部５３は、例えば、Ｎ−ｇｒａｍ言語モデルを用いて、入力換言文の出現確率を算出し、算出した出現確率が基準値以上の入力換言文を抽出部６に出力する。Ｎ−ｇｒａｍ言語モデルは、人間が用いるであろう「言葉らしさ」を確率としてモデル化した確率的言語モデルである。例えば、「今日の夕食はカレーです」という文Ｂ２と、「今日の夕食は野球です」という文Ｂ３とがある場合、文Ｂ２は文Ｂ３よりも尤もらしいと言うことができる。この場合、Ｎ−ｇｒａｍ言語モデルでは文Ｂ２の出現確率が文Ｂ３の出現確率より高くなる。ここで、基準値としては、これ以上出現確率が低下すると不自然な文と判定される値であって経験的に得られた値が採用できる。なお、換言文識別部５３は、出力対象となる入力換言文において換言箇所を示す付加データも含めて、抽出部６に出力する。 The paraphrase sentence identification unit 53 extracts natural-sounding sentences from the input paraphrase sentences and outputs them to the extraction unit 6. Here, the paraphrase sentence identification unit 53 calculates the appearance probability of each input paraphrase sentence using, for example, an N-gram language model, and outputs to the extraction unit 6 those input paraphrase sentences whose appearance probability is equal to or greater than a reference value. The N-gram language model is a probabilistic language model that expresses, as a probability, how much a sentence resembles language that a human would actually use. For example, given sentence B2 "今日の夕食はカレーです" ("Today's dinner is curry") and sentence B3 "今日の夕食は野球です" ("Today's dinner is baseball"), sentence B2 can be said to be more plausible than sentence B3. In this case, the N-gram language model assigns sentence B2 a higher appearance probability than sentence B3. As the reference value, an empirically obtained value can be adopted below which a sentence is judged to be unnatural. The paraphrase sentence identification unit 53 also outputs to the extraction unit 6 the additional data indicating the paraphrased location in each input paraphrase sentence that it outputs.
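To make the filtering step concrete, the sketch below trains a toy bigram model (N = 2) with add-one smoothing and keeps only candidates whose score reaches a threshold. Everything here is an assumption for illustration: the tiny corpus, the smoothing scheme, the use of log-probabilities instead of raw probabilities, and the function names. A real system would use a model trained on a large corpus.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams and their left contexts over tokenised sentences."""
    contexts, bigrams = Counter(), Counter()
    for tokens in corpus:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    vocab = {w for tokens in corpus for w in tokens} | {"<s>", "</s>"}
    return contexts, bigrams, len(vocab)


def log_prob(tokens, model):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )


def filter_plausible(candidates, model, threshold):
    """Keep candidates whose log-probability is at least the threshold."""
    return [c for c in candidates if log_prob(c, model) >= threshold]


corpus = [
    ["今日", "の", "夕食", "は", "カレー", "です"],
    ["今日", "の", "昼食", "は", "カレー", "です"],
    ["趣味", "は", "野球", "です"],
]
model = train_bigram(corpus)
b2 = ["今日", "の", "夕食", "は", "カレー", "です"]
b3 = ["今日", "の", "夕食", "は", "野球", "です"]
print(log_prob(b2, model) > log_prob(b3, model))  # True
```

With this toy corpus, sentence B2 scores higher than sentence B3, so a threshold placed between the two scores would pass B2 and reject B3.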

図１に参照を戻す。抽出部６は、換言文生成部５から出力された入力換言文と用例対訳ＤＢ４に記憶された用例文との関連性を示す総合評価値（一致度の一例）をそれぞれ算出し、算出した総合評価値に基づいて、用例対訳ＤＢ４から１以上の用例文（第３文の一例）を抽出する。また、抽出部６は、抽出した用例文と類似する１以上の入力換言文を、換言文生成部５から出力された入力換言文から抽出する。以下、抽出された入力換言文を「換言抽出文」（一以上の第２文の一例）と記述する。なお、関連性とは、入力換言文と用例文とが構文上一定の関係を持つことを指す。 Returning to FIG. 1, the extraction unit 6 calculates, for each pair, a comprehensive evaluation value (an example of the degree of matching) indicating the relevance between an input paraphrase sentence output from the paraphrase sentence generation unit 5 and an example sentence stored in the example bilingual translation DB4, and extracts one or more example sentences (an example of the third sentence) from the example bilingual translation DB4 based on the calculated comprehensive evaluation values. The extraction unit 6 also extracts, from the input paraphrase sentences output from the paraphrase sentence generation unit 5, one or more input paraphrase sentences similar to the extracted example sentences. Hereinafter, an extracted input paraphrase sentence is referred to as a "paraphrase extraction sentence" (an example of the one or more second sentences). Relevance here means that the input paraphrase sentence and the example sentence have a certain syntactic relationship.

ここで、抽出部６は、各入力換言文と各用例文との総合評価値を下記の指標Ａ１〜指標Ａ４を用いて算出する。 Here, the extraction unit 6 calculates the comprehensive evaluation value of each input paraphrase sentence and each example sentence by using the following indexes A1 to A4.

指標Ａ１（第３指標の一例）は、各入力換言文と各用例文との文構造の類似性を示す指標である。図５は、入力換言文と用例文との一例を示す図である。 The index A1 (an example of the third index) is an index showing the similarity of the sentence structure between each input paraphrase sentence and each example sentence. FIG. 5 is a diagram showing an example of an input paraphrase sentence and an example sentence.

図５を参照し、例えば、入力文Ｂ１「門真までタクシーにしたい」の入力換言文として、入力換言文Ｃ１「門真までタクシーにのりたい」及び入力換言文Ｃ２「門真までバスを利用したい」が換言文生成部５により生成されたとする。 With reference to FIG. 5, suppose, for example, that for input sentence B1 "門真までタクシーにしたい" ("I'd like a taxi to Kadoma"), the paraphrase sentence generation unit 5 has generated input paraphrase sentence C1 "門真までタクシーにのりたい" ("I want to ride a taxi to Kadoma") and input paraphrase sentence C2 "門真までバスを利用したい" ("I want to use a bus to Kadoma").

また、用例対訳ＤＢ４には用例文Ｄ１「とことんまで話にのりたい」、及び用例文Ｄ２「京橋まで電車でいきたい」が記憶されていたとする。 Further, it is assumed that the example sentence D1 "I want to talk to the fullest" and the example sentence D2 "I want to go by train to Kyobashi" are stored in the example bilingual translation DB4.

まず、抽出部６は、入力換言文Ｃ１，Ｃ２を文節又は単語で区切り、入力換言文Ｃ１，Ｃ２の文構造を解析し、構文木を生成する。ここでは、入力換言文Ｃ１の例では、文節「門真まで」と文節「タクシーに」とが共に文節「のりたい」に係っている。そのため、文節「門真まで」に対応するノードＮ１１と文節「タクシーに」に対応するノードＮ１２とを、文節「のりたい」に対応するノードＮ１３にそれぞれ接続する２本のエッジＥ１１，Ｅ１２を含む木構造Ｔ１が生成されている。 First, the extraction unit 6 separates the input paraphrases C1 and C2 by clauses or words, analyzes the sentence structure of the input paraphrases C1 and C2, and generates a syntax tree. Here, in the example of the input paraphrase sentence C1, the phrase "to Kadoma" and the phrase "to taxi" are both related to the phrase "Noritai". Therefore, a tree containing two edges E11 and E12 that connect the node N11 corresponding to the clause "Kadoma" and the node N12 corresponding to the clause "Taxi" to the node N13 corresponding to the clause "Noritai", respectively. Structure T1 has been generated.

このような木構造の生成は、例えば、構文解析ツールである「ＫＮＰ」を用いて実現できる。また、文を構成する単語の品詞の解析は、例えば、形態素解析ツールである「ｊｕｍａｎ」を用いて実現できる。したがって、抽出部６は、「ＫＮＰ」及び「ｊｕｍａｎ」を利用して文の木構造の生成及び文を構成する単語の品詞の抽出を行えばよい。 The generation of such a tree structure can be realized by using, for example, a parsing tool "KNP". Further, the analysis of the part of speech of the words constituting the sentence can be realized by using, for example, "juman" which is a morphological analysis tool. Therefore, the extraction unit 6 may use "KNP" and "juman" to generate the tree structure of the sentence and extract the part of speech of the words constituting the sentence.

入力換言文Ｃ２の例では、文節「門真まで」と文節「バスを」とが共に文節「利用したい」に係っている。そのため、文節「門真まで」に対応するノードＮ２１と文節「バスを」に対応するノードＮ２２とを、文節「利用したい」に対応するノードＮ２３にそれぞれ接続する２本のエッジＥ２１，Ｅ２２を含む木構造Ｔ２が生成されている。 In the example of input paraphrase sentence C2, the clause "門真まで" ("to Kadoma") and the clause "バスを" ("a bus") both modify the clause "利用したい" ("want to use"). Therefore, a tree structure T2 is generated that includes two edges E21 and E22, which connect node N21 corresponding to the clause "門真まで" and node N22 corresponding to the clause "バスを", respectively, to node N23 corresponding to the clause "利用したい".

用例文Ｄ１の例では、文節「とことんまで」と文節「話に」とが共に文節「のりたい」に係っている。そのため、文節「とことんまで」に対応するノードＮ３１と文節「話に」に対応するノードＮ３２とを、文節「のりたい」に対応するノードＮ３３にそれぞれ接続する２本のエッジＥ３１，Ｅ３２を含む木構造Ｔ３が生成されている。 In the example of example sentence D1, the clause "とことんまで" ("to the fullest") and the clause "話に" ("in the discussion") both modify the clause "のりたい" ("want to join"). Therefore, a tree structure T3 is generated that includes two edges E31 and E32, which connect node N31 corresponding to the clause "とことんまで" and node N32 corresponding to the clause "話に", respectively, to node N33 corresponding to the clause "のりたい".

用例文Ｄ２の例では、文節「京橋まで」と文節「電車で」とが共に文節「いきたい」に係っている。そのため、文節「京橋まで」に対応するノードＮ４１と文節「電車で」に対応するノードＮ４２とを、文節「いきたい」に対応するノードＮ４３にそれぞれ接続する２本のエッジＥ４１，Ｅ４２を含む木構造Ｔ４が生成されている。 In the example sentence D2, the phrase "to Kyobashi" and the phrase "by train" are both related to the phrase "I want to go". Therefore, a tree including two edges E41 and E42 that connect the node N41 corresponding to the clause "to Kyobashi" and the node N42 corresponding to the clause "by train" to the node N43 corresponding to the clause "I want to go", respectively. Structure T4 is generated.

このように、抽出部６は、入力換言文と用例文との木構造を解析する。そして、抽出部６は、例えば、ツリーマッチングの手法を用いて、入力換言文と用例文との木構造の類似度を指標Ａ１として算出すればよい。なお、用例文の木構造は用例対訳ＤＢ４に事前に記憶されていてもよい。本実施の形態では、指標Ａ１は、０〜１００％の数値をとり、木構造が一致する度合いが高いほど値が大きくなる。 In this way, the extraction unit 6 analyzes the tree structure of the input paraphrase sentence and the example sentence. Then, the extraction unit 6 may calculate the similarity of the tree structure between the input paraphrase sentence and the example sentence as the index A1 by using, for example, a tree matching method. The tree structure of the example sentence may be stored in advance in the example bilingual translation DB4. In the present embodiment, the index A1 takes a numerical value of 0 to 100%, and the higher the degree of matching of the tree structures, the larger the value.

図５の例では、入力換言文Ｃ１，Ｃ２と用例文Ｄ１，Ｄ２との木構造Ｔ１〜Ｔ４は全て同じ構造である。したがって、抽出部６は、入力換言文Ｃ１の用例文Ｄ１，Ｄ２に対する指標Ａ１を、それぞれ、１００％と算出する。また、抽出部６は、入力換言文Ｃ２の用例文Ｄ１，Ｄ２に対する指標Ａ１も、それぞれ、１００％と算出する。 In the example of FIG. 5, the tree structures T1 to T4 of the input paraphrases C1 and C2 and the example sentences D1 and D2 all have the same structure. Therefore, the extraction unit 6 calculates the index A1 for the example sentences D1 and D2 of the input paraphrase sentence C1 as 100%, respectively. Further, the extraction unit 6 also calculates the index A1 for the example sentences D1 and D2 of the input paraphrase sentence C2 as 100%, respectively.
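The text leaves the tree matching method open. As a deliberately simple stand-in, the sketch below represents each dependency tree as a list of head indices (the index of the clause that each clause modifies, with -1 for the root) and reports the percentage of positions whose heads agree. Both the representation and the scoring rule are assumptions for illustration; an actual implementation might use tree edit distance or another matching algorithm on parser output.

```python
def structure_similarity(heads_a, heads_b):
    """Percentage (0 to 100) of clause positions whose head index is the same.

    heads_a, heads_b: one head index per clause; -1 marks the root clause.
    The longer tree sets the denominator, so extra clauses lower the score.
    """
    longest = max(len(heads_a), len(heads_b))
    if longest == 0:
        return 100.0  # two empty trees are trivially identical
    same = sum(1 for a, b in zip(heads_a, heads_b) if a == b)
    return 100.0 * same / longest


# Trees T1 to T4 in FIG. 5: the first two clauses both modify the third.
t1 = [2, 2, -1]
t4 = [2, 2, -1]
print(structure_similarity(t1, t4))  # 100.0
```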

なお、図５において、＜体言＞及び＜用言：動＞等の括弧の記載は、対応する文節の品詞等を説明するために便宜上付したものであり、指標Ａ１の算出にあたって実際に使用されるものではない。 In FIG. 5, the description in parentheses such as <uninflected word> and <phrase: action> is added for convenience to explain the part of speech of the corresponding phrase, and is actually used in the calculation of the index A1. It's not a thing.

指標Ａ２（第４指標の一例）は、各入力換言文と各用例文とにおける品詞の一致数が多いほど大きな値を示す指標である。 The index A2 (an example of the fourth index) is an index showing a larger value as the number of matching parts of speech in each input paraphrase sentence and each example sentence increases.

本実施の形態では、抽出部６は、文構造が一致する入力換言文と用例文とにおいて（指標Ａ１が１００％である入力換言文と用例文とにおいて）、同一箇所に位置する文節同士の品詞の一致数により指標Ａ２を算出する。 In the present embodiment, the extraction unit 6 uses the phrase sentences located at the same position in the input paraphrase sentence and the example sentence having the same sentence structure (in the input paraphrase sentence and the example sentence in which the index A1 is 100%). The index A2 is calculated from the number of matching parts of speech.

図５の例では、抽出部６は、文構造が一致する入力換言文と用例文とにおいて、同一箇所に位置する文節の品詞が名詞で一致するほど値が大きくなるように指標Ａ２を算出する。以下、「同一箇所に位置する文節」を「対応する文節」と記述する。また、「名詞の文節」とは、「名詞を含む文節」を意味する。例えば、文節「門真まで」は単語「門真」と単語「まで」とで構成されているが、単語「門真」は名詞なので、「門真まで」は名詞の文節となる。 In the example of FIG. 5, for an input paraphrase sentence and an example sentence whose sentence structures match, the extraction unit 6 calculates index A2 so that its value becomes larger the more often the clauses located at the same position are both noun clauses. Hereinafter, "clauses located at the same position" are referred to as "corresponding clauses". A "noun clause" means a "clause containing a noun". For example, the clause "門真まで" ("to Kadoma") consists of the word "門真" ("Kadoma") and the word "まで" ("to"); because the word "門真" is a noun, "門真まで" is a noun clause.

詳細には、指標Ａ２は下記の式（１）により規定される。 In detail, the index A2 is defined by the following equation (1).

指標Ａ２＝（１−α／β）×１００（％）（１） Index A2 = (1 − α/β) × 100 (%) (1)
α：対応する文節同士が名詞でない数 α: the number of pairs of corresponding clauses that are not both noun clauses
β：入力換言文の名詞の文節の総数 β: the total number of noun clauses in the input paraphrase sentence

図５に示す入力換言文Ｃ１において、名詞の文節は「門真まで」と「タクシーに」との２つである。また、入力換言文Ｃ１の文節「門真まで」に対応する用例文Ｄ１の文節「とことんまで」は副詞であり、入力換言文Ｃ１の文節「タクシーに」に対応する用例文Ｄ１の文節「話に」は名詞である。したがって、入力換言文Ｃ１と用例文Ｄ１とにおいて、β＝２、α＝１となり、指標Ａ２は５０％になる。 In input paraphrase sentence C1 shown in FIG. 5, there are two noun clauses, "門真まで" and "タクシーに". The clause "とことんまで" of example sentence D1, which corresponds to the clause "門真まで" of input paraphrase sentence C1, is an adverb, and the clause "話に" of example sentence D1, which corresponds to the clause "タクシーに" of input paraphrase sentence C1, is a noun. Therefore, for input paraphrase sentence C1 and example sentence D1, β = 2 and α = 1, and index A2 is 50%.

また、入力換言文Ｃ１の文節「門真まで」に対応する用例文Ｄ２の文節「京橋まで」は名詞であり、入力換言文Ｃ１の文節「タクシーに」に対応する用例文Ｄ２の文節「電車で」は名詞である。したがって、入力換言文Ｃ１と用例文Ｄ２とにおいて、β＝２、α＝０となり、指標Ａ２は１００％になる。同様に、入力換言文Ｃ２と用例文Ｄ１，Ｄ２との指標Ａ２はそれぞれ５０％，１００％となる。 In addition, the phrase "to Kyobashi" in the example sentence D2 corresponding to the phrase "to Kadoma" in the input paraphrase C1 is a noun, and the phrase "by train" in the example sentence D2 corresponding to the phrase "to taxi" in the input paraphrase C1. Is a noun. Therefore, in the input paraphrase sentence C1 and the example sentence D2, β = 2 and α = 0, and the index A2 becomes 100%. Similarly, the indexes A2 of the input paraphrase sentence C2 and the example sentences D1 and D2 are 50% and 100%, respectively.
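The worked values above can be reproduced with a few lines of code. In this sketch each sentence is reduced to one boolean per clause stating whether it is a noun clause; that encoding and the function name are assumptions. Counting α only over the noun clauses of the input paraphrase sentence is an interpretation, chosen because it reproduces the 50% and 100% values given in the text.

```python
def index_a2(paraphrase_is_noun, example_is_noun):
    """Index A2 = (1 - alpha / beta) * 100, following equation (1).

    paraphrase_is_noun / example_is_noun: one bool per clause, in order,
    for sentences whose structures match (same number of clauses).
    """
    beta = sum(paraphrase_is_noun)  # noun clauses in the input paraphrase
    if beta == 0:
        return 0.0                  # assumption: no noun clause gives no credit
    alpha = sum(
        1
        for p, e in zip(paraphrase_is_noun, example_is_noun)
        if p and not e              # noun clause paired with a non-noun clause
    )
    return (1 - alpha / beta) * 100


c1 = [True, True, False]   # 門真まで / タクシーに / のりたい
d1 = [False, True, False]  # とことんまで / 話に / のりたい
d2 = [True, True, False]   # 京橋まで / 電車で / いきたい
print(index_a2(c1, d1))  # 50.0
print(index_a2(c1, d2))  # 100.0
```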

なお、式（１）のαは、対応する文節同士のカテゴリーが異なる数であってもよい。ここで、カテゴリーとは、例えば、地名、交通、抽象物というように名詞の単語が属している種類を指す。図５において、地名としては「門真」及び「京橋」が該当し、交通としては「タクシー」や「バス」が該当し、抽象物としては「話」が該当する。 Note that α in equation (1) may be a number in which the corresponding clauses have different categories. Here, the category refers to a type to which a noun word belongs, such as a place name, traffic, or an abstraction. In FIG. 5, "Kadoma" and "Kyobashi" are applicable as place names, "taxi" and "bus" are applicable as transportation, and "story" is applicable as an abstraction.

この態様を採用する場合、例えば、入力換言文Ｃ１の文節「タクシーに」及び用例文Ｄ１の文節「話に」は共に名詞の文節であるが、前者のカテゴリーは「交通」であり、後者のカテゴリーは「抽象物」なので、αは１カウントアップされることになり、カテゴリーを考慮しない態様を採用した場合に比べ、指標Ａ２は小さくなる。 When this aspect is adopted, for example, the phrase "taxi" in the input paraphrase C1 and the phrase "talk" in the example sentence D1 are both noun phrases, but the former category is "traffic" and the latter category. Since the category is "abstract", α is incremented by 1 and the index A2 is smaller than when the mode that does not consider the category is adopted.

ここで、抽出部６は、文構造が一致する入力換言文と用例文とに対して指標Ａ２を算出したが、本開示はこれに限定されず、文構造の一致の有無を考慮することなく、すなわち、指標Ａ１とは独立して、指標Ａ２を算出してもよい。また、名詞の文節の一致数に基づいて指標Ａ２は算出されているが、品詞の一致数に基づいて指標Ａ２は算出されてもよい。 Here, the extraction unit 6 calculates the index A2 for the input paraphrase sentence and the example sentence having the same sentence structure, but the present disclosure is not limited to this, and the presence or absence of the matching sentence structure is not considered. That is, the index A2 may be calculated independently of the index A1. Further, although the index A2 is calculated based on the number of matching noun clauses, the index A2 may be calculated based on the number of matching part of speech.

例えば、「文節Ｃ１１／文節Ｃ１２／文節Ｃ１３／文節Ｃ１４」からなる入力換言文Ｃ１Ｘがあったとする。また、「文節Ｄ１１／文節Ｄ１２／文節Ｄ１３」からなる用例文Ｄ１Ｘがあったとする。なお、「／」は文節の切れ目を示す。この場合、抽出部６は、入力換言文Ｃ１Ｘと用例文Ｄ１Ｘとにおいて、先頭から数えて同じ順位に位置する文節同士を、対応する文節として抽出し、抽出した文節同士の品詞の一致数に基づいて指標Ａ２を算出すればよい。 For example, suppose that there is an input paraphrase sentence C1X composed of "phrase C11 / clause C12 / clause C13 / clause C14". Further, it is assumed that there is an example sentence D1X composed of "phrase D11 / clause D12 / clause D13". In addition, "/" indicates a break of a phrase. In this case, the extraction unit 6 extracts the clauses located in the same order from the beginning in the input paraphrase sentence C1X and the example sentence D1X as corresponding clauses, and is based on the number of matching part of speech between the extracted clauses. The index A2 may be calculated.

例えば、抽出部６は、「文節Ｃ１１」及び「文節Ｄ１１」と、「文節Ｃ１２」及び「文節Ｄ１２」と、「文節Ｃ１３」及び「文節Ｄ１３」との３つの文節ペアを対応する文節として抽出する。なお、「文節Ｃ１４」は用例文Ｄ１Ｘに対応する文節がないので、抽出対象から除外される。そして、抽出部６は、品詞が一致しない文節ペアの総数をαとして算出し、入力換言文Ｃ１Ｘから抽出した文節数をβとして、式（１）を用いて指標Ａ２を算出すればよい。 For example, the extraction unit 6 extracts three phrase pairs of "clause C11" and "clause D11", "clause C12" and "clause D12", and "clause C13" and "clause D13" as corresponding clauses. do. Note that "clause C14" is excluded from the extraction target because there is no clause corresponding to the example sentence D1X. Then, the extraction unit 6 may calculate the index A2 using the equation (1) with the total number of clause pairs whose part of speech does not match as α and the number of clauses extracted from the input paraphrase C1X as β.
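The position-based variant just described can be sketched as below, assuming each sentence is given as a list of part-of-speech labels, one per clause. The labels and the function name are hypothetical. Python's `zip` stops at the shorter list, which mirrors dropping clause C14 because example sentence D1X has no fourth clause.

```python
def index_a2_by_position(pos_paraphrase, pos_example):
    """Index A2 from clause pairs taken at the same position from the start.

    alpha: number of pairs whose part-of-speech labels differ.
    beta: number of clauses of the paraphrase that found a partner.
    """
    pairs = list(zip(pos_paraphrase, pos_example))  # unmatched tail is dropped
    if not pairs:
        return 0.0  # assumption: nothing to compare gives no credit
    alpha = sum(1 for p, e in pairs if p != e)
    beta = len(pairs)
    return (1 - alpha / beta) * 100


# C1X has four clauses, D1X has three; the labels are made up for the example.
c1x = ["noun", "noun", "verb", "adverb"]
d1x = ["noun", "adverb", "verb"]
print(round(index_a2_by_position(c1x, d1x), 1))  # 66.7
```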

指標Ａ３（第２指標の一例）は、入力換言文に対して文構造が一致する用例文と入力文とのテキスト類似度が小さいほど大きな値を示す指標である。 The index A3 (an example of the second index) is an index showing a larger value as the text similarity between the example sentence and the input sentence whose sentence structure matches the input paraphrase sentence is smaller.

詳細には、指標Ａ３は式（２）によって規定される。 In detail, the index A3 is defined by the equation (2).

指標Ａ３＝１００−入力文と用例文とのテキスト類似度（２） Index A3 = 100 − (text similarity between the input sentence and the example sentence) (2)

まず、抽出部６は、入力換言文と文構造が一致する用例文を用例対訳ＤＢ４から抽出する。そして、抽出部６は、抽出した用例文と入力文とのテキスト類似度を算出し、算出したテキスト類似度が小さいほど値が大きくなるように用例文毎に指標Ａ３を算出する。 First, the extraction unit 6 extracts from the example bilingual translation DB4 the example sentences whose sentence structure matches that of the input paraphrase sentence. The extraction unit 6 then calculates the text similarity between each extracted example sentence and the input sentence, and calculates index A3 for each example sentence so that the smaller the calculated text similarity, the larger the value.

文構造が一致するとは、上述したように木構造が一致すること、すなわち、指標Ａ１が１００％であることを意味する。テキスト類似度は、文の表現及び字面というような文同士の内容がどの程度一致しているかを示し、例えば、２つの文字列同士の類似性を算出するＰＨＰ言語のｓｉｍｉｌａｒ＿ｔｅｘｔ関数を用いて算出される。 Matching sentence structures means, as described above, that the tree structures match, that is, that index A1 is 100%. The text similarity indicates how closely two sentences agree in surface form, such as wording and written characters, and is calculated using, for example, the similar_text function of the PHP language, which computes the similarity between two character strings.

例えば、入力文Ｂ１「門真までタクシーにしたい」に対する入力換言文として、入力換言文Ｃ１「門真までタクシーにのりたい」が生成されたとする。この場合、抽出部６は、入力換言文Ｃ１と、文構造が一致する用例文を用例対訳ＤＢ４から抽出する。ここでは、用例文Ｄ２「京橋まで電車でいきたい」、用例文Ｄ３「守口まで車を利用したい」、用例文Ｄ４「東京まで新幹線で行く」、及び用例文Ｄ５「とことんまで話にのりたい」の４つの用例文が抽出されたとする。 For example, suppose that input paraphrase sentence C1 "門真までタクシーにのりたい" ("I want to ride a taxi to Kadoma") has been generated for input sentence B1 "門真までタクシーにしたい" ("I'd like a taxi to Kadoma"). In this case, the extraction unit 6 extracts from the example bilingual translation DB4 the example sentences whose sentence structure matches that of input paraphrase sentence C1. Here, suppose that four example sentences are extracted: example sentence D2 "京橋まで電車でいきたい" ("I want to go to Kyobashi by train"), example sentence D3 "守口まで車を利用したい" ("I want to use a car to Moriguchi"), example sentence D4 "東京まで新幹線で行く" ("I go to Tokyo by Shinkansen"), and example sentence D5 "とことんまで話にのりたい" ("I want to join the discussion to the fullest").

この場合、抽出部６は、入力文Ｂ１と４つの用例文Ｄ２〜Ｄ５とのそれぞれの指標Ａ３を、式（２）を用いて算出する。 In this case, the extraction unit 6 calculates the respective indexes A3 of the input sentence B1 and the four example sentences D2 to D5 by using the equation (2).
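A minimal sketch of equation (2) follows. The text names PHP's similar_text; since this example is in Python, it substitutes `difflib.SequenceMatcher` from the standard library as a stand-in that also returns a 0 to 100 character-level similarity. The two functions use different matching algorithms, so the numbers will generally differ from what PHP would return.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Character-level similarity in percent (stand-in for PHP similar_text)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio() * 100


def index_a3(input_sentence: str, example_sentence: str) -> float:
    """Index A3 = 100 - text similarity between input and example sentence."""
    return 100 - text_similarity(input_sentence, example_sentence)


b1 = "門真までタクシーにしたい"
examples = {
    "D2": "京橋まで電車でいきたい",
    "D3": "守口まで車を利用したい",
    "D4": "東京まで新幹線で行く",
    "D5": "とことんまで話にのりたい",
}
for name, sentence in examples.items():
    print(name, round(index_a3(b1, sentence), 1))
```

An example sentence that shares few characters with the input sentence receives a high A3, which favours examples that say something different from a near-copy of the input.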

上記説明では、抽出部６は、指標Ａ３を算出する場合、入力換言文と文構造が一致する用例文を抽出したが、本開示はこれに限定されず、入力換言文と文構造が類似する用例文を抽出してもよい。ここで、文構造が類似するとは、例えば、指標Ａ１が基準値以上の場合が該当する。基準値としては、５０％、６０％、７０％、８０％、９０％といった少なくとも５０％より大きな値が採用できる。 In the above description, when calculating index A3, the extraction unit 6 extracts example sentences whose sentence structure matches that of the input paraphrase sentence; however, the present disclosure is not limited to this, and example sentences whose sentence structure is similar to that of the input paraphrase sentence may be extracted instead. Here, sentence structures are similar when, for example, index A1 is equal to or greater than a reference value. As the reference value, a value of at least 50%, such as 50%, 60%, 70%, 80%, or 90%, can be adopted.

指標Ａ４（第１指標の一例）は、入力換言文と用例文とのテキスト類似度を示す指標である。テキスト類似度は指標Ａ３を算出する際に用いられたテキスト類似度と同じである。 The index A4 (an example of the first index) is an index indicating the text similarity between the input paraphrase sentence and the example sentence. The text similarity is the same as the text similarity used when calculating the index A3.

本実施の形態では、抽出部６は、入力換言文と文構造が一致する用例文、すなわち、指標Ａ１が１００％である用例文を用例対訳ＤＢ４から抽出し、抽出した用例文と入力換言文とのそれぞれのテキスト類似度を指標Ａ４として算出する。 In the present embodiment, the extraction unit 6 extracts from the example bilingual translation DB4 the example sentences whose sentence structure matches that of the input paraphrase sentence, that is, the example sentences for which index A1 is 100%, and calculates the text similarity between each extracted example sentence and the input paraphrase sentence as index A4.

例えば、上記の入力換言文Ｃ１「門真までタクシーにのりたい」が生成されたとすると、抽出部６は入力換言文Ｃ１と文構造が一致する用例文を用例対訳ＤＢ４から抽出する。ここでは、指標Ａ３で説明した４つの用例文Ｄ２〜Ｄ５が抽出されたとする。この場合、抽出部６は、入力換言文Ｃ１と用例文Ｄ２〜Ｄ５とのそれぞれのテキスト類似度を指標Ａ４として算出すればよい。 For example, suppose that the above input paraphrase sentence C1 "門真までタクシーにのりたい" ("I want to ride a taxi to Kadoma") has been generated. The extraction unit 6 then extracts from the example bilingual translation DB4 the example sentences whose sentence structure matches that of input paraphrase sentence C1. Here, suppose that the four example sentences D2 to D5 described for index A3 are extracted. In this case, the extraction unit 6 may calculate the text similarity between input paraphrase sentence C1 and each of example sentences D2 to D5 as index A4.

なお、抽出部６は、換言箇所の多い入力換言文ほど指標Ａ４の値を大きく算出してもよい。例えば、抽出部６は、テキスト類似度に換言率を乗じることで、最終的な指標Ａ４を算出してもよい。換言率としては、例えば、入力換言文における全文字数のうち、換言された文字数の割合が採用できる。 The extraction unit 6 may calculate the value of the index A4 larger as the input paraphrase sentence has more paraphrase points. For example, the extraction unit 6 may calculate the final index A4 by multiplying the text similarity by the paraphrase rate. As the paraphrase rate, for example, the ratio of the number of paraphrased characters to the total number of characters in the input paraphrase sentence can be adopted.
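The optional weighting by paraphrase rate can be sketched as follows. Representing the paraphrased locations as (start, end) character spans is an assumption about how the additional data might be encoded, and the function names and the similarity value of 80 are hypothetical.

```python
def paraphrase_rate(paraphrase: str, replaced_spans) -> float:
    """Share of characters in the paraphrase that were changed (0.0 to 1.0).

    replaced_spans: list of (start, end) index pairs, end exclusive,
    marking the paraphrased parts of the sentence.
    """
    if not paraphrase:
        return 0.0
    changed = sum(end - start for start, end in replaced_spans)
    return changed / len(paraphrase)


def weighted_a4(similarity: float, paraphrase: str, replaced_spans) -> float:
    """Final index A4: text similarity multiplied by the paraphrase rate."""
    return similarity * paraphrase_rate(paraphrase, replaced_spans)


c1 = "門真までタクシーにのりたい"  # 13 characters; "のりたい" was paraphrased
spans = [(9, 13)]
print(c1[9:13])                             # のりたい
print(round(paraphrase_rate(c1, spans), 3))  # 0.308
print(round(weighted_a4(80.0, c1, spans), 1))
```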

そして、抽出部６は、各用例文の指標Ａ１〜Ａ４の例えば積を各用例文の総合評価値として算出する。そして、抽出部６は、総合評価値が大きい順にｎ（１以上の整数）個の用例文を抽出する。 Then, the extraction unit 6 calculates, for example, the product of the indexes A1 to A4 of each example sentence as the comprehensive evaluation value of each example sentence. Then, the extraction unit 6 extracts n (integer of 1 or more) example sentences in descending order of the comprehensive evaluation value.

なお、抽出部６は、各用例文のうち、総合評価値が基準値（閾値の一例）より大きい用例文を抽出してもよい。或いは、抽出部６は、各用例文のうち、総合評価値が基準値より大きい用例文を抽出し、抽出した用例文がｎ個以上であれば、総合評価値が高い順にｎ個の用例文を抽出してもよい。 In addition, the extraction unit 6 may extract an example sentence whose comprehensive evaluation value is larger than a reference value (an example of a threshold value) from each example sentence. Alternatively, the extraction unit 6 extracts from each example sentence an example sentence whose overall evaluation value is larger than the reference value, and if the number of extracted example sentences is n or more, n example sentences in descending order of the overall evaluation value. May be extracted.
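The scoring and selection steps can be sketched as below: the comprehensive evaluation value is the product of the four indices, and example sentences are kept by rank, by threshold, or by both. The index values are invented for illustration, and the function names are hypothetical.

```python
import math


def total_score(indices):
    """Comprehensive evaluation value as the product of indices A1 to A4."""
    return math.prod(indices)


def select_examples(scores, n, threshold=None):
    """Return up to n example IDs in descending order of score.

    If threshold is given, only scores strictly greater than it qualify.
    """
    items = list(scores.items())
    if threshold is not None:
        items = [(key, value) for key, value in items if value > threshold]
    ranked = sorted(items, key=lambda kv: kv[1], reverse=True)
    return [key for key, _ in ranked[:n]]


# Hypothetical index values (A1, A2, A3, A4) in percent for three examples.
scores = {
    "D1": total_score([100, 50, 60, 40]),
    "D2": total_score([100, 100, 70, 55]),
    "D3": total_score([100, 100, 30, 20]),
}
print(select_examples(scores, 2))                        # ['D2', 'D1']
print(select_examples(scores, 5, threshold=10_000_000))  # ['D2', 'D1']
```

Because the score is a product, a single index of 0 removes an example sentence regardless of the other indices.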

そして、抽出部６は、抽出したｎ個の用例文の用例対訳を用例対訳ＤＢ４から抽出し、出力部９に出力する。 Then, the extraction unit 6 extracts the example translations of the extracted n example sentences from the example translation DB4 and outputs them to the output unit 9.

抽出部６は、用例文を抽出する処理が終了すると、抽出したｎ個の用例文と類似するｎ個の換言抽出文を抽出する処理を行う。ここで、抽出部６は抽出したｎ個の用例文のそれぞれに対して指標Ａ４が最大の入力換言文を抽出することで、ｎ個の換言抽出文として抽出する。 When the process of extracting the example sentences is completed, the extraction unit 6 performs a process of extracting n paraphrase extraction sentences similar to the extracted n example sentences. Here, the extraction unit 6 extracts the input paraphrase sentence having the maximum index A4 for each of the extracted n example sentences, thereby extracting as n paraphrase extraction sentences.

例えば、換言文生成部５から４個の入力換言文Ｃ１〜Ｃ４が出力され、総合評価値から２個の用例文Ｄ１，Ｄ２が抽出されたとすると、抽出部６は、用例文Ｄ１，Ｄ２のそれぞれに対して、入力換言文Ｃ１〜Ｃ４のそれぞれの指標Ａ４を算出する。そして、抽出部６は、用例文Ｄ１，Ｄ２のそれぞれにおいて指標Ａ４が最大の入力換言文を換言抽出文として抽出する。 For example, suppose that four input paraphrase sentences C1 to C4 are output from the paraphrase sentence generation unit 5 and two example sentences D1 and D2 are extracted based on the comprehensive evaluation value. The extraction unit 6 then calculates, for each of the example sentences D1 and D2, the index A4 of each of the input paraphrase sentences C1 to C4. For each of the example sentences D1 and D2, the extraction unit 6 extracts the input paraphrase sentence with the largest index A4 as the paraphrase extraction sentence.

上記説明では、抽出部６は、指標Ａ１〜Ａ４の全てを用いて総合評価値を算出したが、本開示はこれに限定されず、抽出部６は、指標Ａ１〜Ａ４の少なくとも１つを用いて総合評価値を算出してもよい。また、抽出部６は、指標Ａ１〜Ａ４の積を総合評価値として採用したが、本開示はこれに限定されず、抽出部６は、指標Ａ１〜Ａ４の平均値や重み付け平均値を総合評価値として採用してもよい。 In the above description, the extraction unit 6 calculates the comprehensive evaluation value using all of the indexes A1 to A4, but the present disclosure is not limited to this, and the extraction unit 6 may calculate the comprehensive evaluation value using at least one of the indexes A1 to A4. Likewise, although the extraction unit 6 adopts the product of the indexes A1 to A4 as the comprehensive evaluation value, the present disclosure is not limited to this, and the extraction unit 6 may adopt the average or the weighted average of the indexes A1 to A4 as the comprehensive evaluation value.
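
The three ways of combining the indexes mentioned above (product, average, weighted average) can be compared with a short sketch. The function name, the `method` strings, and the use of fractions in [0, 1] rather than percentages are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence


def overall_score(indices: Sequence[float],
                  method: str = "product",
                  weights: Optional[Sequence[float]] = None) -> float:
    """Combine any subset of the indexes A1..A4 into one evaluation value."""
    if not indices:
        raise ValueError("at least one index is required")
    if method == "product":
        return math.prod(indices)
    if method == "mean":
        return sum(indices) / len(indices)
    if method == "weighted_mean":
        if weights is None or len(weights) != len(indices):
            raise ValueError("weights must match indices")
        return sum(w * x for w, x in zip(weights, indices)) / sum(weights)
    raise ValueError(f"unknown method: {method}")
```

One practical difference is that the product drives the score to zero as soon as any single index is zero, whereas the averages let a strong index compensate for a weak one.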

機械翻訳部７は、抽出部６から出力されたｎ個の換言抽出文のそれぞれを第２言語に機械翻訳することで、ｎ個の換言翻訳文（第５文の一例）を生成する。ここで、機械翻訳部７では、何らかの翻訳エンジンを利用することで機械翻訳を行う。例えば、機械翻訳部７は、ｗｅｂサイト上で提供されている翻訳エンジンを利用してもよいし、翻訳支援装置１自身が備える翻訳アプリケーションソフトを利用してもよい。また、機械翻訳部７は、用例一致判定部３から出力された入力文を機械翻訳し、入力翻訳文を生成する。 The machine translation unit 7 generates n paraphrase translation sentences (an example of the fifth sentence) by machine translating each of the n paraphrase extraction sentences output from the extraction unit 6 into a second language. Here, the machine translation unit 7 performs machine translation by using some kind of translation engine. For example, the machine translation unit 7 may use the translation engine provided on the web site, or may use the translation application software provided by the translation support device 1 itself. Further, the machine translation unit 7 machine-translates the input sentence output from the example match determination unit 3 to generate the input translation sentence.

信頼度付与部８は、機械翻訳部７により生成されたｎ個の換言翻訳文の翻訳信頼度を算出する。ここで、信頼度付与部８は、換言翻訳文を第２言語から第１言語に逆翻訳したときの対応する換言抽出文との一致度から翻訳信頼度を算出すればよい。また、信頼度付与部８は、入力翻訳文についても翻訳信頼度を算出する。 The reliability imparting unit 8 calculates the translation reliability of n paraphrase translations generated by the machine translation unit 7. Here, the reliability imparting unit 8 may calculate the translation reliability from the degree of coincidence with the corresponding paraphrase extract sentence when the paraphrase translation sentence is back-translated from the second language to the first language. In addition, the reliability imparting unit 8 also calculates the translation reliability of the input translated sentence.
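
A back-translation check of this kind can be sketched as follows. The `back_translate` callable is a hypothetical placeholder for whatever translation engine is available, and `difflib.SequenceMatcher` is an assumed stand-in for the degree-of-coincidence measure, which the text does not specify.

```python
from difflib import SequenceMatcher
from typing import Callable


def translation_reliability(source: str,
                            translated: str,
                            back_translate: Callable[[str], str]) -> float:
    """Score a translation by how well its back-translation matches the source.

    back_translate: hypothetical function mapping second-language text
    back to the first language.
    Returns a value in [0, 1]; multiply by 100 for a percentage display.
    """
    restored = back_translate(translated)
    return SequenceMatcher(None, source, restored).ratio()
```

The same function applies to both the paraphrase translation sentences and the input translation sentence, since each is compared with its own first-language source.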

出力部９は、抽出部６により抽出されたｎ個の用例対訳（第４文の一例）を表示する。また、出力部９は、機械翻訳部７により生成されたｎ個の換言翻訳文とそれに対応するｎ個の入力換言文とを翻訳信頼度と合わせて表示する。更に、出力部９は、入力文と入力翻訳文とを翻訳信頼度と合わせてを表示する。 The output unit 9 displays n example parallel translations (an example of the fourth sentence) extracted by the extraction unit 6. Further, the output unit 9 displays the n paraphrase translations generated by the machine translation unit 7 and the corresponding n input paraphrases together with the translation reliability. Further, the output unit 9 displays the input sentence and the input translation sentence together with the translation reliability.

なお、出力部９は、用例一致判定部３により入力文が用例対訳ＤＢ４に記憶されたいずれかの用例文と一致すると判定された場合、用例一致判定部３から出力された用例対訳を表示すればよい。 When the example match determination unit 3 determines that the input sentence matches one of the example sentences stored in the example parallel translation DB4, the output unit 9 may display the example parallel translation output from the example match determination unit 3.

図７は、出力部９が表示する出力画像５００の一例を示す図である。出力画像５００は、入力文表示欄５１０と、翻訳支援情報表示欄５２０とを含む。入力文表示欄５１０は、入力文５１１「門真までタクシーにしたい」と、入力文５１１の機械翻訳結果である入力翻訳文５１２「Ｉｗａｎｔｔｏｔａｘｉｔｏｋａｄｏｍａ」とを並べて表示する。また、入力文表示欄５１０には、入力翻訳文５１２の翻訳信頼度を表示する信頼度表示欄５１３も表示されている。ここでは、入力翻訳文５１２の逆翻訳結果と入力文５１１との一致度が７０％であったので、信頼度表示欄５１３には「７０％」と表示されている。 FIG. 7 is a diagram showing an example of the output image 500 displayed by the output unit 9. The output image 500 includes an input sentence display field 510 and a translation support information display field 520. The input sentence display field 510 displays the input sentence 511 "I want to take a taxi to Kadoma" and the input translation sentence 512 "I want to taxi to kadoma" which is the machine translation result of the input sentence 511 side by side. Further, in the input sentence display field 510, a reliability display field 513 for displaying the translation reliability of the input translation sentence 512 is also displayed. Here, since the degree of agreement between the reverse translation result of the input translated sentence 512 and the input sentence 511 was 70%, "70%" is displayed in the reliability display column 513.

翻訳支援情報表示欄５２０は、入力文５１１と関連する用例文等を表示する欄である。ここでは、入力文５１１に対する用例文として２つの用例文が抽出部６により抽出されたので、２つの用例文５３１ａ，５４１ａに対応する２つの翻訳支援情報表示欄５３０，５４０が表示されている。また、用例文５３１ａの方が用例文５４１ａよりも総合評価値が高かったので、用例文５３１ａに対応する翻訳支援情報表示欄５３０の方が用例文５４１ａに対応する翻訳支援情報表示欄５４０よりも上側に表示されている。 The translation support information display field 520 is a field for displaying example sentences and the like related to the input sentence 511. Here, since two example sentences are extracted by the extraction unit 6 as example sentences for the input sentence 511, two translation support information display fields 530 and 540 corresponding to the two example sentences 531a and 541a are displayed. Further, since the example sentence 531a has a higher comprehensive evaluation value than the example sentence 541a, the translation support information display field 530 corresponding to the example sentence 531a is displayed above the translation support information display field 540 corresponding to the example sentence 541a.

翻訳支援情報表示欄５３０には、「参考用例１」と見出しが付けられた参考用例表示欄５３１と、「参考翻訳１」と見出しが付けられた参考翻訳表示欄５３２とが含まれる。 The translation support information display field 530 includes a reference example display field 531 with the heading "Reference example 1" and a reference translation display field 532 with the heading "Reference translation 1".

参考用例表示欄５３１には、総合評価値が１位の用例文５３１ａ「京橋まで電車でいきたい」と、それに対応する用例対訳５３１ｂ「Ｉｗａｎｔｔｏｇｏｂｙｔｒａｉｎｔｏｋｙｏｂａｓｈｉ」とが並べて表示されている。 The reference example display field 531 displays, side by side, the example sentence 531a "I want to go to Kyobashi by train", which has the highest comprehensive evaluation value, and the corresponding example parallel translation 531b "I want to go by train to kyobashi".

参考翻訳表示欄５３２には、用例文５３１ａに対してテキスト類似度（指標Ａ４）が最大の換言抽出文５３２ａ「門真まで電車でいきたい」と、それに対応する換言翻訳文５３２ｂ「Ｉｗａｎｔｔｏｇｏｂｙｔｒａｉｎｔｏｋａｄｏｍａ」とが並べて表示されている。 The reference translation display field 532 displays, side by side, the paraphrase extraction sentence 532a "I want to go to Kadoma by train", which has the largest text similarity (index A4) to the example sentence 531a, and the corresponding paraphrase translation sentence 532b "I want to go by train to kadoma".

また、翻訳支援情報表示欄５３０には、換言翻訳文５３２ｂの翻訳信頼度を示す信頼度表示欄５３３が表示されている。ここでは、換言翻訳文５３２ｂの逆翻訳結果と換言抽出文５３２ａとの一致度が９５％であったので「９５％」と表示されている。 Further, in the translation support information display column 530, a reliability display column 533 indicating the translation reliability of the paraphrase translation sentence 532b is displayed. Here, since the degree of agreement between the reverse translation result of the paraphrase translation sentence 532b and the paraphrase extract sentence 532a was 95%, it is displayed as "95%".

また、換言抽出文５３２ａにおいては、「電車でいきたい」の箇所にアンダーラインが引かれており、入力文５１１に対する換言箇所が他の箇所と区別可能に表示されている。また、換言翻訳文５３２ｂにおいても、「Ｉｗａｎｔｔｏｇｏｂｙｔｒａｉｎ」の箇所にアンダーラインが引かれており、換言箇所の翻訳結果が他の箇所の翻訳結果と区別可能に表示されている。 Further, in the paraphrase extraction sentence 532a, the part "I want to go by train" is underlined, and the paraphrase part for the input sentence 511 is displayed so as to be distinguishable from other parts. Further, also in the paraphrase translation sentence 532b, the part of "I want to go by train" is underlined, and the translation result of the paraphrase part is displayed so as to be distinguishable from the translation result of other parts.

これにより、ユーザは、換言抽出文５３２ａ及び換言翻訳文５３２ｂにおいて、入力文５１１に対する換言箇所を一目で認識することができる。 As a result, the user can recognize at a glance the paraphrase portion for the input sentence 511 in the paraphrase extract sentence 532a and the paraphrase translation sentence 532b.

翻訳支援情報表示欄５４０も、翻訳支援情報表示欄５３０と同様、参考用例表示欄５４１と参考翻訳表示欄５４２とが表示されている。 Similar to the translation support information display field 530, the translation support information display field 540 also displays the reference example display field 541 and the reference translation display field 542.

参考用例表示欄５４１には、総合評価値が２位の用例文５４１ａ「守口まで車を利用したい」と、それに対応する用例対訳５４１ｂ「Ｉｗａｎｔｔｏｔａｋｅａｃａｒｔｏｍｏｒｉｇｕｃｈｉ」とが表示されている。 The reference example display field 541 displays the example sentence 541a "I want to use a car to Moriguchi", which has the second-highest comprehensive evaluation value, and the corresponding example parallel translation 541b "I want to take a car to moriguchi".

参考翻訳表示欄５４２には、用例文５４１ａに対してテキスト類似度（指標Ａ４）が最大の換言抽出文５４２ａ「門真までバスを利用したい」と、それに対応する換言翻訳文５４２ｂ「Ｉｗａｎｔｔｏｔａｋｅｔｈｅｂｕｓｔｏｋａｄｏｍａ」とが並べて表示されている。 The reference translation display field 542 displays, side by side, the paraphrase extraction sentence 542a "I want to use the bus to Kadoma", which has the largest text similarity (index A4) to the example sentence 541a, and the corresponding paraphrase translation sentence 542b "I want to take the bus to kadoma".

換言抽出文５４２ａにおいて、入力文５１１に対する換言箇所は「バスを利用したい」であるので、その箇所にアンダーラインが引かれている。また、換言翻訳文５４２ｂにおいて、換言箇所に対応する翻訳箇所「Ｉｗａｎｔｔｏｔａｋｅｔｈｅｂｕｓ」にアンダーラインが引かれている。更に、換言翻訳文５４２ｂの逆翻訳結果と、換言抽出文との一致度が９０％であったので、信頼度表示欄には「９０％」と表示されている。 In the paraphrase extraction sentence 542a, the paraphrase part for the input sentence 511 is "I want to use the bus", so that part is underlined. Further, in the paraphrase translation sentence 542b, the translation portion "I want to take the bus" corresponding to the paraphrase portion is underlined. Further, since the degree of agreement between the reverse translation result of the paraphrase translation sentence 542b and the paraphrase extract sentence was 90%, "90%" is displayed in the reliability display column.

このように、出力画像５００には、総合評価値が高い用例文を含む翻訳支援情報表示欄５２０ほど上側に表示されるので、ユーザは重要度の高い用例対訳及び換言翻訳文等を含む翻訳支援情報を一目で認識できる。 In this way, in the output image 500, a translation support information display field 520 containing an example sentence with a higher comprehensive evaluation value is displayed higher up, so the user can recognize at a glance the translation support information that includes the more important example parallel translations, paraphrase translation sentences, and the like.

なお、図７の例では、２つの翻訳支援情報表示欄５２０が示されているがこれは一例であり、抽出部６により３つ以上の用例文が抽出されたのであれば、出力画像５００は、３つ以上の用例文を含む翻訳支援情報表示欄５２０を表示すればよい。この場合も、総合評価値が高い用例文ほど上側に位置するように、翻訳支援情報表示欄５２０は表示されればよい。 In the example of FIG. 7, two translation support information display fields 520 are shown, but this is only an example. If three or more example sentences are extracted by the extraction unit 6, the output image 500 may display translation support information display fields 520 containing the three or more example sentences. In this case as well, the translation support information display fields 520 may be displayed so that an example sentence with a higher comprehensive evaluation value is positioned higher up.

また、図７の例では、換言箇所（文字列）をアンダーラインを用いてハイライト表示したが、本開示はこれに限定されず、換言箇所の背景にマーカーを付してハイライト表示する態様を採用してもよいし、換言箇所の文字の色を非換言箇所の文字の色と変えてハイライト表示する態様を採用してもよいし、換言箇所を太字でハイライト表示する態様を採用してもよいし、これらの態様を組み合わせた態様を採用してもよい。更に、本開示は、換言箇所をハイライト表示させず、非換言箇所をハイライト表示しても良い。 Further, in the example of FIG. 7, the paraphrase portion (character string) is highlighted by using an underline, but the present disclosure is not limited to this, and a marker is attached to the background of the paraphrase portion to highlight the paraphrase portion. May be adopted, the color of the characters in the paraphrased part may be changed to the color of the characters in the non-paraphrased part and highlighted, or the paraphrased part may be highlighted in bold. Alternatively, a mode in which these modes are combined may be adopted. Further, in the present disclosure, the paraphrased portion may not be highlighted and the non-paraphrased portion may be highlighted.

また、図７の例では、用例文及び用例対訳には特にハイライト表示が付されていないが本開示は、これに限定されず、換言抽出文に対応する用例文及び用例対訳の箇所（文字列）をハイライト表示してもよい。 Further, in the example of FIG. 7, the example sentences and the example parallel translations are not highlighted, but the present disclosure is not limited to this, and the portions (character strings) of the example sentence and the example parallel translation that correspond to the paraphrase extraction sentence may be highlighted.

次に、翻訳支援装置１における抽出部６の処理の具体例について説明する。ここでは、入力文（Ｉ）「門真までタクシーにしたい」が入力されたとし、換言文生成部５により以下の３つの入力換言文が生成されたとする。この例では、（Ａ）〜（Ｃ）は、全て同じ文構造、すなわち、同じ木構造を持っているとする。 Next, a specific example of the processing of the extraction unit 6 in the translation support device 1 will be described. Here, it is assumed that the input sentence (I) "I want to take a taxi to Kadoma" is input, and the following three input paraphrase sentences are generated by the paraphrase sentence generation unit 5. In this example, it is assumed that (A) to (C) all have the same sentence structure, that is, the same tree structure.

（Ａ）「門真まで電車でいきたい」 (A) "I want to go to Kadoma by train"
（Ｂ）「門真までバスを利用したい」 (B) "I want to use the bus to Kadoma"
（Ｃ）「門真までタクシーにのりたい」 (C) "I want to take a taxi to Kadoma"

また、指標Ａ１が１００％、すなわち、上記の入力換言文（Ａ）〜（Ｃ）に対して同一の文構造を持つ下記の４つの用例文（１）〜（４）が用例対訳ＤＢ４から抽出されたとする。なお、この具体例では、抽出部６は、入力換言文と文構造が同一の用例文を用例対訳ＤＢ４から抽出し、抽出した用例文に対して指標Ａ２〜指標Ａ４を算出するものとする。 Further, suppose that the following four example sentences (1) to (4), for which the index A1 is 100%, that is, which have the same sentence structure as the above input paraphrase sentences (A) to (C), are extracted from the example parallel translation DB4. In this specific example, the extraction unit 6 extracts from the example parallel translation DB4 the example sentences having the same sentence structure as the input paraphrase sentences, and calculates the indexes A2 to A4 for the extracted example sentences.

（１）「京橋まで電車でいきたい」 (1) "I want to go to Kyobashi by train"
（２）「守口まで車を利用したい」 (2) "I want to use a car to Moriguchi"
（３）「東京まで新幹線で行く」 (3) "Go to Tokyo by Shinkansen"
（４）「とことんまで話にのりたい」 (4) "I want to talk to you thoroughly"

次に、抽出部６は、用例文（１）〜（４）のそれぞれについて、上記の式（１）を用いて指標Ａ２を算出する。この具体例では、入力換言文（Ａ）〜（Ｃ）は同一の文構造を持っているので、入力換言文（Ａ）を代表させ、用例文（１）〜（４）と入力換言文（Ａ）との指標Ａ２を算出する。 Next, the extraction unit 6 calculates the index A2 for each of the example sentences (1) to (4) using the above formula (1). In this specific example, since the input paraphrase sentences (A) to (C) have the same sentence structure, the input paraphrase sentence (A) is taken as a representative, and the index A2 between each of the example sentences (1) to (4) and the input paraphrase sentence (A) is calculated.

入力換言文（Ａ）の名詞の文節の総数は、「京橋まで」と「電車で」との２つであるので、β＝２である。 Since the total number of noun clauses in the input paraphrase (A) is "to Kyobashi" and "by train", β = 2.

また、用例文（１）〜（３）は、入力換言文（Ａ）に対し、対応する文節同士が名詞でない数は０なのでα＝０となり、指標Ａ２＝１００％となる。一方、用例文（４）は、入力換言文（Ａ）の名詞の文節「電車で」に対応する文節「話に」が名詞であるが、入力換言文（Ａ）の名詞の文節「門真まで」に対応する文節「とことんまで」が名詞ではない。そのため、用例文（４）は、入力換言文（Ａ）に対し、対応する文節同士が名詞でない数は１つになる。よって、用例文（４）は、α＝１となり、指標Ａ２＝（１−１／２）×１００＝５０％となる。したがって、図６の表Ｈ１に示すように、用例文（１）〜（４）の指標Ａ２は、それぞれ、「１００％」、「１００％」、「１００％」、「５０％」となっている。図６は、本実施の形態における具体例を纏めた表Ｈ１である。表Ｈ１では、用例文（１）〜（４）に対する指標Ａ１〜Ａ４が算出されている。 For the example sentences (1) to (3), the number of corresponding clauses that are not both nouns with respect to the input paraphrase sentence (A) is 0, so α = 0 and the index A2 = 100%. In the example sentence (4), on the other hand, the clause 「話に」 corresponding to the noun clause 「電車で」 ("by train") of the input paraphrase sentence (A) is a noun, but the clause 「とことんまで」 ("thoroughly") corresponding to the noun clause 「門真まで」 ("to Kadoma") is not a noun. Therefore, for the example sentence (4), the number of corresponding clauses that are not both nouns is one. Hence α = 1 and the index A2 = (1 − 1/2) × 100 = 50%. Accordingly, as shown in Table H1 of FIG. 6, the index A2 of the example sentences (1) to (4) is "100%", "100%", "100%", and "50%", respectively. FIG. 6 shows Table H1, which summarizes the specific example of the present embodiment. In Table H1, the indexes A1 to A4 for the example sentences (1) to (4) are calculated.
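
The arithmetic above, A2 = (1 − α/β) × 100, can be reproduced with a few lines of Python. The function name and the boolean-list input are illustrative assumptions; in particular, deciding whether a clause is a noun clause would require a morphological analyzer, which is outside this sketch.

```python
from typing import Sequence


def index_a2(example_clause_is_noun: Sequence[bool]) -> float:
    """Index A2 in percent.

    example_clause_is_noun holds one flag per noun clause of the input
    paraphrase sentence: True if the corresponding clause of the example
    sentence is also a noun clause, False otherwise.
    """
    beta = len(example_clause_is_noun)           # noun clauses in the paraphrase
    if beta == 0:
        raise ValueError("the paraphrase has no noun clauses")
    alpha = sum(1 for ok in example_clause_is_noun if not ok)  # mismatches
    return (1 - alpha / beta) * 100


# Example sentences (1) to (3): both corresponding clauses are nouns.
print(index_a2([True, True]))    # 100.0
# Example sentence (4): the first corresponding clause is not a noun.
print(index_a2([False, True]))   # 50.0
```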

次に、抽出部６は、上記の式（２）を用いて、用例文（１）〜（４）と、入力文（Ｉ）との指標Ａ３をそれぞれ算出する。ここでは、表Ｈ１に示すように、用例文（１）〜（４）の指標Ａ３は、それぞれ、「３６．８％」、「４１．１％」、「６１．８％」、「５５．８％」と算出された。これにより、用例文（１）〜（４）のうち、用例文（３）「東京まで新幹線で行く」が入力文（Ｉ）「門真までタクシーにしたい」に対してテキスト類似度が最も低い、すなわち、意味内容が最も遠いことが分かる。 Next, the extraction unit 6 calculates the index A3 of the example sentences (1) to (4) and the input sentence (I) by using the above formula (2). Here, as shown in Table H1, the indicators A3 of the example sentences (1) to (4) are "36.8%", "41.1%", "61.8%", and "55. It was calculated as "8%". As a result, among the example sentences (1) to (4), the example sentence (3) "Go to Tokyo by Shinkansen" has the lowest text similarity to the input sentence (I) "I want to take a taxi to Kadoma". That is, it can be seen that the meaning and content are the farthest.

なお、この具体例では、入力換言文（Ｂ）、（Ｃ）の指標Ａ３は、入力換言文（Ａ）の指標Ａ３と同じ値になる。なぜなら、入力換言文（Ｂ）、（Ｃ）についても、指標Ａ３を算出する際に、用例文（１）〜（４）が用いられるからである。 In this specific example, the index A3 of the input paraphrases (B) and (C) has the same value as the index A3 of the input paraphrase (A). This is because, for the input paraphrase sentences (B) and (C), the example sentences (1) to (4) are used when calculating the index A3.

このように、指標Ａ３の大きな用例文を抽出することで、入力文とは文構造は類似するが意味内容が離れた用例文を抽出することができる。その結果、多様な用例対訳をユーザに提示できる。 In this way, by extracting example sentences with a large index A3, it is possible to extract example sentences whose sentence structure is similar to that of the input sentence but whose meaning is distant from it. As a result, a variety of example parallel translations can be presented to the user.

次に、抽出部６は、入力換言文（Ａ）〜（Ｃ）と用例文（１）〜（４）との指標Ａ４をそれぞれ算出する。この具体例では、３×４＝１２個の指標Ａ４が算出され、それぞれの値は表Ｈ１に示す通りである。 Next, the extraction unit 6 calculates the indexes A4 of the input paraphrase sentences (A) to (C) and the example sentences (1) to (4), respectively. In this specific example, 3 × 4 = 12 indexes A4 are calculated, and their respective values are as shown in Table H1.

次に、抽出部６は、指標Ａ１×指標Ａ２×指標Ａ３×指標Ａ４により用例文（１）〜（４）の総合評価値「％」を算出する。この具体例では、用例文（１）〜（４）の順で高い（大きな）総合評価値が得られている。なお、この具体例では、用例文（１）〜（４）は、入力換言文（Ａ）〜（Ｃ）と文構造が同じであるので、用例文（１）〜（４）の指標Ａ１は全て１００％とされている。 Next, the extraction unit 6 calculates the comprehensive evaluation value “%” of the example sentences (1) to (4) from the index A1 × index A2 × index A3 × index A4. In this specific example, higher (larger) comprehensive evaluation values are obtained in the order of example sentences (1) to (4). In this specific example, the example sentences (1) to (4) have the same sentence structure as the input paraphrase sentences (A) to (C), so that the index A1 of the example sentences (1) to (4) is All are said to be 100%.

次に、抽出部６は、総合評価値が高い順に上位ｎ個の用例文を抽出し、抽出したｎ個の用例文を含むｎ個の用例対訳を用例対訳ＤＢ４から抽出する。例えば、ｎ＝２であるならば、抽出部６は、用例文（１）、（２）を含む２つの用例対訳を抽出する。 Next, the extraction unit 6 extracts the top n example sentences in descending order of the overall evaluation value, and extracts n example translations including the extracted n example sentences from the example translation DB4. For example, if n = 2, the extraction unit 6 extracts two example translations including the example sentences (1) and (2).

次に、抽出部６は、抽出した用例文において、指標Ａ４（テキスト類似度）が最大の入力換言文を換言抽出文として抽出する。ここでは、用例文（１）、（２）が抽出されているので、用例文（１）において、指標Ａ４が最大の入力換言文（Ａ）と、用例文（２）において、指標Ａ４が最大の入力換言文（Ｂ）とが換言抽出文として抽出される。 Next, for each extracted example sentence, the extraction unit 6 extracts the input paraphrase sentence with the largest index A4 (text similarity) as the paraphrase extraction sentence. Here, since the example sentences (1) and (2) have been extracted, the input paraphrase sentence (A), which has the largest index A4 for the example sentence (1), and the input paraphrase sentence (B), which has the largest index A4 for the example sentence (2), are extracted as the paraphrase extraction sentences.

次に、翻訳支援装置１のフローチャートについて説明する。図８は、本開示の実施の形態に係る翻訳支援装置１の処理の一例を示すフローチャートである。 Next, the flowchart of the translation support device 1 will be described. FIG. 8 is a flowchart showing an example of processing of the translation support device 1 according to the embodiment of the present disclosure.

まず、入力部２は、ユーザからの操作を受け付けて、入力文を取得する（Ｓ１）。ここでは、例えば入力文（Ｉ）「門真までタクシーにする」が取得されたとする。 First, the input unit 2 receives an operation from the user and acquires an input sentence (S1). Here, for example, it is assumed that the input sentence (I) "Take a taxi to Kadoma" is acquired.

次に、用例一致判定部３は、入力文（Ｉ）と一致する用例文が用例対訳ＤＢ４に記憶されているか否かを判定する（Ｓ２）。ここで、入力文（Ｉ）に一致する用例文が用例対訳ＤＢ４にあれば（Ｓ２でＹＥＳ）、用例一致判定部３は、一致する用例文の用例対訳を用例対訳ＤＢ４から抽出し、出力部９は、抽出された用例対訳を表示する（Ｓ３）。 Next, the example match determination unit 3 determines whether or not an example sentence matching the input sentence (I) is stored in the example parallel translation DB 4 (S2). Here, if there is an example sentence matching the input sentence (I) in the example parallel translation DB4 (YES in S2), the example match determination unit 3 extracts the example parallel translation of the matching example sentence from the example parallel translation DB4 and outputs the output unit. 9 displays the extracted example parallel translation (S3).

一方、入力文（Ｉ）に一致する用例文が用例対訳ＤＢ４に記憶されていなければ（Ｓ２でＮＯ）、処理はＳ４に進む。 On the other hand, if the example sentence matching the input sentence (I) is not stored in the example bilingual translation DB4 (NO in S2), the process proceeds to S4.

Ｓ４では、換言文生成部５は、入力文（Ｉ）を上述の第１〜第４換言ルールを用いて換言することで複数の入力換言文を生成する（Ｓ４）。これにより、例えば、上述した入力換言文（Ａ）〜（Ｃ）が生成される。 In S4, the paraphrase sentence generation unit 5 generates a plurality of input paraphrase sentences by paraphrasing the input sentence (I) using the above-mentioned first to fourth paraphrase rules (S4). As a result, for example, the above-mentioned input paraphrase sentences (A) to (C) are generated.

次に、抽出部６は、入力換言文（Ａ）〜（Ｃ）と用例対訳ＤＢ４に記憶された用例文とを比較することで、上述した総合評価値を算出し、算出した総合評価値が大きい順にｎ個の用例文を抽出することで、ｎ個の用例対訳を抽出する（Ｓ５）。これにより、例えば、上述した２つの用例文（１）、（２）とそれを含む用例対訳とが抽出される。 Next, the extraction unit 6 calculates the above-mentioned comprehensive evaluation value by comparing the input paraphrase sentences (A) to (C) with the example sentences stored in the example parallel translation DB4, and the calculated comprehensive evaluation value is obtained. By extracting n example sentences in descending order, n example translations are extracted (S5). As a result, for example, the above-mentioned two example sentences (1) and (2) and the example bilingual translation including them are extracted.

次に、抽出部６は、Ｓ４で生成された入力換言文から、Ｓ５で抽出したｎ個の用例文のそれぞれについてテキスト類似度が最大の入力換言文を抽出することで、ｎ個の換言抽出文を抽出する（Ｓ６）。これにより、例えば、上述した２つの入力換言文（Ａ）、（Ｂ）が換言抽出文として抽出される。 Next, the extraction unit 6 extracts n paraphrases from the input paraphrases generated in S4 by extracting the input paraphrases having the maximum text similarity for each of the n example sentences extracted in S5. Extract the sentence (S6). As a result, for example, the above-mentioned two input paraphrase sentences (A) and (B) are extracted as paraphrase extraction sentences.

次に、機械翻訳部７は、Ｓ６で抽出されたｎ個の換言抽出文を機械翻訳することで、ｎ個の換言翻訳文を生成すると共に、Ｓ１で取得された入力文を機械翻訳することで、入力翻訳文を生成する（Ｓ７）。これにより、例えば、上述した２つの入力換言文（Ａ）、（Ｂ）の換言翻訳文と入力翻訳文とが生成される。 Next, the machine translation unit 7 machine-translates the n paraphrase extract sentences extracted in S6 to generate n paraphrase translation sentences and machine translates the input sentence acquired in S1. Then, an input translated sentence is generated (S7). As a result, for example, the above-mentioned two input paraphrase sentences (A) and (B) paraphrase translation sentences and input translation sentences are generated.

次に、信頼度付与部８は、Ｓ７で生成された入力翻訳文及びｎ個の換言翻訳文の翻訳信頼度を算出する（Ｓ８）。次に、出力部９は、Ｓ５で抽出された用例対訳と、Ｓ７で生成された入力翻訳文及び換言翻訳文と、Ｓ８で算出された翻訳信頼度等を含む翻訳結果を出力画像５００を表示する（Ｓ９）。 Next, the reliability imparting unit 8 calculates the translation reliability of the input translation sentence and n paraphrase translation sentences generated in S7 (S8). Next, the output unit 9 displays the output image 500 of the translation result including the example translation extracted in S5, the input translation sentence and the paraphrase translation sentence generated in S7, and the translation reliability calculated in S8. (S9).
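
The flow of S1 to S9 can be summarized in a compact sketch. Every callable passed in (`generate_paraphrases`, `pair_score`, `similarity`, `translate`, `reliability`) is a hypothetical placeholder for the corresponding unit, and taking the best pair score over all paraphrases as an example sentence's comprehensive evaluation value is an assumption of this sketch, not something the text states.

```python
from typing import Callable, Dict, List


def run_pipeline(input_sentence: str,
                 example_db: Dict[str, str],
                 generate_paraphrases: Callable[[str], List[str]],
                 pair_score: Callable[[str, str], float],
                 similarity: Callable[[str, str], float],
                 translate: Callable[[str], str],
                 reliability: Callable[[str, str], float],
                 n: int = 2) -> dict:
    # S2/S3: an exact match in the DB is returned as-is.
    if input_sentence in example_db:
        return {"exact_match": (input_sentence, example_db[input_sentence])}

    paraphrases = generate_paraphrases(input_sentence)            # S4

    # S5: score every (paraphrase, example) pair, keep the top-n examples.
    best = {ex: max(pair_score(p, ex) for p in paraphrases) for ex in example_db}
    top = sorted(best, key=best.get, reverse=True)[:n]

    support = []
    for ex in top:
        extracted = max(paraphrases, key=lambda p: similarity(p, ex))  # S6
        translated = translate(extracted)                               # S7
        support.append({
            "example": ex,
            "example_translation": example_db[ex],
            "paraphrase": extracted,
            "paraphrase_translation": translated,
            "reliability": reliability(extracted, translated),         # S8
        })

    input_translation = translate(input_sentence)                      # S7
    return {                                                           # S9
        "input": input_sentence,
        "input_translation": input_translation,
        "input_reliability": reliability(input_sentence, input_translation),
        "support": support,
    }
```

The returned dictionary mirrors the layout of the output image 500: the input sentence with its translation and reliability, followed by one support entry per extracted example sentence in descending order of score.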

図９は、図８のＳ５の処理の詳細の一例を示すフローチャートである。ループＬ５は、Ｓ４で生成された全入力換言文のうちの１の入力換言文と、用例対訳ＤＢ４に記憶された全用例対訳のうちの１の用例対訳との組のそれぞれについて、Ｓ５０１の処理を繰り返すループである。ループＬ５は、Ｓ４で生成された全入力換言文と用例対訳ＤＢ４に記憶された全用例対訳とについてＳ５０１の処理が実行されると終了する。 FIG. 9 is a flowchart showing an example of the details of the processing of S5 in FIG. 8. Loop L5 repeats the processing of S501 for each pair consisting of one of the input paraphrase sentences generated in S4 and one of the example parallel translations stored in the example parallel translation DB4. Loop L5 ends when the processing of S501 has been executed for all the input paraphrase sentences generated in S4 and all the example parallel translations stored in the example parallel translation DB4.

Ｓ５０１では、抽出部６は、１の入力換言文と１の用例文との組に対する指標Ａ１〜Ａ４を算出する。また、Ｓ５０１では、抽出部６は、算出した指標Ａ１〜Ａ４から１の組に対する総合評価値を算出する。 In S501, the extraction unit 6 calculates the indexes A1 to A4 for a pair of one input paraphrase sentence and one example sentence. Also in S501, the extraction unit 6 calculates the comprehensive evaluation value for that pair from the calculated indexes A1 to A4.

Ｓ４で入力換言文（Ａ）〜（Ｃ）が生成されたとすると、まず、入力換言文（Ａ）について、用例対訳ＤＢ４に記憶された全用例文とのそれぞれの総合評価値が算出され、次に、入力換言文（Ｂ）について、用例対訳ＤＢ４に記憶された全用例文とのそれぞれの総合評価値が算出され、次に、入力換言文（Ｃ）について、用例対訳ＤＢ４に記憶された全用例文とのそれぞれの総合評価値が算出される。 Assuming that the input paraphrase sentences (A) to (C) are generated in S4, the comprehensive evaluation value is first calculated between the input paraphrase sentence (A) and each of the example sentences stored in the example parallel translation DB4, then between the input paraphrase sentence (B) and each of the stored example sentences, and then between the input paraphrase sentence (C) and each of the stored example sentences.

Ｓ５０２では、抽出部６は、総合評価値が上位ｎ個の用例文と、ｎ個の用例文に対応するｎ個の用例対訳を抽出する。 In S502, the extraction unit 6 extracts n example sentences having the highest overall evaluation value and n example translations corresponding to the n example sentences.

図１０は、図８のＳ６の処理の詳細の一例を示すフローチャートである。ループＬ６１は、Ｓ５で抽出されたｎ個の用例文のうち１の用例文（ｉ）毎に実行されるループである。ｉは、ｎ個の用例文のうちの１の用例文を特定するインデックスであり、１以上、ｎ以下の整数である。ループＬ６１の終了条件は、ｎ個の用例文に対する処理が終了したこと、すなわち、ｉ＝ｎになったことである。ループＬ６２は、ループＬ６１の１のループにおいて、１の用例文（ｉ）と全入力換言文のそれぞれとの組についてＳ６０１〜Ｓ６０２の処理を繰り返すループである。ループＬ６２の終了条件は、１の用例文（ｉ）と全入力換言文のそれぞれとに対してＳ６０１〜Ｓ６０２の処理が終了することである。 FIG. 10 is a flowchart showing an example of the details of the processing of S6 in FIG. 8. Loop L61 is executed once for each example sentence (i) among the n example sentences extracted in S5. Here, i is an index that identifies one of the n example sentences and is an integer from 1 to n. Loop L61 ends when the processing for the n example sentences is completed, that is, when i = n. Loop L62 runs inside each iteration of loop L61 and repeats the processing of S601 to S602 for each pair of the example sentence (i) and one of the input paraphrase sentences. Loop L62 ends when the processing of S601 to S602 has been completed for the example sentence (i) and every input paraphrase sentence.

Ｓ６０１では、抽出部６は、１の用例文（ｉ）と全入力換言文のうちの１の入力換言文との指標Ａ４を算出する。次に、抽出部６は、算出した指標Ａ４が１の用例文（ｉ）のうちで最大であれば（Ｓ６０１でＹＥＳ）、その入力換言文を換言抽出文（ｉ）としてメモリに保持する（Ｓ６０２）。 In S601, the extraction unit 6 calculates the index A4 between the example sentence (i) and one of the input paraphrase sentences. Next, if the calculated index A4 is the largest obtained for the example sentence (i) (YES in S601), the extraction unit 6 holds that input paraphrase sentence in memory as the paraphrase extraction sentence (i) (S602).

一方、算出した指標Ａ４が１の用例文（ｉ）のうちで最大でなければ（Ｓ６０１でＮＯ）、Ｓ６０２の処理は行われずループＬ６２が継続される。ループＬ６２を繰り返すことにより、１の用例文（ｉ）に対して、指標Ａ４が最大の入力換言文（ｉ）が全入力換言文の中から決定される。そして、ループＬ６１により、ｎ個の用例文（ｉ）に対して指標Ａ４が最大のｎ個の入力換言文（ｉ）が抽出される。 On the other hand, if the calculated index A4 is not the largest obtained for the example sentence (i) (NO in S601), the processing of S602 is skipped and loop L62 continues. By repeating loop L62, the input paraphrase sentence (i) with the largest index A4 for the example sentence (i) is determined from among all the input paraphrase sentences. Loop L61 then yields, for the n example sentences (i), the n input paraphrase sentences (i) with the largest index A4.
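
The nested loops L61 and L62 amount to a running-maximum search, which can be sketched as follows. The function name and the `a4` callable are illustrative assumptions; the comments map each line to the step it is meant to mirror.

```python
from typing import Callable, List, Optional


def extract_paraphrases(examples: List[str],
                        paraphrases: List[str],
                        a4: Callable[[str, str], float]) -> List[Optional[str]]:
    """For each example sentence, keep the paraphrase with the largest A4."""
    extracted: List[Optional[str]] = []
    for example in examples:                      # loop L61
        best: Optional[str] = None
        best_score = float("-inf")
        for paraphrase in paraphrases:            # loop L62
            score = a4(paraphrase, example)       # S601: compute index A4
            if score > best_score:                # S601: largest so far?
                best, best_score = paraphrase, score   # S602: hold in memory
        extracted.append(best)
    return extracted
```

Because the comparison is strict, the first paraphrase encountered wins a tie; the text does not say how ties are resolved, so this is only one possible choice.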

例えば、Ｓ４で入力換言文（Ａ）〜（Ｃ）が生成され、Ｓ５で用例文（１）〜（４）が抽出されたとする。この場合、まず、用例文（１）について、入力換言文（Ａ）〜（Ｃ）の中から指標Ａ４が最大の入力換言文が換言抽出文（１）として抽出され、次に、用例文（２）について、入力換言文（Ａ）〜（Ｃ）の中から指標Ａ４が最大の入力換言文が換言抽出文（２）として抽出されるというようにして、４個の換言抽出文が抽出される。 For example, suppose that the input paraphrase sentences (A) to (C) are generated in S4 and the example sentences (1) to (4) are extracted in S5. In this case, first, for the example sentence (1), the input paraphrase sentence with the largest index A4 among the input paraphrase sentences (A) to (C) is extracted as the paraphrase extraction sentence (1); next, for the example sentence (2), the input paraphrase sentence with the largest index A4 among the input paraphrase sentences (A) to (C) is extracted as the paraphrase extraction sentence (2); and so on, until four paraphrase extraction sentences have been extracted.

このように、本実施の形態によれば、単に入力換言文の翻訳文を提示するのではない。すなわち、本実施の形態では、用例対訳ＤＢ４に記憶された用例文のうち入力換言文に対する総合評価値が基準値以上のｎ個の用例文が抽出されると共に、抽出されたｎ個の用例文と類似するｎ個の入力換言文が抽出される。そして、抽出されたｎ個の入力換言文を機械翻訳したｎ個の換言翻訳文と、抽出されたｎ個の用例文のｎ個の用例対訳とが提示される。 As described above, according to the present embodiment, the translated sentence of the input paraphrase sentence is not simply presented. That is, in the present embodiment, among the example sentences stored in the example bilingual translation DB4, n example sentences whose comprehensive evaluation value for the input paraphrase sentence is equal to or more than the reference value are extracted, and the extracted n example sentences are extracted. N similar input paraphrases are extracted. Then, n paraphrase translations obtained by machine-translating the extracted n input paraphrases and n example parallel translations of the extracted n example sentences are presented.

これにより、入力文又はその類似文の翻訳文を生成する際に用いられる知識空間が広範囲に使用され、ユーザにとって有用な翻訳結果を提示できる。 As a result, the knowledge space used when generating a translated sentence of an input sentence or a similar sentence thereof is widely used, and a translation result useful for the user can be presented.

Further, the present embodiment does not require that a translation of the input sentence or of a similar sentence be generated with high reliability, so there is no need to use a knowledge space with broad and abundant knowledge data capable of meeting such a requirement. Therefore, the present embodiment can present translation results useful to the user without augmenting the knowledge space.

Further, since the present embodiment presents translations of input paraphrase sentences that are similar to the extracted example sentences, it can prevent the presentation of translation results for input paraphrase sentences that have little relevance to the input sentence.

The present disclosure may also adopt the following aspects.

(1) In the above embodiment, the output unit 9 displays the example parallel translations, the translation results of the input paraphrase sentences, and the like using an image such as the output image 500. However, the present disclosure is not limited to this, and the output unit 9 may output the content included in the output image 500 by voice. In this case, the output unit 9 is constituted by a speaker.

(2) The output image 500 shown in FIG. 7 is an example, and in the present disclosure any of the items shown in FIG. 7 may be omitted from the output image 500. For example, in the translation support information display field 530, the reference example display field 531 may be omitted, or the reference translation display field 532 may be omitted.

(3) In the output image 500 shown in FIG. 7, when all of the translation support information display fields 520 cannot be displayed at once, the output unit 9 may display the output image 500 in a scrollable manner. This prevents a situation in which the user cannot view all of the translation support information display fields 520 when the display area of the display device is small.

(4) In the output image 500 shown in FIG. 7, the paraphrase extraction sentences (the one or more second sentences) and the paraphrase translation sentences (the fifth sentences) need not be displayed.

The present disclosure can present translations useful to the user without augmenting the knowledge space, and is therefore useful in the technical field of providing automatic translation services.

A1, A2, A3, A4 Index
B1 Input sentence
C1, C2, C3 Input paraphrase sentence
D1, D2, D3, D4, D5 Example sentence
511 Context synonym DB
512 Co-occurrence relational DB
513 Implication relation DB
514 Upper-lower relational DB
1 Translation support device
2 Input unit
3 Example match determination unit
4 Example parallel translation DB
5 Paraphrase sentence generation unit
6 Extraction unit
7 Machine translation unit
8 Reliability assignment unit
9 Output unit
51 Paraphrase DB storage unit
52 Paraphrase candidate generation unit
53 Paraphrase sentence identification unit
500 Output image
510 Input sentence display field
520, 530, 540 Translation support information display field

Claims

A method for providing a translation, the method comprising:
acquiring, via a user's terminal, a first sentence written in a first language to be translated;
determining whether the first sentence is included in a database containing a plurality of pairs of a sentence written in the first language and a parallel translation written in a second language;
when it is determined that the first sentence is not included in the database, generating a plurality of second sentences in which one or more words constituting the first sentence are replaced based on a predetermined rule;
calculating a degree of syntactic matching between the plurality of second sentences and the plurality of sentences written in the first language included in the database;
extracting, from the database, one or more third sentences written in the first language for which the calculated degree of matching is equal to or greater than a threshold value;
extracting, from the database, one or more fourth sentences written in the second language that are parallel translations of the one or more third sentences;
machine-translating one or more second sentences among the plurality of second sentences into the second language to generate one or more fifth sentences; and
displaying at least one of the one or more fourth sentences and the one or more fifth sentences on the user's terminal as a parallel translation reference for the first sentence.

The method according to claim 1, wherein the degree of matching is calculated based on a first index indicating a text similarity between the plurality of second sentences and the plurality of sentences included in the database.

The method according to claim 1 or 2, wherein the degree of matching is calculated based on a second index that shows a larger value for a sentence that, among the plurality of sentences included in the database, has a sentence structure matching or similar to that of the plurality of second sentences and has a smaller text similarity with the first sentence.

The method according to any one of claims 1 to 3, wherein the degree of matching is calculated based on a third index indicating a similarity of sentence structure between the plurality of second sentences and the plurality of sentences included in the database.

The method according to any one of claims 1 to 4, wherein the degree of matching is calculated based on a fourth index that shows a larger value as the number of matching parts of speech between the plurality of second sentences and the plurality of sentences included in the database increases.

The method according to claim 2, wherein the first index shows a larger value as the number of replaced portions in the second sentence increases.

The method according to claim 1, wherein the one or more second sentences are extracted from the plurality of second sentences based on a text similarity between the plurality of second sentences and the one or more third sentences.

The method according to any one of claims 1 to 7, wherein:
the predetermined rule is a first paraphrase rule that paraphrases a first word contained in an element piece constituting the first sentence into a second word having a contextually similar relationship with the first word;
the first paraphrase rule is defined by a context synonym database in which a predetermined word is associated with a word having a contextually similar relationship with the predetermined word; and
when reference to the context synonym database shows that the first word is registered therein, the word associated with the first word in the context synonym database is specified as the second word.

The method according to any one of claims 1 to 8, wherein:
the predetermined rule is a second paraphrase rule that paraphrases a first word contained in an element piece constituting the first sentence into a second word having a co-occurrence relationship with the first word;
the second paraphrase rule is defined by a co-occurrence relational database in which a predetermined word is associated with a word having a co-occurrence relationship with the predetermined word; and
when reference to the co-occurrence relational database shows that the first word is registered therein, the word associated with the first word in the co-occurrence relational database is specified as the second word.

The method according to any one of claims 1 to 9, wherein:
the predetermined rule is a third paraphrase rule that paraphrases a first word contained in an element piece constituting the first sentence into a second word having an implication relationship with the first word;
the third paraphrase rule is defined by an implication relation database in which a predetermined word is associated with a word having an implication relationship with the predetermined word; and
when reference to the implication relation database shows that the first word is registered therein, the word associated with the first word in the implication relation database is specified as the second word.

The method according to any one of claims 1 to 10, wherein:
the predetermined rule is a fourth paraphrase rule that paraphrases a first word contained in an element piece constituting the first sentence into a second word having an upper-lower relationship with the first word;
the fourth paraphrase rule is defined by an upper-lower relational database in which a predetermined word is associated with a word having an upper-lower relationship with the predetermined word; and
when reference to the upper-lower relational database shows that the first word is registered therein, the word associated with the first word in the upper-lower relational database is specified as the second word.

A device that provides a translation, the device comprising:
a database including a plurality of pairs of a sentence written in a first language and a parallel translation written in a second language;
an input unit that acquires, via a user's terminal, a first sentence written in the first language to be translated;
a paraphrase sentence generation unit that, when the first sentence is not included in the database, generates a plurality of second sentences in which one or more words constituting the first sentence are replaced based on a predetermined rule;
an extraction unit that calculates a degree of syntactic matching between the plurality of second sentences and the plurality of sentences written in the first language included in the database, and extracts, from the database, one or more third sentences written in the first language for which the degree of matching is equal to or greater than a threshold value; and
a presentation unit that extracts, from the database, one or more fourth sentences written in the second language that are parallel translations of the one or more third sentences, machine-translates one or more second sentences among the plurality of second sentences into the second language to generate one or more fifth sentences, and displays at least one of the one or more fourth sentences and the one or more fifth sentences on the user's terminal as a parallel translation reference for the first sentence.

A program for causing a computer to execute the method according to any one of claims 1 to 11.