JP2021179665A

JP2021179665A - Sentence creation device

Info

Publication number: JP2021179665A
Application number: JP2020083177A
Authority: JP
Inventors: 聡一朗村上; Soichiro Murakami
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2020-05-11
Filing date: 2020-05-11
Publication date: 2021-11-18
Anticipated expiration: 2040-05-11
Also published as: JP7477359B2

Abstract

To provide a sentence creation device capable of more easily creating a bilingual corpus for simultaneous interpretation. SOLUTION: A sentence creation device 1 includes: an acquisition unit 10 for acquiring a first sentence T11 in a first language and a second sentence T21 in a second language, the second sentence being a translation of the first sentence T11; an analysis unit 20 for performing morphological analysis on each of the first sentence T11 and the second sentence T21; a detection unit 30 for detecting all phrases containing morphemes in the second sentence T21; a mapping unit 40 for mapping morphemes of the first sentence T11 to morphemes of the second sentence T21 on the basis of the meanings of the respective morphemes; a deriving unit 50 for deriving a feature quantity for each phrase of the second sentence T21 on the basis of indexes indicating the positions, within the first sentence T11, of the first-sentence morphemes that are mapped to the morphemes of the second sentence T21; and a creation unit 60 for creating a third sentence T31 by arranging the phrases of the second sentence T21 on the basis of the feature quantity of each phrase. SELECTED DRAWING: Figure 1

Description

The present invention relates to a sentence creation device.

Patent Document 1 describes a rearrangement model generation device that converts a second-language sentence into the word order of the first language. The device first associates each word of the first-language sentence with the word of the second-language sentence that has the same meaning. It then associates each phrase of the second-language sentence with exactly one word of the first-language sentence, and rearranges the phrases of the second-language sentence according to the order, in the first-language sentence, of the words associated with them. To associate each phrase with exactly one word of the first-language sentence, the device deletes the associations of the function words contained in the phrase and keeps only the association of the content (independent) word that is most probable.

Japanese Unexamined Patent Publication No. 2013-117888

Existing bilingual corpora, however, collect translations between two languages that are suited to consecutive interpretation, not translations suited to simultaneous interpretation. When simultaneous interpretation is performed with a translation model built from an existing bilingual corpus, the latency between the sequential input of a source-language sentence and the output of its translation may become large.

One conceivable approach is to create a bilingual corpus for simultaneous interpretation with the rearrangement model generation device of Patent Document 1. That device selects a content word within each phrase during the association step and rearranges the phrases using that content word as the reference. It must therefore distinguish content words from function words in the second-language sentence and keep only the most probable content-word association, which makes the processing complicated. There is thus a demand for a sentence creation device that can create a bilingual corpus for simultaneous interpretation more easily.

An object of the present invention is to provide a sentence creation device capable of creating a bilingual corpus for simultaneous interpretation more easily.

A sentence creation device according to one aspect of the present invention includes: an acquisition unit that acquires a first sentence in a first language and a second sentence in a second language that is a translation of the first sentence; an analysis unit that performs morphological analysis on each of the first sentence and the second sentence; a detection unit that detects all phrases containing morphemes in the second sentence; a mapping unit that maps the morphemes of the first sentence to the morphemes of the second sentence on the basis of the meaning of each morpheme; a deriving unit that derives a feature quantity for each phrase of the second sentence on the basis of indexes indicating the positions, in the first sentence, of the first-sentence morphemes mapped to the morphemes of the second sentence; and a creation unit that creates a third sentence by arranging the phrases of the second sentence on the basis of the feature quantity of each phrase.

With this sentence creation device, the phrases of the second sentence are arranged according to the feature quantity of each phrase to create the third sentence. Because the feature quantity is derived from the indexes of the morphemes of the first sentence, no complicated processing is needed before the phrases of the second sentence are arranged, which shortens the time required to create the third sentence. Because the third sentence takes into account the position of each morpheme of the first sentence, it can be created, for example, as a simultaneous interpretation of the first sentence into the second language. A bilingual corpus for simultaneous interpretation can therefore be created more easily.

According to the present invention, a bilingual corpus for simultaneous interpretation can be created more easily.

FIG. 1 is a diagram showing the configuration of a sentence creation device according to an embodiment.

FIG. 2 is a diagram showing an example of morphological analysis processing by the analysis unit shown in FIG. 1.

FIG. 3 is a diagram showing an example of phrase segmentation processing by the detection unit shown in FIG. 1.

FIG. 4 is a diagram showing an example of morpheme-to-morpheme mapping processing by the mapping unit shown in FIG. 1 and an example of feature quantity derivation processing by the deriving unit shown in FIG. 1.

FIG. 5 is a diagram showing an example of rearrangement processing by the creation unit shown in FIG. 1.

FIG. 6 is a flowchart showing a series of steps of a sentence creation method executed by the sentence creation device shown in FIG. 1.

FIG. 7 is a diagram showing the configuration of a sentence creation device according to another embodiment.

FIG. 8 is a diagram showing the hardware configuration of the sentence creation device according to the embodiment.

Embodiments of the present invention will be described with reference to the accompanying drawings. Where possible, identical parts are given identical reference numerals, and duplicate descriptions are omitted.

FIG. 1 is a diagram showing the configuration of a sentence creation device according to an embodiment. The sentence creation device 1 shown in FIG. 1 creates a third sentence T31 in a second language by rearranging a second sentence T21 in the second language into the word order of a first sentence T11 in a first language. The second sentence T21 is a translation of the first sentence T11 into the second language. The first sentence T11, the second sentence T21, and the third sentence T31 are, for example, text data.

The first language and the second language differ from each other. For example, the first language is English and the second language is Japanese. The sentence pattern (word order) of a sentence in the first language may differ from that of a sentence in the second language; for example, the order of the subject (S), the object (O), and the verb (V) differs between them. A typical sentence pattern in the first language is SVO, whereas a typical sentence pattern in the second language is SOV. The sentence pattern of the third sentence T31 differs from that of the second sentence T21 and is, for example, the same as that of the first sentence T11. This holds even when the first sentence T11 contains a complement (C) or a modifier (M).

The sentence creation device 1 is implemented by, for example, a server device. It may also be implemented by a plurality of server devices, that is, a computer system. The sentence creation device 1 is configured to communicate with a first bilingual corpus 82 and a second bilingual corpus 84 provided outside the device.

The first bilingual corpus 82 and the second bilingual corpus 84 are functional units that each serve as a database for storing information. Each is implemented, for example, by a database, a server, or another suitable medium that includes at least one of memory and storage.

The first bilingual corpus 82 is, for example, a bilingual corpus for consecutive interpretation, and the second bilingual corpus 84 is, for example, a bilingual corpus for simultaneous interpretation. A bilingual corpus is a corpus (sentence database) of bilingual data in which sentences in different languages are paired as translations of each other; it is built for use as training data for machine translation. The first bilingual corpus 82 contains a plurality of items of first bilingual data, each of which pairs a first sentence T11 with the associated second sentence T21, a translation of that first sentence. The second bilingual corpus 84 contains a plurality of items of second bilingual data, each of which pairs a first sentence T11 with the associated third sentence T31, obtained by rearranging the phrases of the second sentence T21 that translates that first sentence.

Next, the functions of the sentence creation device 1 according to the present embodiment will be described. As shown in FIG. 1, the sentence creation device 1 functionally includes an acquisition unit 10, an analysis unit 20, a detection unit 30, a mapping unit 40, a deriving unit 50, and a creation unit 60.

The acquisition unit 10 is a functional unit that acquires first bilingual data consisting of a first sentence T11 and a second sentence T21. The acquisition unit 10 is configured to acquire information from the first bilingual corpus 82; for example, it acquires from the first bilingual corpus 82 first bilingual data whose first sentence T11 is not yet stored in the second bilingual corpus 84. The acquisition unit 10 outputs the acquired first sentence T11 and second sentence T21 to the analysis unit 20, and outputs the first sentence T11 to the creation unit 60.
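The selection performed by the acquisition unit 10 amounts to a filter over the first bilingual corpus. The following is a minimal sketch under stated assumptions: both corpora are modelled as in-memory lists instead of the databases 82 and 84, and the names `BilingualPair` and `select_unprocessed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BilingualPair:
    """One item of bilingual data (hypothetical representation)."""
    first: str   # first sentence T11 (first language)
    second: str  # T21 in the first corpus, T31 in the second corpus


def select_unprocessed(first_corpus, second_corpus):
    """Return items of the first corpus whose first sentence T11
    does not yet appear in the second corpus."""
    done = {pair.first for pair in second_corpus}
    return [pair for pair in first_corpus if pair.first not in done]
```

Using a set for the already-processed sentences keeps each membership check constant-time on average, which matters only when the corpora are large.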

The analysis unit 20 is a functional unit that performs morphological analysis on each of the first sentence T11 and the second sentence T21. Morphological analysis is, for example, a process that divides text data into a sequence of morphemes on the basis of information such as the grammar and parts of speech of the target language, and determines the part of speech of each morpheme. A morpheme is, for example, the smallest linguistic unit that carries meaning. The analysis unit 20 analyzes each sentence using, for example, a known morphological analysis method, such as one based on conditional random fields (CRF), a hidden Markov model (HMM), or a recurrent neural network; a rule-based method may also be used. As concrete tools, the analysis unit 20 applies, for example, NLTK (Natural Language Toolkit) to the first sentence T11 and MeCab (Yet Another Part-of-Speech and Morphological Analyzer) to the second sentence T21. The analysis unit 20 may perform the analysis using words instead of morphemes.

FIG. 2 is a diagram showing an example of morphological analysis processing by the analysis unit shown in FIG. 1. As shown in FIG. 2, the analysis unit 20 performs morphological analysis on each of the first sentence T11 and the second sentence T21 acquired by the acquisition unit 10. For example, analyzing the first sentence T11 yields morphemes 2a, 2b, 2c, 2d, 2e, 2f, 2g, 2h, and 2i in order from the beginning to the end of the sentence, and analyzing the second sentence T21 yields morphemes 3a, 3b, 3c, 3d, 3e, 3f, 3g, 3h, 3i, 3j, and 3k in the same order. In this way, the analysis unit 20 extracts all the morphemes constituting each of the first sentence T11 and the second sentence T21.

The analysis unit 20 assigns an index 2p to each of the morphemes 2a to 2i. The index 2p indicates the position (order) of the morpheme within the first sentence T11. The index 2p may be a number that increases (ascending order) or decreases (descending order) from the beginning toward the end of the first sentence T11. In the present embodiment, the analysis unit 20 assigns ascending numbers: for example, it assigns "0" to the morpheme 2a at the beginning of the first sentence T11, and assigns to each subsequent morpheme 2b to 2i the number of the immediately preceding morpheme plus 1, so the morphemes 2a to 2i receive the indexes 0 to 8.
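Tokenization followed by index assignment can be sketched as below. This is an illustration under assumptions: the regular-expression tokenizer is a crude stand-in for the NLTK analysis named in the text (it does no part-of-speech tagging), and `tokenize_en` and `index_morphemes` are hypothetical names.

```python
import re


def tokenize_en(sentence):
    """Crude stand-in for an English morphological analyzer:
    splits the sentence into word tokens and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def index_morphemes(morphemes, descending=False):
    """Pair each morpheme with its index 2p.

    Ascending (the embodiment): 0 for the first morpheme, then +1 each.
    Descending (the alternative mentioned in the text): highest number first."""
    n = len(morphemes)
    return [(n - 1 - i if descending else i, m) for i, m in enumerate(morphemes)]
```

For example, `index_morphemes(tokenize_en("I ate an apple."))` yields `[(0, 'I'), (1, 'ate'), (2, 'an'), (3, 'apple'), (4, '.')]`.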

The analysis unit 20 outputs the sequence of morphemes 2a to 2i, together with the indexes 2p, to the mapping unit 40 as a first analyzed sentence T12, and outputs the sequence of morphemes 3a to 3k to the detection unit 30 as a second analyzed sentence T22.

The detection unit 30 is a functional unit that detects all the phrases (bunsetsu) in the second sentence T21. For example, the detection unit 30 detects the phrases of the second sentence T21 by performing phrase segmentation on the second analyzed sentence T22, that is, by inserting delimiters between phrases. Once the delimiters have been inserted, the word sequences between two adjacent delimiters, between the beginning of the sentence and the first delimiter, and between the last delimiter and the end of the sentence are each detected as a phrase. Here, the first delimiter is the delimiter closest to the beginning of the sentence, and the last delimiter is the delimiter closest to the end of the sentence. The detection unit 30 performs the segmentation using, for example, a known phrase segmentation method, such as one based on a support vector machine or a neural network. As a concrete tool, the detection unit 30 uses, for example, CaboCha (Yet Another Japanese Dependency Structure Analyzer).

FIG. 3 is a diagram showing an example of phrase segmentation processing by the detection unit shown in FIG. 1. As shown in FIG. 3, the detection unit 30 segments the second analyzed sentence T22 by inserting delimiters 4 into it. The detection unit 30 detects the morphemes between the beginning of the sentence and the first delimiter 4 (morphemes 3a and 3b) as a phrase 5a; the morphemes between the first and second delimiters 4 (morphemes 3c, 3d, and 3e) as a phrase 5b; the morphemes between the second delimiter 4 from the beginning and the last delimiter 4 (morphemes 3f and 3g) as a phrase 5c; and the morphemes between the last delimiter 4 and the end of the sentence (morphemes 3h, 3i, 3j, and 3k) as a phrase 5d. In this way, the detection unit 30 detects all the phrases 5a to 5d in the second analyzed sentence T22.
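Turning delimiter positions into phrases, as in FIG. 3, can be sketched as follows. The sketch assumes the delimiter positions are already known (in the text they come from a segmenter such as CaboCha) and encodes each delimiter 4 as the index of the morpheme that starts the next phrase; `split_into_phrases` is a hypothetical name.

```python
def split_into_phrases(morphemes, boundaries):
    """Split a morpheme sequence into phrases.

    boundaries: positions of the delimiters 4; a boundary b separates
    morphemes[b - 1] from morphemes[b]."""
    phrases, start = [], 0
    for b in sorted(boundaries):
        phrases.append(morphemes[start:b])  # previous delimiter (or start) to b
        start = b
    phrases.append(morphemes[start:])       # last delimiter to end of sentence
    return phrases


# Morphemes 3a to 3k with delimiters before 3c, 3f and 3h, as in FIG. 3.
morphemes_t22 = [f"3{c}" for c in "abcdefghijk"]
phrases_t23 = split_into_phrases(morphemes_t22, [2, 5, 7])
```

Here `phrases_t23` holds the four phrases 5a to 5d: `['3a', '3b']`, `['3c', '3d', '3e']`, `['3f', '3g']`, and `['3h', '3i', '3j', '3k']`.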

The detection unit 30 outputs the sequence of phrases 5a to 5d to the mapping unit 40 as a detected sentence T23. The detection unit 30 may perform the phrase segmentation on the second sentence T21 instead of the second analyzed sentence T22. It may also detect only some of the phrases in the second analyzed sentence T22 rather than all of them.

The mapping unit 40 is a functional unit that maps the morphemes 2a to 2i of the first sentence T11 to the morphemes 3a to 3k of the second sentence T21 on the basis of the meaning of each morpheme. For example, the mapping unit 40 performs word alignment, a process that establishes correspondences between the morphemes 2a to 2i and the morphemes 3a to 3k. In word alignment, for example, if the meaning of one of the morphemes 2a to 2i is similar to the meaning of at least one of the morphemes 3a to 3k, those morphemes are determined to correspond. The mapping unit 40 performs word alignment using, for example, a known method such as one based on a probabilistic model or on heuristics. As a concrete word alignment tool, the mapping unit 40 uses, for example, IBM Model 2; it may instead use IBM Model 1, IBM Model 3, IBM Model 4, or IBM Model 5.
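To show how a probabilistic word aligner produces the morpheme correspondences, here is a minimal sketch of IBM Model 1, the simplest member of the IBM model family named in the text. It is not the embodiment's tool: it omits the NULL word and the alignment (distortion) probabilities that distinguish IBM Model 2, and the toy English/Japanese corpus and function names are hypothetical. Each second-language token is aligned to exactly one first-language token, matching the many-to-one mapping described for FIG. 4. (NLTK, for instance, ships fuller IBM model implementations in its `nltk.translate` package.)

```python
from collections import defaultdict


def train_ibm1(corpus, iterations=20):
    """EM estimation of t[(f, e)] = P(second-language token f | first-language token e).
    corpus: list of (first_tokens, second_tokens) pairs. No NULL word."""
    f_vocab = {f for _, fs in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise the expected counts per first-language token.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def align(es, fs, t):
    """For each second-language token, return the position in es of the
    most probable first-language token (Viterbi alignment under Model 1)."""
    return [max(range(len(es)), key=lambda i: t[(f, es[i])]) for f in fs]


toy_corpus = [
    (["big", "house"], ["大きい", "家"]),
    (["big", "book"], ["大きい", "本"]),
    (["my", "book"], ["私の", "本"]),
]
t_table = train_ibm1(toy_corpus)
```

On this toy corpus, `align(["big", "house"], ["大きい", "家"], t_table)` aligns 大きい to "big" and 家 to "house", because co-occurrence across sentences lets EM disambiguate the pairs.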

FIG. 4 is a diagram showing an example of morpheme-to-morpheme mapping processing by the mapping unit shown in FIG. 1 and an example of feature quantity derivation processing by the deriving unit shown in FIG. 1. As shown in FIG. 4, the mapping unit 40 performs the mapping using, for example, the first analyzed sentence T12 and the detected sentence T23; it may instead use the first analyzed sentence T12 and the second analyzed sentence T22. For example, each morpheme among the morphemes 3a to 3k is mapped to one of the morphemes 2a to 2i. Conversely, each of the morphemes 2a to 2i may be mapped to one or more of the morphemes 3a to 3k, or to none of them.

For example, the mapping unit 40 searches the morphemes 3a to 3k for a morpheme whose meaning is similar to that of the morpheme 2b, determines that the morpheme 3c is similar, and maps the morpheme 2b to the morpheme 3c. Likewise, it searches the morphemes 3a to 3k for morphemes similar in meaning to the morpheme 2a, determines that the morphemes 3e and 3i are similar, and maps the morpheme 2a to both the morpheme 3e and the morpheme 3i. In the example shown in FIG. 4, every one of the morphemes 3a to 3k is mapped to one of the morphemes 2a to 2i, although the mapping unit 40 may leave some of the morphemes 3a to 3k unmapped. The mapping unit 40 outputs information indicating the correspondences between morphemes, together with the first analyzed sentence T12 and the detected sentence T23, to the deriving unit 50.

The deriving unit 50 is a functional unit that derives a feature quantity for each of the phrases 5a to 5d. The feature quantity is derived from the indexes 2p of the morphemes 2a to 2i mapped to the morphemes contained in each phrase. For each of the phrases 5a to 5d, the deriving unit 50 extracts the indexes 2p of the first-sentence morphemes mapped to the morphemes in the phrase, and takes the minimum of the extracted indexes as the feature quantity.

The feature quantity derivation is described in detail below. As shown in FIG. 4, the deriving unit 50 extracts, for each of the phrases 5a to 5d, the indexes 2p of the morphemes 2a to 2i mapped to the morphemes 3a to 3k in that phrase. For example, the first phrase 5a contains the morphemes 3a and 3b. The morpheme 3a is mapped to the morpheme 2h, and the morpheme 3b is mapped to the morpheme 2c. The deriving unit 50 therefore extracts "7", the index 2p of the morpheme 2h, and "2", the index 2p of the morpheme 2c, for the phrase 5a, and obtains the combination of these indexes, (7, 2), as an extraction quantity 7a. By performing the same processing on the phrases 5b, 5c, and 5d, the deriving unit 50 obtains their extraction quantities 7b, 7c, and 7d: (1, 2, 0) for the phrase 5b, (5, 4) for the phrase 5c, and (3, 0, 8, 8) for the phrase 5d.

The deriving unit 50 derives feature quantities 8a to 8d from the extraction quantities 7a to 7d, respectively. In the example shown in FIG. 4, the deriving unit 50 takes the minimum of the indexes 2p in each extraction quantity as the feature quantity. For example, it derives the minimum value "2" of the extraction quantity 7a, (7, 2), as the feature quantity 8a of the phrase 5a. Performing the same processing on the phrases 5b, 5c, and 5d yields a feature quantity 8b of "0" for the phrase 5b, a feature quantity 8c of "4" for the phrase 5c, and a feature quantity 8d of "0" for the phrase 5d. The deriving unit 50 outputs the detected sentence T23 and the feature quantities 8a to 8d to the creation unit 60.
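The derivation of extraction quantities and minimum-index feature quantities can be sketched as below, reproducing the FIG. 4 example. Assumptions: the alignment is given as a dictionary from second-sentence morpheme labels to the index 2p of the mapped first-sentence morpheme; the entries for 3d, 3f, 3g, 3h, 3j and 3k are read off the extraction quantities stated in the text; unmapped morphemes are skipped; and every phrase is assumed to contain at least one mapped morpheme, as in FIG. 4 (the text does not say what happens otherwise). `derive_features` is a hypothetical name.

```python
def derive_features(phrases, alignment):
    """Return (extraction quantity, feature quantity) for each phrase.

    phrases:   list of phrases, each a list of second-sentence morpheme labels
    alignment: dict mapping a second-sentence morpheme label to the index 2p
               of the first-sentence morpheme it is mapped to"""
    result = []
    for phrase in phrases:
        extracted = tuple(alignment[m] for m in phrase if m in alignment)
        result.append((extracted, min(extracted)))  # feature = smallest index
    return result


# FIG. 4 example (2a..2i carry indexes 0..8, so 2h -> 7, 2c -> 2, 2a -> 0, ...).
alignment_fig4 = {"3a": 7, "3b": 2, "3c": 1, "3d": 2, "3e": 0, "3f": 5,
                  "3g": 4, "3h": 3, "3i": 0, "3j": 8, "3k": 8}
phrases_fig4 = [["3a", "3b"], ["3c", "3d", "3e"], ["3f", "3g"],
                ["3h", "3i", "3j", "3k"]]
features_fig4 = derive_features(phrases_fig4, alignment_fig4)
```

This gives the extraction quantities (7, 2), (1, 2, 0), (5, 4), (3, 0, 8, 8) and the feature quantities 2, 0, 4, 0 for the phrases 5a to 5d, matching the text. No content-word/function-word distinction is needed: every mapped morpheme simply contributes its index.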

The creation unit 60 is a functional unit that creates the third sentence T31 by arranging the phrases 5a to 5d on the basis of the feature quantities 8a to 8d. For example, the creation unit 60 creates the third sentence T31 by rearranging the phrases 5a to 5d. The third sentence T31 is a second-language translation of the first sentence T11 in which the second-language phrases 5a to 5d are arranged according to the sentence pattern of the first sentence T11. For example, the creation unit 60 arranges the phrases 5a to 5d so that the order of the feature quantities 8a to 8d corresponds to the order of the indexes 2p of the morphemes 2a to 2i from the beginning to the end of the first sentence T11. When the indexes 2p are assigned to the morphemes 2a to 2i in ascending order from the beginning to the end of the first sentence T11, for example, the creation unit 60 creates the third sentence T31 by sorting the phrases 5a to 5d so that the feature quantities 8a to 8d are in ascending order.

When several clauses have the same feature quantity, the creation unit 60 arranges them in the order in which they appear in the second sentence T21. That is, clauses with equal feature quantities keep, in the third sentence T31, the relative order they had in the second sentence T21.

FIG. 5 is a diagram showing an example of the rearrangement process performed by the creation unit shown in FIG. 1. As shown in FIG. 5, the creation unit 60 sorts feature quantities 8a to 8d in ascending order. In the example of FIG. 5, feature quantity 8a is "2", 8b is "0", 8c is "4", and 8d is "0". The creation unit 60 therefore orders them as 8b, 8d, 8a, 8c. Following this order, the creation unit 60 arranges (rearranges) clauses 5a to 5d from the beginning of the sentence to the end as clause 5b, clause 5d, clause 5a, clause 5c, thereby creating the third sentence T31. Because feature quantities 8b and 8d have the same value, the creation unit 60 keeps the original clause order and places clause 5b before clause 5d.
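The rearrangement of FIG. 5, including the tie-breaking rule, is sketched below in Python under the assumption that clauses are identified by their labels. Python's `sorted()` is guaranteed to be stable, so clauses with equal feature quantities keep their T21 order without any extra bookkeeping. The function name `rearrange` is illustrative.

```python
# Feature quantities 8a-8d from the FIG. 5 example (minimum-index rule).
features = {"5a": 2, "5b": 0, "5c": 4, "5d": 0}

# Clauses in their original order in the second sentence T21.
original_order = ["5a", "5b", "5c", "5d"]


def rearrange(order: list[str], feature: dict[str, int]) -> list[str]:
    """Sort clauses by ascending feature quantity.

    sorted() is stable, so clauses with equal feature quantities keep their
    relative order from T21, which matches the tie-breaking rule.
    """
    return sorted(order, key=lambda clause: feature[clause])


print(rearrange(original_order, features))  # ['5b', '5d', '5a', '5c']
```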

In FIG. 5, the delimiters 4 are left in the third sentence T31 for convenience of explanation. The creation unit 60 may instead create the third sentence T31 by removing the delimiters 4 and then rearranging clauses 5a to 5d. The creation unit 60 outputs the pair consisting of the first sentence T11 acquired by the acquisition unit 10 and the third sentence T31 to the second bilingual corpus 84 as second bilingual data, and stores it there.

FIG. 6 is a flowchart showing the series of steps of the sentence creation method executed by the sentence creation device shown in FIG. 1. The series of steps in FIG. 6 starts, for example, when new first bilingual data is stored in the first bilingual corpus 82. As shown in FIG. 6, the acquisition unit 10 first executes an acquisition process (step S10). In step S10, the acquisition unit 10 acquires first bilingual data, including the first sentence T11 and the second sentence T21, from the first bilingual corpus 82. The acquisition unit 10 then outputs the first bilingual data to the analysis unit 20.

Next, the analysis unit 20 executes a morphological analysis process (step S20). In step S20, on receiving the first bilingual data from the acquisition unit 10, the analysis unit 20 creates the first analyzed sentence T12 and the second analyzed sentence T22 by morphologically analyzing the first sentence T11 and the second sentence T21, respectively. The analysis unit 20 then outputs the first analyzed sentence T12 to the mapping unit 40 and the second analyzed sentence T22 to the detection unit 30.

Next, the detection unit 30 executes a clause segmentation process (step S30). In step S30, on receiving the second analyzed sentence T22 from the analysis unit 20, the detection unit 30 segments it by inserting delimiters 4. Through this process, the detection unit 30 detects clauses 5a to 5d and creates the detected sentence T23. The detection unit 30 then outputs the detected sentence T23 to the mapping unit 40.

Next, the mapping unit 40 executes a mapping process (step S40). In step S40, on receiving the first analyzed sentence T12 from the analysis unit 20 and the detected sentence T23 from the detection unit 30, the mapping unit 40 maps morphemes 2a to 2i of the first analyzed sentence T12 to morphemes 3a to 3k of the detected sentence T23 on the basis of the meaning of each morpheme. The mapping unit 40 then outputs information indicating the correspondences between morphemes, together with the first analyzed sentence T12 and the detected sentence T23, to the derivation unit 50.

Next, the derivation unit 50 executes a feature quantity derivation process (step S50). In step S50, on receiving the first analyzed sentence T12 and the detected sentence T23 from the mapping unit 40, the derivation unit 50 derives feature quantities 8a to 8d for clauses 5a to 5d, respectively. The derivation unit 50 then outputs the detected sentence T23 and feature quantities 8a to 8d to the creation unit 60.

Next, the creation unit 60 executes the rearrangement process (step S60). In step S60, on receiving the detected sentence T23 and feature quantities 8a to 8d from the derivation unit 50, the creation unit 60 arranges clauses 5a to 5d on the basis of feature quantities 8a to 8d to create the third sentence T31.

Next, the creation unit 60 executes an output process (step S70). In step S70, the creation unit 60 outputs the pair consisting of the third sentence T31 and the first sentence T11 received from the acquisition unit 10 to the second bilingual corpus 84 as second bilingual data, thereby storing the second bilingual data in the second bilingual corpus 84. This completes the series of steps of the sentence creation method.
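Steps S50 to S70 can be sketched in Python as follows. The publication does not name a specific morphological analyzer or aligner, so the outputs of steps S10 to S40 are supplied here as precomputed, hypothetical inputs. The English/Japanese sentence pair, the function `create_third_sentence`, and the rule that an unaligned clause sorts last are all assumptions for illustration only.

```python
def create_third_sentence(
    first_morphemes: list[str],
    clauses: list[list[str]],
    alignment: dict[tuple[int, int], list[int]],
) -> tuple[str, str]:
    """Hypothetical S50-S70 over precomputed S20-S40 outputs.

    first_morphemes: morphemes of T12; list position = index 2p.
    clauses: clauses of T23, each a list of second-language morphemes.
    alignment: (clause number, morpheme position) -> mapped indexes 2p.
    Returns the second bilingual data pair (T11, T31).
    """
    features = []
    for c, clause in enumerate(clauses):
        # Extraction amount: all indexes 2p mapped to this clause's morphemes.
        indexes = [p for m in range(len(clause)) for p in alignment.get((c, m), [])]
        # S50, minimum-index rule. Assumption: an unaligned clause sorts last.
        features.append(min(indexes) if indexes else len(first_morphemes))
    # S60: stable ascending sort, so ties keep their T21 order.
    order = sorted(range(len(clauses)), key=lambda c: features[c])
    # Delimiters are dropped by concatenating clauses (unspaced second language).
    third = "".join("".join(clauses[c]) for c in order)
    # S70: pair T11 (rebuilt with spaces, assumed space-delimited) with T31.
    return " ".join(first_morphemes), third


first = ["I", "ate", "an", "apple", "yesterday"]  # indexes 0-4
clauses = [["昨日"], ["私", "は"], ["りんご", "を"], ["食べ", "た"]]  # T21 order
alignment = {(0, 0): [4], (1, 0): [0], (2, 0): [3], (3, 0): [1]}
pair = create_third_sentence(first, clauses, alignment)
print(pair)  # ('I ate an apple yesterday', '私は食べたりんごを昨日')
```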

The sentence creation device 1 executes the steps of the flowchart in FIG. 6 for, for example, all of the first bilingual data in the first bilingual corpus 82. When processing of one item of first bilingual data is complete, the sentence creation device 1 determines whether unprocessed first bilingual data remains in the first bilingual corpus 82. If it determines that unprocessed first bilingual data remains, the acquisition unit 10 acquires that data from the first bilingual corpus 82. In this way, the sentence creation device 1 creates second bilingual data corresponding to each item of first bilingual data and stores it in the second bilingual corpus 84. This configuration allows it to build a second bilingual corpus 84 that corresponds to the first bilingual corpus 82. If it determines that no unprocessed first bilingual data remains in the first bilingual corpus 82, the sentence creation device 1 ends processing of the first bilingual corpus 82.
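The corpus-level loop can be sketched in Python as a work queue that is drained until no unprocessed first bilingual data remains. Each item is assumed to already carry per-clause feature quantities, which stand in for steps S20 to S50. The function names `build_second_corpus` and `reorder` and the sample data are hypothetical.

```python
from collections import deque
from typing import Any, Callable, Iterable


def build_second_corpus(
    first_corpus: Iterable[Any], process: Callable[[Any], tuple[str, str]]
) -> list[tuple[str, str]]:
    """Process every item of first bilingual data until none remain."""
    pending = deque(first_corpus)  # unprocessed first bilingual data
    second_corpus = []
    while pending:  # is any unprocessed data left?
        item = pending.popleft()  # S10: acquire the next item
        second_corpus.append(process(item))  # remaining steps for one item
    return second_corpus


def reorder(item: tuple[str, list[tuple[str, int]]]) -> tuple[str, str]:
    """Hypothetical per-item step: clauses carry precomputed feature quantities."""
    t11, clauses = item
    ordered = sorted(clauses, key=lambda cf: cf[1])  # stable ascending sort
    return t11, "".join(text for text, _ in ordered)


first_corpus = [
    ("I ate an apple yesterday", [("昨日", 4), ("私は", 0), ("りんごを", 3), ("食べた", 1)]),
    ("Hello", [("こんにちは", 0)]),
]
second_corpus = build_second_corpus(first_corpus, reorder)
print(second_corpus)
```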

In the flowchart of FIG. 6, the sentence creation device 1 may execute the clause segmentation process (step S30) at any point after the acquisition process (step S10) and before the feature quantity derivation process (step S50).

In the sentence creation device 1 described above, the clauses of the detected sentence T23 are arranged according to their feature quantities 8a to 8d to create the third sentence T31. Consider, for comparison, an approach that distinguishes content words from function words in the second-language sentence and keeps only the most probable mapping for each content word. The sentence creation device 1 needs no such complex processing before the rearrangement process (step S60) that arranges clauses 5a to 5d of the detected sentence T23, so the time needed to create the third sentence T31 can be shortened. The third sentence T31 takes into account the positions of morphemes 2a to 2i of the first sentence T11, so it is created as a simultaneous interpretation of the first sentence T11 into the second language. The second bilingual corpus 84, a bilingual corpus for simultaneous interpretation, can therefore be created more easily. A translation model built from the second bilingual corpus 84 (the bilingual corpus for simultaneous interpretation) can, for example, simultaneously interpret first-language sentences into the second language.

As described above, the creation unit 60 creates the third sentence T31 by arranging clauses 5a to 5d so that their feature quantities 8a to 8d follow the (ascending) order of the indexes 2p of morphemes 2a to 2i, which run from the beginning to the end of the first sentence T11. The third sentence T31 follows the order of the indexes 2p of morphemes 2a to 2i, so the sentence creation device 1 can create it as a simultaneous interpretation of the first sentence T11 into the second language.

As described above, the derivation unit 50 takes the feature quantity of each of clauses 5a to 5d to be the minimum of the indexes 2p of the morphemes of the first analyzed sentence T12 that are mapped to the morphemes contained in that clause. In the above embodiment, indexes 2p are assigned in ascending order from the beginning to the end of the first analyzed sentence T12 (first sentence T11). The creation unit 60 therefore creates the third sentence T31 by arranging clauses 5a to 5d so that feature quantities 8a to 8d are in ascending order. In other words, clauses are sorted by the smallest value (index 2p) in each clause's extraction amount. With this configuration, clauses can be arranged starting from those that correspond to morphemes near the beginning of the first sentence T11. The resulting second bilingual corpus 84 (bilingual corpus for simultaneous interpretation) can therefore train a translation model to translate the morphemes of the incrementally input first sentence T11 as early as possible. The closer a morpheme of the first sentence T11 is to the beginning of the sentence, the earlier it needs to be translated. Such a translation model can therefore shorten the time from input of the first sentence T11 to output of the translated third sentence T31.

Not every clause in the third sentence T31 has to be arranged in order from the clause corresponding to the morpheme nearest the beginning of the first sentence T11. In the example shown in FIG. 5, all morphemes of the first sentence T11 that correspond to clause 5c (morphemes 2e and 2f) are closer to the beginning of the sentence than at least one morpheme of the first sentence T11 that corresponds to clause 5a (morpheme 2h). Even so, clause 5c may be placed after clause 5a in the third sentence T31. Even in this case, at least one of clauses 5a to 5d (for example, clauses 5b and 5d) is arranged in order starting from the clause corresponding to the morpheme nearest the beginning of the first sentence T11. The resulting second bilingual corpus 84 (bilingual corpus for simultaneous interpretation) can therefore train a translation model to translate at least part of the incrementally input first sentence T11 early. In the case of FIG. 5, a translation model trained on the first bilingual corpus 82 cannot output anything until morpheme 2h has been input. By contrast, a translation model trained on the second bilingual corpus 84 can output morpheme 3c, the first morpheme of the first clause of the translation (clause 5b), as soon as morpheme 2b has been input. Such a translation model can therefore shorten the time from input of the first sentence T11 until at least part of the third sentence T31 is output.

The present invention is not limited to the embodiment described above. For example, the feature quantity need not be the minimum of the indexes 2p of the morphemes of the first analyzed sentence T12 mapped to the morphemes in clauses 5a to 5d. It may instead be, for example, the maximum of those indexes 2p for each of clauses 5a to 5d.

In this case, the derivation unit 50 outputs the maximum value of each of extraction amounts 7a to 7d as the feature quantity. In the example shown in FIG. 4, the derivation unit 50 derives the maximum value "7" from extraction amount 7a, which is (7, 2), as feature quantity 8a of clause 5a. It performs the same processing on clauses 5b, 5c, and 5d to derive feature quantity 8b of clause 5b, feature quantity 8c of clause 5c, and feature quantity 8d of clause 5d. In this case, feature quantity 8b is "2", feature quantity 8c is "5", and feature quantity 8d is "8".

As described above, the derivation unit 50 may take the maximum of each clause's extraction amount as the feature quantity. The creation unit 60 may then create the third sentence T31 by arranging clauses 5a to 5d so that their feature quantities follow the order of the indexes 2p of morphemes 2a to 2i, which run from the beginning to the end of the first analyzed sentence T12. In this modification, indexes 2p are assigned in ascending order from the beginning to the end of the first analyzed sentence T12 (first sentence T11). The creation unit 60 therefore creates the third sentence T31 by arranging clauses 5a to 5d so that feature quantities 8a to 8d are in ascending order. In other words, clauses are sorted by the largest value (index 2p) in each clause's extraction amount. With this configuration, clauses are arranged in the order in which all of their mapped morphemes have appeared, reading the first sentence T11 from beginning to end. The resulting second bilingual corpus 84 (bilingual corpus for simultaneous interpretation) can therefore train a translation model for input that arrives incrementally from the start of the first sentence T11. Such a model preferentially outputs the clauses of the second sentence T21 in the order in which all of their corresponding morphemes have been received. Because it translates a clause only after all the morphemes of the first sentence T11 corresponding to that clause have arrived, the model can translate the first-language sentence T11 into the second language more accurately.
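The minimum and maximum rules differ only in how each extraction amount is reduced to one number. The Python sketch below therefore takes the reducer as a parameter. As before, only 7a = (7, 2) comes from the text; the other tuples are hypothetical values chosen to reproduce the stated feature quantities. `rearrange` is an illustrative name.

```python
from typing import Callable, Iterable

# Only 7a = (7, 2) is given in the text; the other tuples are hypothetical.
EXTRACTION_AMOUNTS = {"5a": (7, 2), "5b": (0, 1, 2), "5c": (4, 5), "5d": (0, 3, 8)}
ORIGINAL_ORDER = ["5a", "5b", "5c", "5d"]  # clause order in T21


def rearrange(aggregate: Callable[[Iterable[int]], int]) -> list[str]:
    """Stable ascending sort of clauses by aggregate(extraction amount)."""
    features = {c: aggregate(EXTRACTION_AMOUNTS[c]) for c in ORIGINAL_ORDER}
    return sorted(ORIGINAL_ORDER, key=lambda c: features[c])


print(rearrange(min))  # ['5b', '5d', '5a', '5c']  minimum: earliest mapped morpheme
print(rearrange(max))  # ['5b', '5c', '5a', '5d']  maximum: all mapped morphemes seen
```

With the maximum rule, clause 5d moves to the end because its mapped morphemes are complete only once index 8 (morpheme 2i) has been read.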

Another embodiment is described below. FIG. 7 is a diagram showing the configuration of a sentence creation device according to another embodiment. As shown in FIG. 7, the sentence creation device 1A differs from the sentence creation device 1 mainly in two respects: it further includes a weighting unit 70, and its derivation unit 50 derives feature quantities differently.

The weighting unit 70 is a functional unit that, as a weighting process, sets weights for morphemes 2a to 2i of the first analyzed sentence T12. A weight indicates, for example, the importance of each morpheme appearing in a sentence; the larger the weight, the more important the morpheme. The weight is, for example, a value from 0 to 1 inclusive. The weighting unit 70 executes a weighting process that sets the weights using, for example, a known weighting method. Such methods include those based on the term frequency of a morpheme in a document (or sentence) or on its inverse document frequency. A document contains multiple sentences. The term frequency of a morpheme is the number of occurrences of that morpheme divided by the total number of word occurrences in the document or sentence. The inverse document frequency of a morpheme is the reciprocal of the proportion of documents in a document collection that contain it, so the more documents the morpheme appears in, the smaller the value. Specific weighting schemes include, for example, TF-IDF and Okapi BM25. Weights may also be set manually by a user.
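TF-IDF, one of the schemes named above, can be computed with the standard library alone. The exact variant is not specified in the text, so several choices in this sketch are assumptions. It takes the logarithm of the inverse document frequency, as common TF-IDF formulations do. It floors the document frequency at 1, and it divides by the sentence maximum so that weights fall in [0, 1]. The function name and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(sentence: list[str], documents: list[list[str]]) -> list[float]:
    """TF-IDF weight per morpheme of `sentence`, scaled into [0, 1].

    TF  = occurrences in the sentence / sentence length.
    IDF = log(N / df), df = number of documents containing the morpheme.
    Assumptions: df is floored at 1, and scores are divided by their maximum.
    """
    counts = Counter(sentence)
    doc_sets = [set(doc) for doc in documents]
    n_docs = max(len(doc_sets), 1)
    scores = []
    for morpheme in sentence:
        tf = counts[morpheme] / len(sentence)
        df = sum(1 for doc in doc_sets if morpheme in doc)
        idf = math.log(n_docs / max(df, 1))
        scores.append(tf * idf)
    top = max(scores, default=0.0)
    return [s / top if top > 0 else 0.0 for s in scores]


documents = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "ran"]]
weights = tfidf_weights(["the", "cat", "sat"], documents)
print([round(w, 3) for w in weights])  # [0.369, 0.369, 1.0]
```

The rarer morpheme "sat" receives the largest weight, matching the description that morphemes appearing in many documents get smaller values.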

The weighting unit 70 executes the weighting process on the first analyzed sentence T12 that the analysis unit 20 has morphologically analyzed. For example, in the first analyzed sentence T12, the weighting unit 70 sets larger weights for morphemes 2b, 2d, 2f, and 2h than for the other morphemes. The weighting unit 70 outputs the first analyzed sentence T12 and the weights of morphemes 2a to 2i to the derivation unit 50.

The derivation unit 50 derives the feature quantity of each of clauses 5a to 5d using the result of the weighting process by the weighting unit 70. For each clause, it considers the morphemes of the first analyzed sentence T12 that are mapped to the morphemes in that clause, and takes the index 2p of the one with the largest weight as the clause's feature quantity. For example, of the morphemes mapped to clause 5a (morphemes 2c and 2h), morpheme 2h has the largest weight (that is, it is the more important morpheme). The derivation unit 50 therefore derives its index 2p, "7", as the feature quantity of clause 5a.

The derivation unit 50 performs the same processing on clauses 5b, 5c, and 5d to obtain their feature quantities: "1" for clause 5b, "5" for clause 5c, and "3" for clause 5d.
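The weighted rule replaces min or max with "index of the highest-weighted mapped morpheme". This Python sketch assumes indexes 2p start at 0 for morpheme 2a, which is consistent with 2h having index 7. The weight values 0.9 and 0.1 are hypothetical; the text says only that 2b, 2d, 2f, and 2h weigh more than the rest. The tuples for clauses 5b to 5d are likewise hypothetical, and ties go to the first index listed, which is also an assumption.

```python
# Hypothetical weights for indexes 0-8 (morphemes 2a-2i): 2b, 2d, 2f, 2h
# (indexes 1, 3, 5, 7) are weighted more heavily than the others.
weights = {p: (0.9 if p in (1, 3, 5, 7) else 0.1) for p in range(9)}

# Only 7a = (7, 2) is given in the text; the other tuples are hypothetical.
extraction_amounts = {"5a": (7, 2), "5b": (0, 1, 2), "5c": (4, 5), "5d": (0, 3, 8)}


def weighted_feature(indexes: tuple[int, ...], weight: dict[int, float]) -> int:
    """Index 2p of the highest-weighted mapped morpheme (first one on ties)."""
    return max(indexes, key=lambda p: weight[p])


features = {c: weighted_feature(ix, weights) for c, ix in extraction_amounts.items()}
print(features)  # {'5a': 7, '5b': 1, '5c': 5, '5d': 3}
print(sorted(extraction_amounts, key=features.__getitem__))  # ['5b', '5d', '5c', '5a']
```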

In the flowchart of FIG. 6, the weighting unit 70 executes the weighting process, for example, after the mapping process (step S40) and before the feature quantity derivation process (step S50).

The sentence creation device 1A may execute the weighting process (step S45) at any point after the morphological analysis process (step S20) and before the feature quantity derivation process (step S50). The mapping unit 40 may output the detected sentence T23 directly to the derivation unit 50 rather than to the weighting unit 70.

As described above, the sentence creation device 1A provides the same effects as the sentence creation device 1. In addition, for each of clauses 5a to 5d, the sentence creation device 1A derives a feature quantity from the morphemes of the first analyzed sentence T12 that are mapped to the morphemes in that clause. The feature quantity is the index 2p of the highest-weighted of those morphemes. The third sentence T31A is then created by arranging clauses 5a to 5d so that their feature quantities follow the order of the indexes 2p of morphemes 2a to 2i, which run from the beginning to the end of the first analyzed sentence T12. In the sentence creation device 1A, the clauses of the detected sentence T23 are thus reordered according to where their highly important mapped morphemes appear, reading the first analyzed sentence T12 from beginning to end. The resulting second bilingual corpus 84 (bilingual corpus for simultaneous interpretation) can train a translation model to translate the high-importance morphemes of the incrementally input first sentence T11 into the third sentence T31A first. Such a translation model can therefore shorten the time until it outputs a clause containing the translation of a high-importance morpheme of the first sentence T11.

The derivation unit 50 may instead use a weight threshold. For each of clauses 5a to 5d, it may take as the feature quantity the index 2p of a mapped morpheme of the first analyzed sentence T12 whose weight is at least a predetermined threshold. When several morphemes mapped to a clause have weights at or above the threshold, the derivation unit 50 may, for example, take the minimum or the maximum of their indexes 2p as the feature quantity.
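The threshold variant can be sketched in Python as a filter followed by min or max. The text does not say what happens when no mapped morpheme reaches the threshold. This sketch assumes it falls back to all mapped indexes, but that fallback, the weight values, and the function name are all hypothetical.

```python
from typing import Callable, Iterable


def threshold_feature(
    indexes: Iterable[int],
    weight: dict[int, float],
    threshold: float,
    pick: Callable[[list[int]], int] = min,
) -> int:
    """Feature from mapped morphemes whose weight is at least `threshold`.

    With several qualifying morphemes, `pick` (min or max) chooses the index.
    Assumption: if none qualifies, fall back to all mapped indexes.
    """
    candidates = list(indexes)
    qualifying = [p for p in candidates if weight[p] >= threshold]
    return pick(qualifying or candidates)


weights = {0: 0.2, 1: 0.8, 2: 0.6, 4: 0.1, 5: 0.3}  # hypothetical weights
print(threshold_feature((0, 1, 2), weights, 0.5))  # 1
print(threshold_feature((0, 1, 2), weights, 0.5, pick=max))  # 2
print(threshold_feature((4, 5), weights, 0.5))  # 4 (fallback)
```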

When the minimum of those indexes 2p is used as the feature quantity, the second bilingual corpus 84 (bilingual corpus for simultaneous interpretation) can train a translation model to translate first those high-importance morphemes of the first sentence T11 that are input earliest. When the maximum is used, the second bilingual corpus 84 can train a translation model to preferentially output the clauses of the second sentence T21 in the order in which their corresponding high-importance morphemes of the incrementally input first sentence T11 have all been received.

In the other embodiment described above, the weighting unit 70 may instead set weights for morphemes 3a to 3k of the detected sentence T23 as the weighting process. For example, in the detected sentence T23, the weighting unit 70 sets larger weights for morphemes 3a, 3c, 3f, and 3h than for the other morphemes.

In this case, the derivation unit 50 derives the feature quantity of each of clauses 5a to 5d from the highest-weighted morpheme in that clause. The feature quantity is the index 2p of the morpheme of the first analyzed sentence T12 to which that morpheme is mapped. For example, of morphemes 3a and 3b in clause 5a, the derivation unit 50 extracts morpheme 3a, which has the largest weight (that is, it is the more important morpheme). The derivation unit 50 then takes the index 2p of morpheme 2h, to which morpheme 3a is mapped, namely "7", as the feature quantity of clause 5a.

The derivation unit 50 performs the same processing on clauses 5b, 5c, and 5d to obtain their feature quantities: "1" for clause 5b, "5" for clause 5c, and "3" for clause 5d.
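In this target-side variant, the weight is looked up on the clause's own morphemes before following the mapping back to T12. The Python sketch below uses clause 5a from the text. The mapping 3a to 2h (index 7) is stated there, while the mapping 3b to 2c (index 2) is inferred from clause 5a being mapped to 2c and 2h. The weight values are hypothetical. The handling of unaligned morphemes and of one-to-many mappings are also assumptions the text does not cover.

```python
from typing import Optional


def target_weighted_feature(
    clause: list[str],
    target_weight: dict[str, float],
    alignment: dict[str, list[int]],
) -> Optional[int]:
    """Index 2p mapped to the highest-weighted morpheme of the clause.

    Assumptions: unaligned morphemes are skipped; if the chosen morpheme maps
    to several first-sentence morphemes, the smallest index is used; a clause
    with no aligned morpheme yields None.
    """
    aligned = [m for m in clause if alignment.get(m)]
    if not aligned:
        return None
    best = max(aligned, key=lambda m: target_weight[m])
    return min(alignment[best])


target_weight = {"3a": 0.9, "3b": 0.1}  # hypothetical: 3a weighted more heavily
alignment = {"3a": [7], "3b": [2]}  # 3a -> 2h (stated), 3b -> 2c (inferred)
print(target_weighted_feature(["3a", "3b"], target_weight, alignment))  # 7
```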

With this weighting, the sentence creation device 1A reorders the clauses of the detected sentence T23 by where their key morphemes fall in T12. A clause's key morphemes are those of the first analyzed sentence T12 mapped to its high-importance second-sentence morphemes, and clauses are placed in the order these appear from the beginning to the end of T12. The resulting second bilingual corpus 84 (bilingual corpus for simultaneous interpretation) can train a translation model to give priority to certain morphemes of the incrementally input first sentence T11. Those morphemes, the ones mapped to high-importance morphemes of the second sentence T21, are translated into the third sentence T31A first. Such a translation model can therefore shorten the time until it outputs a clause containing a high-importance morpheme of the second sentence T21.

The sentence creation devices 1 and 1A may include at least one of the first bilingual corpus 82 and the second bilingual corpus 84.

The block diagrams used in the description of the above embodiment show blocks in functional units. These functional blocks (components) are realized by any combination of at least one of hardware and software. The method of realizing each functional block is not particularly limited. That is, each functional block may be realized using a single device that is physically or logically coupled. Alternatively, it may be realized using two or more physically or logically separate devices connected directly or indirectly (for example, by wire or wirelessly). A functional block may also be realized by combining software with the one device or the plurality of devices.

Functions include, but are not limited to, the following: judging, deciding, determining, calculating, computing, processing, deriving, investigating, searching, ascertaining, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, regarding, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating (mapping), and assigning. For example, a functional block (component) that performs transmission is called a transmitting unit or a transmitter. In every case, as described above, the method of realization is not particularly limited.

For example, the sentence creation devices 1 and 1A in an embodiment of the present disclosure may function as a computer that performs the information processing of the present disclosure. FIG. 8 is a diagram showing an example of the hardware configuration of the sentence creation devices 1 and 1A according to an embodiment of the present disclosure. Physically, the sentence creation devices 1 and 1A may be configured as a computer device. This computer device includes a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.

In the following description, the term "device" can be read as circuit, device, unit, or the like. The hardware configuration of the sentence creation devices 1 and 1A may include one or more of each device shown in the figure, or may omit some of the devices.

Each function of the sentence creation devices 1 and 1A is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002. The processor 1001 then performs computations and controls communication by the communication device 1004. It also controls at least one of reading and writing of data in the memory 1002 and the storage 1003.

The processor 1001, for example, runs an operating system to control the entire computer. The processor 1001 may be configured as a central processing unit (CPU) that includes an interface with peripheral devices, a control device, an arithmetic unit, registers, and the like. For example, each function of the sentence creation device 1 described above may be realized by the processor 1001.

The processor 1001 reads programs (program code), software modules, data, and the like from at least one of the storage 1003 and the communication device 1004 into the memory 1002. It then executes various processes according to them. The program used is one that causes a computer to execute at least part of the operations described in the above embodiment. For example, each function of the sentence creation devices 1 and 1A may be realized by a control program stored in the memory 1002 and running on the processor 1001. The various processes above have been described as being executed by a single processor 1001, but they may be executed simultaneously or sequentially by two or more processors 1001. The processor 1001 may be implemented with one or more chips. The program may be transmitted from a network via a telecommunication line.

The memory 1002 is a computer-readable recording medium. It may consist of, for example, at least one of ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), RAM (Random Access Memory), and the like. The memory 1002 may be called a register, a cache, a main memory (main storage device), or the like. The memory 1002 can store programs (program code), software modules, and the like that are executable to carry out the information processing according to an embodiment of the present disclosure.

The storage 1003 is a computer-readable recording medium. It may consist of, for example, at least one of the following:

- an optical disc such as a CD-ROM (Compact Disc ROM)
- a hard disk drive
- a flexible disk
- a magneto-optical disk (for example, a compact disc, a digital versatile disc, or a Blu-ray (registered trademark) disc)
- a smart card
- flash memory (for example, a card, stick, or key drive)
- a floppy (registered trademark) disk
- a magnetic strip, and the like

The storage 1003 may be called an auxiliary storage device. The first bilingual corpus 82 and the second bilingual corpus 84 may be, for example, a database, a server, or another suitable medium that includes at least one of the memory 1002 and the storage 1003.

The communication device 1004 is hardware (a transmitting/receiving device) for communication between computers via at least one of a wired network and a wireless network. It is also called, for example, a network device, a network controller, a network card, or a communication module. For example, the acquisition unit 10 described above may be realized by the communication device 1004.

The input device 1005 is an input device that receives input from the outside (for example, a keyboard, a mouse, a microphone, a switch, a button, or a sensor). The output device 1006 is an output device that produces output to the outside (for example, a display, a speaker, or an LED lamp). The input device 1005 and the output device 1006 may be integrated into a single unit (for example, a touch panel).

Devices such as the processor 1001 and the memory 1002 are connected by the bus 1007 for communicating information. The bus 1007 may be a single bus, or different buses may be used between different devices.

The sentence creation devices 1 and 1A may include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). Some or all of the functional blocks may be realized by that hardware. For example, the processor 1001 may be implemented using at least one of these types of hardware.

The order of the processing procedures, sequences, flowcharts, and the like of each aspect/embodiment described in the present disclosure may be changed as long as no contradiction arises. For example, the methods described in the present disclosure present the elements of the various steps in an exemplary order. They are not limited to the specific order presented.

Input and output information and the like may be stored in a specific location (for example, a memory) or managed using a management table. Information that is input or output may be overwritten, updated, or appended. Output information may be deleted. Input information may be transmitted to another device.

A determination may be made in any of the following ways:

- by a value represented by one bit (0 or 1)
- by a Boolean value (true or false)
- by a numerical comparison (for example, comparison with a predetermined value)

Each aspect/embodiment described in the present disclosure may be used alone, in combination, or switched during execution. Notification of predetermined information (for example, notification that "it is X") is not limited to explicit notification. It may also be performed implicitly (for example, by not notifying the predetermined information).

The present disclosure has been described in detail above. However, it is clear to those skilled in the art that the present disclosure is not limited to the embodiments described herein. The present disclosure can be implemented with modifications and variations without departing from its spirit and scope as defined by the claims. Accordingly, the description herein is for illustrative purposes and has no restrictive meaning with respect to the present disclosure.

Software should be interpreted broadly, whether it is called software, firmware, middleware, microcode, hardware description language, or by another name. It means instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like.

Software, instructions, information, and the like may be transmitted and received via a transmission medium. For example, software may be transmitted from a website, server, or other remote source using at least one of the following:

- wired technology (coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), etc.)
- wireless technology (infrared, microwave, etc.)

In that case, at least one of these wired and wireless technologies is included within the definition of a transmission medium.

The information, signals, and the like described in the present disclosure may be represented using any of a variety of different technologies. This applies, for example, to data, instructions, commands, information, signals, bits, symbols, and chips that may be referred to throughout the above description. They may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination thereof.

Terms described in the present disclosure and terms necessary for understanding the present disclosure may be replaced with terms having the same or similar meanings.

The terms "system" and "network" as used in the present disclosure are used interchangeably.

The information, parameters, and the like described in the present disclosure may be expressed in any of three ways: using absolute values, using values relative to a predetermined value, or using other corresponding information.

The terms "judging (determining)" and "deciding (determining)" as used in the present disclosure may encompass a wide variety of actions. "Judging" and "deciding" may include regarding the following as having "judged" or "decided":

- judging, calculating, computing, processing, deriving, investigating, looking up (searching, inquiring) (for example, searching in a table, a database, or another data structure), or ascertaining
- receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, or accessing (for example, accessing data in a memory)
- resolving, selecting, choosing, establishing, comparing, and the like

In other words, "judging" and "deciding" may include regarding some action as having been "judged" or "decided". "Judging (deciding)" may also be read as "assuming", "expecting", "considering", or the like.

The terms "connected" and "coupled", and any variation thereof, mean any direct or indirect connection or coupling between two or more elements. They can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between elements may be physical, logical, or a combination thereof. For example, "connection" may be read as "access". As used in the present disclosure, two elements can be considered "connected" or "coupled" to each other using at least one of one or more wires, cables, and printed electrical connections. Some non-limiting and non-exhaustive examples also include electromagnetic energy with wavelengths in the radio frequency region, the microwave region, and the optical (both visible and invisible) region.

The phrase "based on" as used in the present disclosure does not mean "based only on" unless expressly stated otherwise. In other words, "based on" means both "based only on" and "based at least on".

Any reference to elements using designations such as "first" and "second" in the present disclosure does not generally limit the quantity or order of those elements. These designations may be used in the present disclosure as a convenient way of distinguishing between two or more elements. Therefore, a reference to first and second elements does not mean that only two elements may be employed. Nor does it mean that the first element must precede the second element in some way.

The term "unit" in the configuration of each device above may be replaced with "circuit", "device", or the like.

Where "include", "including", and variations thereof are used in the present disclosure, these terms are intended to be inclusive in the same way as the term "comprising". Furthermore, the term "or" as used in the present disclosure is not intended to be an exclusive OR.

In the present disclosure, articles such as "a", "an", and "the" in English may be added by translation. In such cases, the present disclosure may include the case where the nouns following these articles are plural.

In the present disclosure, the term "A and B are different" may mean "A and B are different from each other". The term may also mean "A and B are each different from C". Terms such as "separated" and "coupled" may be interpreted in the same way as "different".

1, 1A: sentence creation device; 2a, 2b, 2c, 2d, 2e, 2f, 2g, 2h, 2i, 3a, 3b, 3c, 3d, 3e, 3f, 3g, 3h, 3i, 3j, 3k: morpheme; 2p: index; 5a, 5b, 5c, 5d: phrase; 8a, 8b, 8c, 8d: feature quantity; 10: acquisition unit; 20: analysis unit; 30: detection unit; 40: mapping unit; 50: derivation unit; 60: creation unit; 70: weighting unit; 82: first bilingual corpus; 84: second bilingual corpus; 1001: processor; 1002: memory; 1003: storage; 1004: communication device; 1005: input device; 1006: output device; 1007: bus; T11: first sentence; T21: second sentence; T31: third sentence.

Claims

A sentence creation device comprising:
an acquisition unit that acquires a first sentence in a first language and a second sentence in a second language, the second sentence being a translation of the first sentence;
an analysis unit that performs morphological analysis on each of the first sentence and the second sentence;
a detection unit that detects all phrases having morphemes in the second sentence;
a mapping unit that maps morphemes of the first sentence to morphemes of the second sentence based on the meaning of each morpheme;
a derivation unit that derives a feature quantity for each phrase of the second sentence based on indices, each index indicating the position in the first sentence of a morpheme of the first sentence that is mapped to a morpheme of the second sentence; and
a creation unit that creates a third sentence by arranging the phrases of the second sentence based on the feature quantity of each phrase of the second sentence.

The sentence creation device according to claim 1, wherein the creation unit creates the third sentence by arranging the phrases of the second sentence so that their feature quantities follow the order of the indices of the morphemes arranged from the beginning to the end of the sentence.

The sentence creation device according to claim 2, wherein the derivation unit uses a minimum value as the feature quantity of each phrase of the second sentence. This minimum value is taken among the indices of the morphemes of the first sentence that are mapped to the morphemes in that phrase.

The sentence creation device according to claim 2, wherein the derivation unit uses a maximum value as the feature quantity of each phrase of the second sentence. This maximum value is taken among the indices of the morphemes of the first sentence that are mapped to the morphemes in that phrase.

The sentence creation device according to claim 2, further comprising a weighting unit that sets weights for the morphemes of the first sentence,
wherein the derivation unit uses, as the feature quantity of each phrase of the second sentence, the index of the morpheme with the highest weight among the morphemes of the first sentence that are mapped to the morphemes in that phrase.

The sentence creation device according to claim 2, further comprising a weighting unit that sets weights for the morphemes in the phrases of the second sentence,
wherein the derivation unit uses, as the feature quantity of each phrase of the second sentence, the index of the morpheme of the first sentence that is mapped to the morpheme with the highest weight in that phrase.

The sentence creation device according to any one of claims 1 to 6, wherein the creation unit handles phrases of the second sentence whose feature quantities have the same value as follows: it creates the third sentence by arranging those phrases so as to preserve their order from the beginning to the end of the second sentence.
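The feature-quantity variants in claims 2 to 7 can be compared side by side in the Python sketch below. This is a hypothetical illustration under simplifying assumptions, not the claimed implementation. It assumes that each second-sentence morpheme maps to at most one first-sentence index, and that every phrase has at least one mapped morpheme. The names `feature`, `create_third_sentence`, and the `mode` strings are invented for this example.

```python
from typing import Dict, List, Optional


def feature(phrase: List[str], alignment: Dict[str, int], mode: str,
            w_first: Optional[Dict[int, float]] = None,
            w_second: Optional[Dict[str, float]] = None) -> int:
    """Feature quantity of one second-sentence phrase (hypothetical sketch).

    alignment: second-sentence morpheme ID -> index of the mapped
               first-sentence morpheme (assumed one-to-one here).
    w_first:   weight per first-sentence index (used by mode "w_first").
    w_second:  weight per second-sentence morpheme (used by mode "w_second").
    """
    aligned = [m for m in phrase if m in alignment]
    if not aligned:
        raise ValueError(f"phrase {phrase} has no mapped morpheme")
    indices = [alignment[m] for m in aligned]
    if mode == "min":       # claim 3: smallest mapped index
        return min(indices)
    if mode == "max":       # claim 4: largest mapped index
        return max(indices)
    if mode == "w_first":   # claim 5: index of the heaviest mapped first-sentence morpheme
        return max(indices, key=lambda i: w_first[i])
    if mode == "w_second":  # claim 6: index mapped to the heaviest morpheme in the phrase
        return alignment[max(aligned, key=lambda m: w_second[m])]
    raise ValueError(f"unknown mode: {mode}")


def create_third_sentence(phrases: List[List[str]], alignment: Dict[str, int],
                          mode: str, **weights) -> List[List[str]]:
    """Arrange phrases in ascending order of feature quantity (claim 2).

    sorted() is stable, so phrases with equal feature quantities keep their
    order in the second sentence (claim 7).
    """
    return sorted(phrases, key=lambda p: feature(p, alignment, mode, **weights))
```

Consider the phrases `[["x1", "x2"], ["y1"], ["z1", "z2"]]` with the hypothetical mapping `{"x1": 4, "x2": 0, "y1": 2, "z1": 5, "z2": 1}`. Mode `"min"` yields the order x, z, y (features 0, 1, 2). Mode `"max"` yields y, x, z (features 2, 4, 5). Relying on a stable sort is one simple way to satisfy the tie rule of claim 7.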