JP2016057815A - Sentence rewrite processing device, learning device, method, and program

Info

Publication number: JP2016057815A
Application number: JP2014183319A
Authority: JP
Inventors: Chiaki Miyazaki (宮崎千明); Toru Hirano (平野徹); Ryuichiro Higashinaka (東中竜一郎); Toshiro Makino (牧野俊朗); Yoshihiro Matsuo (松尾義博); Satoshi Sato (佐藤理史)
Original assignee: Nagoya University NUC; Nippon Telegraph and Telephone Corp
Current assignee: Nagoya University NUC; Nippon Telegraph and Telephone Corp
Priority date: 2014-09-09
Filing date: 2014-09-09
Publication date: 2016-04-21

Abstract

PROBLEM TO BE SOLVED: To impart character-likeness consistently across a whole sentence.
SOLUTION: A rewrite location estimation unit 226 classifies the operation to be applied to each morpheme in an input sentence. For each morpheme that is classified as replacement or deletion and is an auxiliary verb, an auxiliary verb replacement unit 230 determines whether to delete the morpheme and deletes those so determined from the input sentence; for morphemes classified as replacement, it inserts an auxiliary verb at each position from which a morpheme was deleted. A sentence-final particle insertion unit 234 inserts a sentence-final particle at each position in the input sentence from which a morpheme was deleted. A particle replacement unit 236 deletes from the input sentence each morpheme that is classified as deletion and is a particle other than a sentence-final particle, and replaces with a predetermined particle each morpheme that is classified as replacement and is a particle other than a sentence-final particle.
SELECTED DRAWING: Figure 5

Description

The present invention relates to a sentence rewriting processing device, a learning device, a method, and a program, and in particular to a sentence rewriting processing device, learning device, method, and program for rewriting a sentence into one that has the character-likeness of a specific character.

One conventional technique characterizes an utterance by rewriting only the sequence of function words at the end of a sentence (the sentence-end expression). Using text data annotated with author attributes, this technique extracts the sentence-end expressions that are used disproportionately often for each value of an author attribute and uses them to characterize utterances (Non-Patent Document 1).

Non-Patent Document 1: Chiaki Miyazaki, Toru Hirano, Ryuichiro Higashinaka, Toshiro Makino, Yoshihiro Matsuo, "Conversion of sentence-end expressions to give utterances a character" (発話にキャラクタ性を与えるための文末表現の変換), Japanese Society for Artificial Intelligence SIG Technical Report (SIG-SLUD-68), pp. 41-46, 2013.

However, because the method of Non-Patent Document 1 rewrites only the sentence-end expression, a mismatch can arise between the linguistic expressions that appear in the middle of the sentence and those used at its end. For example, if the rewritten sentence is 「今日は雨ですけど気にしないよ」 ("It's raining today, but I don't mind"), the 「です」 in the middle of the sentence and the 「よ」 at the end of the sentence are mismatched in the linguistic expressions they use.

The present invention has been made to solve the above problem, and its object is to provide a sentence rewriting processing device, method, and program capable of rewriting a sentence into one whose character-likeness is consistent across the whole sentence.

A further object of the present invention, likewise made to solve the above problem, is to provide a learning device, method, and program capable of learning parameters or a model for imparting to a sentence character-likeness that is consistent across the whole sentence.

To achieve the above object, the sentence rewriting processing device according to the first invention rewrites a morphologically analyzed input sentence into a sentence with character-likeness, that is, one that conveys a specific persona. The device comprises four units. A rewrite location estimation unit classifies the operation to be applied to each morpheme in the input sentence as replacement, deletion, or no operation, based on the feature values of the morpheme and a previously learned rewrite location estimation model for classifying the operation on a morpheme that is needed to impart the character-likeness. An auxiliary verb replacement unit, for each morpheme that the rewrite location estimation unit classified as replacement or deletion and that is an auxiliary verb, determines whether to delete the morpheme, replace it, or leave it unchanged, based on the surface form of the morpheme and previously learned rewrite parameters representing rules for rewriting morphemes to impart the character-likeness; based on the same surface form and rewrite parameters, it then deletes each morpheme determined to be deleted, or replaces each morpheme determined to be replaced with a predetermined morpheme or morpheme sequence. A sentence-final particle insertion unit, for each position in the input sentence at which the auxiliary verb replacement unit deleted or replaced a morpheme, inserts a predetermined sentence-final particle immediately after that position, based on the surface form of the morpheme deleted or replaced there and the rewrite parameters. A particle replacement unit, based on the rewrite parameters, deletes from the input sentence each morpheme that was classified as deletion and is a particle other than a sentence-final particle, and replaces with a predetermined particle each morpheme that was classified as replacement and is a particle other than a sentence-final particle.

The sentence rewriting processing method according to the second invention is performed in a sentence rewriting processing device that includes a rewrite location estimation unit, an auxiliary verb replacement unit, a sentence-final particle insertion unit, and a particle replacement unit, and that rewrites a morphologically analyzed input sentence into a sentence with character-likeness conveying a specific persona. In this method, the rewrite location estimation unit classifies the operation to be applied to each morpheme in the input sentence as replacement, deletion, or no operation, based on the feature values of the morpheme and a previously learned rewrite location estimation model for classifying the operation on a morpheme that is needed to impart the character-likeness. The auxiliary verb replacement unit, for each morpheme that was classified as replacement or deletion and is an auxiliary verb, determines whether to delete the morpheme, replace it, or leave it unchanged, based on the surface form of the morpheme and previously learned rewrite parameters representing rules for rewriting morphemes to impart the character-likeness, and then deletes each morpheme determined to be deleted, or replaces each morpheme determined to be replaced with a predetermined morpheme or morpheme sequence. The sentence-final particle insertion unit, for each position in the input sentence at which the auxiliary verb replacement unit deleted or replaced a morpheme, inserts a predetermined sentence-final particle immediately after that position, based on the surface form of the morpheme deleted or replaced there and the rewrite parameters. The particle replacement unit, based on the rewrite parameters, deletes from the input sentence each morpheme that was classified as deletion and is a particle other than a sentence-final particle, and replaces with a predetermined particle each morpheme that was classified as replacement and is a particle other than a sentence-final particle.

According to the first and second inventions, the rewrite location estimation unit classifies the operation to be applied to each morpheme in the morphologically analyzed input sentence as replacement, deletion, or no operation, based on the feature values of the morpheme and the previously learned rewrite location estimation model. The auxiliary verb replacement unit, for each morpheme that was classified as replacement or deletion and is an auxiliary verb, deletes the morpheme or replaces it with a predetermined morpheme or morpheme sequence, based on the surface form of the morpheme and the previously learned rewrite parameters representing rules for rewriting morphemes to impart the character-likeness. The sentence-final particle insertion unit, for each position in the input sentence at which the auxiliary verb replacement unit deleted or replaced a morpheme, inserts a predetermined sentence-final particle immediately after that position, based on the surface form of the morpheme deleted or replaced there and the rewrite parameters. The particle replacement unit, based on the rewrite parameters, deletes from the input sentence each morpheme that was classified as deletion and is a particle other than a sentence-final particle, and replaces with a predetermined particle each morpheme that was classified as replacement and is a particle other than a sentence-final particle.

Thus, according to the first and second inventions, the operation on each morpheme in the morphologically analyzed input sentence is classified as replacement, deletion, or no operation based on the rewrite location estimation model. Then, in accordance with this classification result and the rewrite parameters, morphemes whose part of speech is auxiliary verb are deleted or replaced with a predetermined morpheme or morpheme sequence, morphemes serving as sentence-final particles are inserted, and morphemes whose part of speech is a particle other than a sentence-final particle are deleted or replaced with another particle. As a result, the sentence can be rewritten into one whose character-likeness is consistent across the whole sentence.
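The flow summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the `Morpheme` class, the part-of-speech tag names, the layout of `PARAMS`, and the example parameter values are all hypothetical, and the labels that the rewrite location estimation model would output are simply hard-coded in the example.

```python
from dataclasses import dataclass


@dataclass
class Morpheme:
    surface: str  # surface form of the morpheme
    pos: str      # hypothetical tag set: "aux", "particle", "final_particle", "other"
    label: str    # "replace", "delete", or "none" (the estimator's output)


# Hypothetical rewrite parameters for one target character.
PARAMS = {
    "aux_delete": set(),              # auxiliary verbs to delete
    "aux_replace": {"です": "だ"},     # auxiliary verb -> replacement string
    "final_particle": {"です": "よ"},  # rewritten auxiliary verb -> particle to insert
    "particle_replace": {},           # particle -> replacement particle
}


def rewrite(morphemes, params):
    """Apply auxiliary verb, sentence-final particle, and particle rewriting."""
    out = []
    for m in morphemes:
        if m.pos == "aux" and m.label in ("replace", "delete"):
            # Auxiliary verb replacement: delete, replace, or leave unchanged.
            if m.surface in params["aux_delete"]:
                rewritten = ""
            elif m.surface in params["aux_replace"]:
                rewritten = params["aux_replace"][m.surface]
            else:
                out.append(m.surface)  # the parameters call for no operation
                continue
            # Sentence-final particle insertion right after the rewritten position,
            # keyed on the surface form of the morpheme that was rewritten.
            out.append(rewritten + params["final_particle"].get(m.surface, ""))
        elif m.pos == "particle" and m.label == "delete":
            continue  # particle deletion
        elif m.pos == "particle" and m.label == "replace":
            out.append(params["particle_replace"].get(m.surface, m.surface))
        else:
            out.append(m.surface)
    return "".join(out)


sentence = [
    Morpheme("今日", "other", "none"),
    Morpheme("は", "particle", "delete"),
    Morpheme("雨", "other", "none"),
    Morpheme("です", "aux", "replace"),
]
print(rewrite(sentence, PARAMS))  # 今日雨だよ
```

The sketch processes the three rewriting steps in a single pass for brevity; the text describes them as separate units applied to the same classified input.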

The learning device according to the third invention learns rewrite parameters representing rules for rewriting the morphemes of a sentence in order to impart to the sentence character-likeness conveying a specific persona. It comprises a replacement string alignment unit and a rewrite parameter acquisition unit. For each pair of a characterless sentence (a sentence without the character-likeness) and a character-bearing sentence (a sentence with the character-likeness), the replacement string alignment unit acquires the correspondence between each text character in the characterless sentence of the pair and each text character in the character-bearing sentence of the pair. The rewrite parameter acquisition unit acquires at least one of the following: a first rewrite parameter for deciding whether to delete a specific auxiliary verb, acquired from the correspondences obtained for the pairs by the replacement string alignment unit; a second rewrite parameter for deciding whether to replace a specific auxiliary verb, acquired from a corpus composed of pairs of sentences with and without the specific character-likeness; a third rewrite parameter for deciding which sentence-final particles may be used, acquired from such a corpus; a fourth rewrite parameter for deciding whether to use a specific particle, acquired from such a corpus; and a fifth rewrite parameter for deciding whether to omit a particle, acquired from the correspondences obtained for the pairs by the replacement string alignment unit.

The learning method according to the fourth invention is performed in a learning device that includes a replacement string alignment unit and a rewrite parameter acquisition unit and that learns rewrite parameters representing rules for rewriting the morphemes of a sentence in order to impart to the sentence character-likeness conveying a specific persona. In this method, for each pair of a characterless sentence and a character-bearing sentence, the replacement string alignment unit acquires the correspondence between each text character in the characterless sentence of the pair and each text character in the character-bearing sentence of the pair. The rewrite parameter acquisition unit acquires at least one of the following: a first rewrite parameter for deciding whether to delete a specific auxiliary verb, acquired from the correspondences obtained for the pairs by the replacement string alignment unit; a second rewrite parameter for deciding whether to replace a specific auxiliary verb, acquired from a corpus composed of pairs of sentences with and without the specific character-likeness; a third rewrite parameter for deciding which sentence-final particles may be used, acquired from such a corpus; a fourth rewrite parameter for deciding whether to use a specific particle, acquired from such a corpus; and a fifth rewrite parameter for deciding whether to omit a particle, acquired from the correspondences obtained for the pairs by the replacement string alignment unit.

According to the third and fourth inventions, for each pair of a characterless sentence and a character-bearing sentence, the replacement string alignment unit acquires the correspondence between each text character in the characterless sentence of the pair and each text character in the character-bearing sentence of the pair. The rewrite parameter acquisition unit then acquires at least one of the following: the first rewrite parameter for deciding whether to delete a specific auxiliary verb, acquired from the correspondences obtained for the pairs; the second rewrite parameter for deciding whether to replace a specific auxiliary verb, acquired from a corpus composed of pairs of sentences with and without the specific character-likeness; the third rewrite parameter for deciding which sentence-final particles may be used, acquired from such a corpus; the fourth rewrite parameter for deciding whether to use a specific particle, acquired from such a corpus; and the fifth rewrite parameter for deciding whether to omit a particle, acquired from the correspondences obtained for the pairs.

Thus, according to the third and fourth inventions, for each pair of a characterless sentence, which does not express the character-likeness, and a character-bearing sentence, which does, the correspondence between each text character in the characterless sentence and each text character in the character-bearing sentence is acquired, and at least one of the first through fifth rewrite parameters is acquired. This makes it possible to determine parameters for imparting to a sentence character-likeness that is consistent across the whole sentence.
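As a rough illustration of how a character-level correspondence between the two sentences of a pair could feed the deletion and replacement statistics, the following sketch aligns each pair and counts the strings that were deleted or replaced. It is an assumption-laden sketch: this section does not specify the alignment algorithm, so Python's `difflib.SequenceMatcher` is used here as a stand-in, and the function names `align` and `count_rewrites`, the variable names, and the example pairs are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def align(plain, styled):
    """Return (operation, source substring, target substring) for each differing span.

    `plain` is the characterless sentence and `styled` the character-bearing one.
    """
    matcher = SequenceMatcher(None, plain, styled, autojunk=False)
    return [
        (op, plain[i1:i2], styled[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


def count_rewrites(pairs):
    """Count deleted strings and (source, target) replacements over all pairs."""
    deleted, replaced = Counter(), Counter()
    for plain, styled in pairs:
        for op, src, dst in align(plain, styled):
            if op == "delete":
                deleted[src] += 1
            elif op == "replace":
                replaced[(src, dst)] += 1
    return deleted, replaced


# Hypothetical sentence pairs: (characterless, character-bearing).
pairs = [("雨です", "雨だ"), ("私は行く", "私行く")]
deleted, replaced = count_rewrites(pairs)
print(deleted)   # Counter({'は': 1})
print(replaced)  # Counter({('です', 'だ'): 1})
```

Counts like these could then be thresholded or filtered by part of speech to decide, for example, whether a given auxiliary verb should be deleted or a particle omitted; how the actual parameters are derived is left to the detailed description.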

The learning device according to the fifth invention learns a rewrite location estimation model for classifying, as replacement, deletion, or no operation, the operation on a morpheme of a sentence that is needed to impart to the sentence character-likeness conveying a specific persona. It comprises a replacement string alignment unit and a rewrite location estimation model creation unit. For each pair of a characterless sentence and a character-bearing sentence, the replacement string alignment unit acquires the correspondence between each text character in the characterless sentence of the pair and each text character in the character-bearing sentence of the pair. For each pair, and based on the correspondence acquired by the replacement string alignment unit, the rewrite location estimation model creation unit assigns to each morpheme in the characterless sentence of the pair a label indicating replacement, deletion, or no operation, namely the operation applied to that morpheme when the sentence was rewritten into the character-bearing sentence of the pair, and learns the rewrite location estimation model based on the surface form of each labeled morpheme.

The learning method according to the sixth invention is performed in a learning device that includes a replacement string alignment unit and a rewrite location estimation model creation unit and that learns a rewrite location estimation model for classifying, as replacement, deletion, or no operation, the operation on a morpheme of a sentence that is needed to impart to the sentence character-likeness conveying a specific persona. In this method, for each pair of a characterless sentence and a character-bearing sentence, the replacement string alignment unit acquires the correspondence between each text character in the characterless sentence of the pair and each text character in the character-bearing sentence of the pair. For each pair, and based on the correspondence acquired by the replacement string alignment unit, the rewrite location estimation model creation unit assigns to each morpheme in the characterless sentence of the pair a label indicating replacement, deletion, or no operation, namely the operation applied to that morpheme when the sentence was rewritten into the character-bearing sentence of the pair, and learns the rewrite location estimation model based on the surface form of each labeled morpheme.

According to the fifth and sixth inventions, for each pair of a characterless sentence and a character-bearing sentence, the replacement string alignment unit acquires the correspondence between each text character in the characterless sentence of the pair and each text character in the character-bearing sentence of the pair. For each pair, and based on that correspondence, the rewrite location estimation model creation unit assigns to each morpheme in the characterless sentence a label indicating replacement, deletion, or no operation as the operation applied to that morpheme when the sentence was rewritten into the character-bearing sentence, and learns the rewrite location estimation model based on the surface form of each labeled morpheme.

Thus, according to the fifth and sixth inventions, for each pair of a characterless sentence, which does not express the character-likeness, and a character-bearing sentence, which does, the correspondence between each text character in the characterless sentence and each text character in the character-bearing sentence is acquired. Based on the correspondences acquired for the pairs, a rewrite location estimation model for classifying the operation on a morpheme as replacement, deletion, or no operation is learned. This makes it possible to learn a model for imparting to a sentence character-likeness that is consistent across the whole sentence.
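The labeling step that produces training data for the rewrite location estimation model can be sketched as follows. This is only an illustration under assumptions: the alignment again uses `difflib.SequenceMatcher` as a stand-in for the replacement string alignment unit, the function name `label_morphemes` is hypothetical, and the rule for morphemes that are only partly changed (treated here as replacement) is a choice made for this sketch rather than something stated in the text.

```python
from difflib import SequenceMatcher


def label_morphemes(morphemes, styled):
    """Label each morpheme of the characterless sentence as replace, delete, or none.

    `morphemes` is the list of surface forms of the characterless sentence;
    `styled` is the character-bearing sentence of the same pair.
    """
    plain = "".join(morphemes)
    # Operation applied to each text character of the characterless sentence.
    char_ops = ["none"] * len(plain)
    matcher = SequenceMatcher(None, plain, styled, autojunk=False)
    for op, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if op in ("delete", "replace"):
            for i in range(i1, i2):
                char_ops[i] = op

    labels = []
    pos = 0
    for surface in morphemes:
        ops = set(char_ops[pos:pos + len(surface)])
        pos += len(surface)
        if ops == {"delete"}:
            labels.append("delete")   # every character of the morpheme was removed
        elif ops & {"delete", "replace"}:
            labels.append("replace")  # the morpheme was changed at least in part
        else:
            labels.append("none")
    return labels


# Hypothetical pair: 今日は雨です -> 今日雨だ
print(label_morphemes(["今日", "は", "雨", "です"], "今日雨だ"))
# ['none', 'delete', 'none', 'replace']
```

The resulting (surface form, label) pairs are the kind of supervised data on which a classifier could be trained; the choice of classifier and features is not fixed by this sketch.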

The program of the present invention causes a computer to function as each unit of the sentence rewriting processing device and the learning devices described above.

As described above, according to the sentence rewriting processing device, method, and program of the present invention, the operation on each morpheme in the morphologically analyzed input sentence is classified as replacement, deletion, or no operation based on the rewrite location estimation model. Then, in accordance with the rewrite parameters, morphemes whose part of speech is auxiliary verb are deleted or replaced with a predetermined morpheme or morpheme sequence, morphemes serving as sentence-final particles are inserted, and morphemes whose part of speech is a particle other than a sentence-final particle are deleted or replaced with a specific particle. As a result, the sentence can be rewritten into one whose character-likeness is consistent across the whole sentence.

In addition, according to the learning device, method, and program of the present invention, parameters or a model for imparting to a sentence character-likeness that is consistent across the whole sentence can be learned.

FIG. 1 is a block diagram showing the functional configuration of the learning device according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing the functional configuration of the rewrite parameter determination unit in the learning device according to the first embodiment of the present invention.
FIG. 3 is a diagram showing an example of a pair to which parts of speech have been assigned.
FIG. 4 is a diagram showing an example of a rewrite parameter set.
FIG. 5 is a block diagram showing the functional configuration of the sentence rewriting processing device according to the first embodiment of the present invention.
FIG. 6 is a diagram showing an example in which estimated labels have been assigned.
FIG. 7 is a diagram showing an example of the result of auxiliary verb deletion by the auxiliary verb replacement unit.
FIG. 8 is a diagram showing an example of the result of auxiliary verb replacement by the auxiliary verb replacement unit.
FIG. 9 is a diagram showing an example of the processing result of the sentence-final particle insertion unit.
FIG. 10 is a diagram showing an example of the processing result of the particle replacement unit.
FIG. 11 is a flowchart showing the learning processing routine in the learning device according to the first embodiment of the present invention.
FIG. 12 is a flowchart showing the rewrite parameter determination processing routine in the learning device according to the first embodiment of the present invention.
FIG. 13 is a flowchart showing the learning processing routine for the rewrite location estimation model in the learning device according to the first embodiment of the present invention.
FIG. 14 is a flowchart showing the sentence rewriting processing routine according to the first embodiment of the present invention.
FIG. 15 is a block diagram showing the functional configuration of the learning device according to the second embodiment of the present invention.
FIG. 16 is a block diagram showing the functional configuration of the sentence rewriting processing device according to the second embodiment of the present invention.
FIG. 17 is a flowchart showing the learning processing routine in the learning device according to the second embodiment of the present invention.
FIG. 18 is a flowchart showing the rewrite parameter determination processing routine in the learning device according to the second embodiment of the present invention.
FIG. 19 is a flowchart showing the sentence rewriting processing routine according to the second embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

<Principle of the Embodiments of the Present Invention>
First, the principle of the embodiments of the present invention is described. In the embodiments, to give a sentence consistent character-likeness as a whole, rewriting targets not only sentence-final expressions but also function words that appear within the sentence. Among function words, rewriting is limited to particles (助詞) and auxiliary verbs (助動詞), in which character traits are especially likely to appear.

In preparation for rewriting, sentences without character-likeness (hereinafter "character-less sentences") and sentences with character-likeness (hereinafter "character-bearing sentences") are used to obtain in advance, for each character, the information (rewriting parameters) needed to rewrite particles (case particles, binding particles, and sentence-final particles) and auxiliary verbs (「ます」, 「です」, 「だ」). When an unknown input sentence is rewritten, a machine learning technique determines which morphemes in the input sentence should be rewritten, and rewriting is carried out according to the rewriting parameters of the target character.

Accordingly, in the embodiments, a sentence written in Japanese is given the likeness of a specific character by rewriting its particles and auxiliary verbs. Specifically, the information needed to rewrite particles and auxiliary verbs is obtained from a corpus of pairs, each consisting of a sentence with the likeness of a specific character and a sentence without it, and a newly input sentence is rewritten according to the obtained information. Here, a sentence written in Japanese may be any Japanese data in text form, such as the speech recognition result of an utterance or a text chat message. The term "character" here refers to the persona of the speaker. Prior work in linguistics gives age, gender, occupation, social class, era, looks, appearance, and personality as examples of such personas.

<Configuration of the Learning Device According to the First Embodiment of the Present Invention>
Next, the configuration of the learning device according to the first embodiment of the present invention is described. As shown in FIG. 1, the learning device 100 according to the embodiment can be implemented as a computer that includes a CPU, a RAM, and a ROM storing various data and a program for executing the learning processing routine described later. Functionally, as shown in FIG. 1, the learning device 100 includes an input unit 10, a calculation unit 20, and an output unit 90.

The input unit 10 receives pairs, each consisting of a character-less sentence and a character-bearing sentence obtained by manually rewriting that character-less sentence so as to emphasize the likeness of a specific character. For example, it receives the pair consisting of the character-less sentence 「駅の傍に美味しい寿司屋があります」 ("There is a good sushi restaurant near the station", in a neutral polite style) and the character-bearing sentence for a female character 「駅の近くに美味しいお寿司屋さんがあるよ」 (the same content in a casual feminine style). In the first embodiment, the specific character to be processed is determined in advance, and the character ID of that character is stored in a memory (not shown). The character ID is used to identify the rewrite parameter set stored in the rewrite parameter set storage unit 44 described later.

The calculation unit 20 includes a sentence pair storage unit 22, a rewrite parameter determination unit 30, a rewrite location estimation model creation unit 60, and a rewrite location estimation model storage unit 62.

The sentence pair storage unit 22 stores each pair of a character-less sentence and a character-bearing sentence received by the input unit 10. In the following description, a pair of a character-less sentence and a character-bearing sentence is simply called a "pair".

The rewrite parameter determination unit 30 determines rewrite parameters, which define rules for converting the surface forms of morphemes so that a character-less sentence is rewritten into a character-bearing sentence of the predetermined specific character, and outputs to the output unit 90, for each character ID, a rewrite parameter set, which is the collection of those rewrite parameters. As shown in FIG. 2, the rewrite parameter determination unit 30 includes a replacement character string alignment unit 32 and a rewrite parameter acquisition unit 33.

The replacement character string alignment unit 32 first uses dynamic programming, for each pair stored in the sentence pair storage unit 22, to align the characters of the character-less sentence with the characters of the character-bearing sentence one character at a time, and obtains the character correspondence for the pair. Consecutive replaced, deleted, or inserted positions within a pair are merged into a single rewrite location. Next, for each pair, morphological analysis is performed on both the character-less sentence and the character-bearing sentence, and for each rewrite location in the pair, the part of speech of the morpheme whose left edge lies within the range of that rewrite location is assigned as the part of speech of the rewrite location, yielding a part-of-speech-tagged pair as shown in FIG. 3. In the example of FIG. 3, the consecutive rewritten characters 「り」「ます」 and 「る」「よ」 are merged into a single rewrite location. In FIG. 3, the 「り」 of 「ります」 and the 「る」 of 「るよ」 are regarded as parts of the morphemes 「あり」 and 「ある」, respectively; because the left edges of those morphemes are not included in the range of the rewrite location, no part of speech is assigned for them.
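The character-level alignment and the merging of consecutive edits described above can be illustrated with a minimal sketch. It assumes a standard Levenshtein (edit-distance) table with unit costs; the patent only states that dynamic programming is used, so the cost model and the function names `align_chars` and `merge_rewrite_spans` are illustrative assumptions, and part-of-speech assignment is omitted.

```python
from typing import List, Tuple

Op = Tuple[str, str, str]  # (operation, source char, target char)


def align_chars(src: str, tgt: str) -> List[Op]:
    """Align two strings character by character with a unit-cost edit-distance table."""
    n, m = len(src), len(tgt)
    # cost[i][j] = minimum number of edits turning src[:i] into tgt[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (src[i - 1] != tgt[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (src[i - 1] != tgt[j - 1]):
            op = "keep" if src[i - 1] == tgt[j - 1] else "sub"
            ops.append((op, src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", src[i - 1], ""))
            i -= 1
        else:
            ops.append(("ins", "", tgt[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def merge_rewrite_spans(ops: List[Op]) -> List[Tuple[str, str]]:
    """Merge runs of consecutive sub/del/ins operations into single rewrite locations."""
    spans: List[Tuple[str, str]] = []
    cur_src, cur_tgt, in_span = "", "", False
    for op, s, t in ops:
        if op == "keep":
            if in_span:
                spans.append((cur_src, cur_tgt))
                cur_src, cur_tgt, in_span = "", "", False
        else:
            cur_src += s
            cur_tgt += t
            in_span = True
    if in_span:
        spans.append((cur_src, cur_tgt))
    return spans


spans = merge_rewrite_spans(align_chars("あります", "あるよ"))
```

With this sketch, the tail 「ります」 / 「るよ」 comes out as one merged rewrite location, matching the behaviour described for FIG. 3.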

The rewrite parameter acquisition unit 33 includes an auxiliary verb deletion determination unit 34, an auxiliary verb replacement determination unit 36, a final particle selection unit 38, a particle replacement determination unit 40, a particle omission determination unit 42, and a rewrite parameter set storage unit 44.

The auxiliary verb deletion determination unit 34 determines the rule for whether the auxiliary verbs 「です」 and 「ます」 are deleted when a sentence is rewritten. Specifically, for each part-of-speech-tagged pair, it counts how many of the occurrences of 「です」 and 「ます」 in the character-less sentence were deleted in the rewrite to the character-bearing sentence and how many were not. It then compares D_1, the total number of deletions over all character-less sentences, with D_2, the total number of non-deletions over all character-less sentences. If D_1 is statistically significantly larger than D_2, the unit decides that 「です」 and 「ます」 are "deleted"; otherwise it decides that they are "not deleted". The decision is stored in the rewrite parameter set storage unit 44 as the first rewrite parameter. In the first embodiment, a chi-square test is used as the statistical test.
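The comparison of two totals such as D_1 and D_2 could be sketched as below. The text names a chi-square test but does not give its exact design or significance level, so this sketch assumes a goodness-of-fit test of the two counts against equal expected frequencies (one degree of freedom) at a 5% level; the function names and the threshold are assumptions.

```python
# Critical value of the chi-square distribution with 1 degree of freedom
# at significance level 0.05 (assumed level; not stated in the source).
CHI2_CRITICAL_DF1_ALPHA_05 = 3.841


def chi_square_two_counts(a: int, b: int) -> float:
    """Chi-square statistic for two observed counts against equal expected counts."""
    total = a + b
    if total == 0:
        return 0.0
    expected = total / 2.0
    return (a - expected) ** 2 / expected + (b - expected) ** 2 / expected


def significantly_larger(a: int, b: int,
                         critical: float = CHI2_CRITICAL_DF1_ALPHA_05) -> bool:
    """True if count a exceeds count b and the difference passes the test."""
    return a > b and chi_square_two_counts(a, b) > critical


# Hypothetical totals: D_1 = 30 deletions, D_2 = 10 non-deletions.
delete_desu_masu = significantly_larger(30, 10)
```

The same helper would apply to the other count pairs that this section compares with a chi-square test, with the arguments ordered so that the first is the count expected to be larger.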

The auxiliary verb replacement determination unit 36 determines the rule for whether auxiliary verbs are replaced with 「だ（んだ）」 when a sentence is rewritten. Specifically, for each part-of-speech-tagged pair, it counts the occurrences of 「です」, 「ます」, and 「だ」 in the character-less sentence and obtains E_1, the total number of occurrences over all character-less sentences. Likewise, it counts the occurrences of 「です」, 「ます」, and 「だ」 in the character-bearing sentence of each pair and obtains E_2, the total number of occurrences over all character-bearing sentences. It then compares E_1 with E_2. If E_2 is statistically significantly larger than E_1, the unit decides to replace with 「だ（んだ）」; otherwise it decides not to. The decision is stored in the rewrite parameter set storage unit 44 as the second rewrite parameter. In this embodiment, a chi-square test is used as the statistical test.

The final particle selection unit 38 determines the rule for which sentence-final particles are likely to be used when a sentence is rewritten. Specifically, for each of a predetermined set of sentence-final particles that a specific character is likely to use (「な」, 「よ」, 「の」, 「わ」, 「ね」, and so on), it counts, for each part-of-speech-tagged pair, the occurrences of that particle in the character-less sentence and obtains G_1, the total over all character-less sentences. Likewise, it counts the occurrences of that particle in the character-bearing sentence of each pair and obtains G_2, the total over all character-bearing sentences. It then compares G_1 with G_2. If G_2 is statistically significantly larger than G_1, the unit decides that the particle is likely to be used, and stores the decision in the rewrite parameter set storage unit 44 as the third rewrite parameter, together with the value representing the statistical significance. In the first embodiment, a chi-square test is used as the statistical test.

The particle replacement determination unit 40 determines the rule for whether 「って」 is used when a sentence is rewritten. Specifically, for each part-of-speech-tagged pair, it counts the occurrences of 「って」 in the character-less sentence and obtains H_1, the total over all character-less sentences. Likewise, it counts the occurrences of 「って」 in the character-bearing sentence of each pair and obtains H_2, the total over all character-bearing sentences. It then compares H_1 with H_2. If H_2 is statistically significantly larger than H_1, the unit decides that 「って」 is used; otherwise it decides that 「って」 is not used. The decision is stored in the rewrite parameter set storage unit 44 as the fourth rewrite parameter. In the first embodiment, a chi-square test is used as the statistical test.

The particle omission determination unit 42 determines the rule for whether particles other than sentence-final particles may be omitted when a sentence is rewritten. Specifically, for each part-of-speech-tagged pair, it counts how many of the particles other than sentence-final particles in the character-less sentence were deleted in the rewrite to the character-bearing sentence, and obtains I_1, the total number of such deletions over all character-less sentences. Likewise, it counts how many of those particles were not deleted in the rewrite, and obtains I_2, the total number of non-deletions over all character-less sentences. It then compares I_1 with I_2. If I_1 is statistically significantly larger than I_2, the unit decides that particles are "omissible"; otherwise it decides that they are "not omissible". The decision is stored in the rewrite parameter set storage unit 44 as the fifth rewrite parameter. In the first embodiment, a chi-square test is used as the statistical test.

The rewrite parameter set storage unit 44 stores, as the rewrite parameter set of the target character, the rewrite parameters received from the auxiliary verb deletion determination unit 34, the auxiliary verb replacement determination unit 36, the final particle selection unit 38, the particle replacement determination unit 40, and the particle omission determination unit 42, combined with the character ID stored in the memory (not shown). FIG. 4 shows an example of a rewrite parameter set. For each character ID, which serves as the identifier under which the set is stored in the rewrite parameter set storage unit 44, FIG. 4 lists from top to bottom: whether the auxiliary verbs 「です」 and 「ます」 are "deleted" or "not deleted", as chosen by the auxiliary verb deletion determination unit 34; whether auxiliary verbs are "replaced" or "not replaced" with 「だ（んだ）」, as chosen by the auxiliary verb replacement determination unit 36; the sentence-final particles chosen by the final particle selection unit 38 as likely to be used; whether 「って」 is "used" or "not used", as chosen by the particle replacement determination unit 40; and whether particles are "omissible" or "not omissible", as decided by the particle omission determination unit 42. In each row, the bold entry is the one that was chosen or decided.
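One possible in-memory shape for a rewrite parameter set keyed by character ID is sketched below. The five fields follow the five rewrite parameters described in the text; the class name, field names, the example character ID, and the choice to keep a per-particle count (rather than the significance value mentioned for the third parameter) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RewriteParameterSet:
    character_id: str
    delete_desu_masu: bool = False      # 1st parameter: delete 「です」/「ます」
    replace_with_da: bool = False       # 2nd parameter: replace with 「だ（んだ）」
    # 3rd parameter: usable sentence-final particles, mapped here to their
    # frequency in character-bearing sentences (assumed representation).
    final_particles: Dict[str, int] = field(default_factory=dict)
    use_tte: bool = False               # 4th parameter: use 「って」
    particles_omissible: bool = False   # 5th parameter: particles may be omitted

    def preferred_final_particle(self) -> Optional[str]:
        """Return the usable final particle with the highest frequency, if any."""
        if not self.final_particles:
            return None
        return max(self.final_particles, key=self.final_particles.get)


# Hypothetical store playing the role of the rewrite parameter set storage unit.
parameter_store: Dict[str, RewriteParameterSet] = {}


def store_parameters(params: RewriteParameterSet) -> None:
    parameter_store[params.character_id] = params


store_parameters(RewriteParameterSet(
    character_id="female_01",
    delete_desu_masu=True,
    final_particles={"よ": 12, "ね": 7},
))
```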

Based on the correspondence obtained by the replacement character string alignment unit 32, the rewrite location estimation model creation unit 60 assigns the teacher labels SUB(-B/I), DEL(-B/I), and O to replaced morphemes, deleted morphemes, and morphemes to which no operation was applied, respectively (-B marks the first morpheme of a rewrite location and -I marks a morpheme other than the first). Using the teacher labels and various features, it models the sequence of replacement, deletion, and no operation with a machine learning technique, and thereby learns a rewrite location estimation model that classifies the operation to apply to each morpheme, for imparting character-likeness, as replacement, deletion, or no operation. The model is stored in the rewrite location estimation model storage unit 62 and output to the output unit 90. Specifically, the first embodiment uses a Conditional Random Field (CRF) for machine learning. The features for a morpheme that has a teacher label are the surface form, part of speech, and character type of that morpheme, of each morpheme located up to two positions before it, and of each morpheme located up to two positions after it.
In the first embodiment, the character types are kanji (label "a"), hiragana (label "b"), katakana (label "c"), alphabet (label "d"), and digits (label "e"). When a single morpheme mixes several character types, such as kanji and hiragana, the character type is expressed as a combination of labels, such as "ab". The character type is not an essential feature for learning and may be omitted.
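The feature design described above (surface form, part of speech, and character type in a window of two morphemes on each side) could be extracted as in the following sketch. The Unicode ranges used to detect each character type, the restriction of "alphabet" to ASCII letters, the feature key format, and the function names are assumptions; the resulting dictionaries are the kind of per-token features a CRF toolkit typically consumes, but no particular toolkit is implied by the source.

```python
from typing import Dict, List, Tuple

Morpheme = Tuple[str, str]  # (surface form, part of speech)


def char_type(surface: str) -> str:
    """Return the character-type label, e.g. "a" for kanji or "ab" for kanji plus hiragana."""
    labels = set()
    for ch in surface:
        if "\u4e00" <= ch <= "\u9fff":      # CJK unified ideographs (kanji)
            labels.add("a")
        elif "\u3040" <= ch <= "\u309f":    # hiragana
            labels.add("b")
        elif "\u30a0" <= ch <= "\u30ff":    # katakana
            labels.add("c")
        elif ch.isascii() and ch.isalpha():  # ASCII letters only (assumption)
            labels.add("d")
        elif ch.isdigit():
            labels.add("e")
    return "".join(sorted(labels))


def token_features(tokens: List[Morpheme], i: int, window: int = 2) -> Dict[str, str]:
    """Features of token i: surface, POS, and character type within +/- window tokens."""
    feats: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            surface, pos = tokens[j]
            feats[f"{offset}:surface"] = surface
            feats[f"{offset}:pos"] = pos
            feats[f"{offset}:ctype"] = char_type(surface)
    return feats


sentence = [("駅", "名詞"), ("の", "助詞"), ("傍", "名詞"), ("に", "助詞"), ("美味しい", "形容詞")]
features_per_token = [token_features(sentence, i) for i in range(len(sentence))]
```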

<Configuration of the Sentence Rewriting Processing Device According to the First Embodiment of the Present Invention>
Next, the configuration of the sentence rewriting processing device according to the first embodiment of the present invention is described. As shown in FIG. 5, the sentence rewriting processing device 200 according to the first embodiment can be implemented as a computer that includes a CPU, a RAM, and a ROM storing various data and a program for executing the sentence rewriting processing routine described later. Functionally, as shown in FIG. 5, the sentence rewriting processing device 200 includes an input unit 210, a calculation unit 220, and an output unit 290.

The input unit 210 receives a character-less sentence.

The calculation unit 220 includes a morphological analysis unit 222, a rewrite location estimation model storage unit 224, a rewrite location estimation unit 226, a rewrite parameter set storage unit 228, an auxiliary verb replacement unit 230, a final particle insertion unit 234, and a particle replacement unit 236.

The morphological analysis unit 222 performs morphological analysis on the character-less sentence received by the input unit 210 and obtains each morpheme of the sentence together with its part of speech.

The rewrite location estimation model storage unit 224 stores the same rewrite location estimation model as the one stored in the rewrite location estimation model storage unit 62 of the learning device 100.

The rewrite location estimation unit 226 extracts, for each morpheme of the morphologically analyzed character-less sentence obtained by the morphological analysis unit 222, the same kinds of features as those used by the rewrite location estimation model creation unit 60 of the learning device 100. Based on the features of each morpheme and the rewrite location estimation model stored in the rewrite location estimation model storage unit 224, it estimates for each morpheme of the character-less sentence whether it is a morpheme to be replaced (SUB(-B/I)), a morpheme to be deleted (DEL(-B/I)), or a morpheme to which no operation should be applied (O), and assigns the corresponding estimated label, producing an input sentence with rewrite location labels. In the first embodiment, as in the learning device 100, a Conditional Random Field (CRF) is used for the estimation. FIG. 6 shows an example of the input and output of the rewrite location estimation unit 226.

The rewrite parameter set storage unit 228 stores the same rewrite parameter set as the one stored in the rewrite parameter set storage unit 44 of the learning device 100.

Based on the rewrite parameter set of the specific character stored in the rewrite parameter set storage unit 228, the auxiliary verb replacement unit 230 deletes the auxiliary verbs 「です」 and 「ます」 contained in the input sentence with rewrite location labels, or replaces them with another morpheme or morpheme sequence. Specifically, for each morpheme in the labeled input sentence whose surface form is 「です」 or 「ます」, if the label estimated by the rewrite location estimation unit 226 is SUB (replacement) or DEL (deletion), and the rewrite parameter set of the specific character specifies that 「です」 and 「ます」 are deleted, the morpheme is deleted as shown in FIG. 7. Note that when 「ます」 is deleted, the verb immediately to its left must be changed to its terminal form.

Further, for each morpheme in the labeled input sentence whose surface form is 「です」 or 「ます」 and whose label estimated by the rewrite location estimation unit 226 is SUB (replacement) or DEL (deletion), the 「です」 or 「ます」 is replaced with 「だ」 or 「んだ」, as shown in FIG. 8, in either of two cases: when the parts of speech of the morphemes immediately before and after it (for example, a noun immediately before and a conjunctive particle immediately after) show that the position grammatically requires 「だ」 or 「んだ」, or when the rewrite parameter set of the specific character specifies replacement with 「だ（んだ）」.
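The deletion and replacement rules for 「です」 and 「ます」 could be sketched as follows. This is a deliberately simplified illustration: it always substitutes 「だ」 (never 「んだ」), skips the grammatical check on neighbouring parts of speech, and does not change the verb before a deleted 「ます」 to its terminal form. The `Token` type, the `DELETED` marker, and the function name are assumptions.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    surface: str
    pos: str
    label: str  # "SUB-B", "SUB-I", "DEL-B", "DEL-I", or "O"


DELETED = "φ"  # marker for a deleted morpheme


def rewrite_auxiliaries(tokens: List[Token],
                        delete_desu_masu: bool,
                        replace_with_da: bool) -> List[Token]:
    """Apply the deletion rule first, then the replacement rule, to 「です」/「ます」."""
    out: List[Token] = []
    for tok in tokens:
        targeted = (tok.surface in ("です", "ます")
                    and tok.label.startswith(("SUB", "DEL")))
        if targeted and delete_desu_masu:
            out.append(tok._replace(surface=DELETED))
        elif targeted and replace_with_da:
            out.append(tok._replace(surface="だ"))
        else:
            out.append(tok)  # untouched: label O or no applicable parameter
    return out


labeled = [Token("学生", "名詞", "O"), Token("です", "助動詞", "SUB-B")]
rewritten = rewrite_auxiliaries(labeled, delete_desu_masu=False, replace_with_da=True)
```

Keeping a marker for deleted morphemes, rather than dropping them, lets a later step know where an auxiliary verb used to be.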

Based on the rewrite parameter set of the specific character stored in the rewrite parameter set storage unit 228 and on the processing result of the auxiliary verb replacement unit 230, the final particle insertion unit 234 inserts sentence-final particles into the processed input sentence with rewrite location labels. Specifically, for each morpheme whose surface form in the labeled input sentence was 「です」 or 「ます」 and which was replaced by the auxiliary verb replacement unit 230 (so that its surface form is now 「だ」 or 「んだ」), if the rewrite parameter set specifies usable sentence-final particles, the usable particle with the highest frequency of occurrence in character-bearing sentences is inserted immediately after that 「だ」 or 「んだ」, as shown in FIG. 9. For each morpheme whose surface form was 「です」 or 「ます」 and which was deleted by the auxiliary verb replacement unit 230 (so that its surface form is now 「φ」), if the rewrite parameter set specifies usable sentence-final particles, the 「φ」 is replaced with the usable particle with the highest frequency of occurrence in character-bearing sentences.
For each morpheme whose surface form in the labeled input sentence is 「です」 or 「ます」 and to which no rewrite operation was applied, the usable sentence-final particle with the highest frequency of occurrence in character-bearing sentences is inserted immediately after the 「です」 or 「ます」.
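The three insertion cases above (after a replaced 「だ」/「んだ」, in place of a deleted 「φ」, and after an untouched 「です」/「ます」) can be illustrated with a small sketch. It assumes each token carries a flag saying whether it originated from 「です」 or 「ます」, and that the most frequent usable particle has already been chosen; the tuple layout and function name are hypothetical.

```python
from typing import List, Optional, Tuple

# (current surface form, True if the token originated from 「です」 or 「ます」)
AuxToken = Tuple[str, bool]


def insert_final_particle(tokens: List[AuxToken], particle: Optional[str]) -> List[str]:
    """Insert the chosen sentence-final particle at former 「です」/「ます」 positions."""
    if particle is None:
        # No usable final particle is defined: leave the surfaces unchanged.
        return [surface for surface, _ in tokens]
    out: List[str] = []
    for surface, was_desu_masu in tokens:
        if not was_desu_masu:
            out.append(surface)
        elif surface == "φ":
            out.append(particle)             # deleted auxiliary: particle takes its place
        else:
            out.extend([surface, particle])  # replaced or untouched: particle follows it
    return out


result = insert_final_particle([("学生", False), ("だ", True)], "よ")
```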

Based on the rewrite parameter set of the specific character stored in the rewrite parameter set storage unit 228, the particle replacement unit 236 deletes or replaces particles contained in the processed input sentence with rewrite location labels obtained from the auxiliary verb replacement unit 230 and the final particle insertion unit 234, and outputs the result to the output unit 290. Specifically, for each morpheme in the processed labeled input sentence whose part of speech is a particle other than a sentence-final particle, if its estimated label is DEL (deletion) and the rewrite parameter set of the specific character specifies that particles are "omissible", that morpheme is deleted. Likewise, for each morpheme whose part of speech is a particle other than a sentence-final particle, if its estimated label is SUB (replacement) and the rewrite parameter set of the specific character specifies that 「って」 is used, the surface form of that morpheme is replaced with 「って」, as shown in FIG. 10.
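The particle rules can be sketched in the same style. The sketch assumes part-of-speech strings of the form 「助詞-格助詞」 or 「助詞-終助詞」, so that sentence-final particles can be excluded by a substring check; that tag format, the tuple layout, and the function name are assumptions rather than details from the source.

```python
from typing import List, Tuple

LabeledToken = Tuple[str, str, str]  # (surface form, part of speech, estimated label)


def rewrite_particles(tokens: List[LabeledToken],
                      particles_omissible: bool,
                      use_tte: bool) -> List[str]:
    """Delete or replace particles other than sentence-final particles."""
    out: List[str] = []
    for surface, pos, label in tokens:
        # Target only particles (助詞) that are not sentence-final particles (終助詞).
        is_target = pos.startswith("助詞") and "終助詞" not in pos
        if is_target and label.startswith("DEL") and particles_omissible:
            continue                  # omit the particle
        if is_target and label.startswith("SUB") and use_tte:
            out.append("って")        # replace the particle with 「って」
        else:
            out.append(surface)
    return out


example = [
    ("私", "名詞", "O"),
    ("は", "助詞-係助詞", "SUB-B"),
    ("学生", "名詞", "O"),
    ("だ", "助動詞", "O"),
    ("よ", "助詞-終助詞", "O"),
]
output_surfaces = rewrite_particles(example, particles_omissible=False, use_tte=True)
```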

<Operation of the Learning Device According to the First Embodiment of the Present Invention>
Next, the operation of the learning device 100 according to the first embodiment of the present invention is described. First, the input unit 10 receives each pair of a character-less sentence and a character-bearing sentence and stores it in the sentence pair storage unit 22. Then, once each pair of a character-less sentence and a character-bearing sentence has been read from the sentence pair storage unit 22, the learning device 100 executes the learning processing routine shown in FIG. 11.

First, in step S100, based on each of the read pairs, rewrite parameters are determined that define rules for converting the surface forms of morphemes in order to rewrite a character-less sentence into a character-present sentence of a predetermined specific character, and they are output to the output unit 90 as the rewrite parameter set for the specific character.

Next, in step S102, for each of the part-of-speech-tagged pairs acquired in step S100, a teacher label is assigned to each replaced morpheme, each deleted morpheme, and each morpheme on which no operation was performed, based on the surface forms of the character-less sentence and the character-present sentence in the pair. The sequence of replacement, deletion, and no operation is then modeled using a machine learning technique, and the result is stored in the rewrite location estimation model storage unit 62 as the rewrite location estimation model and output to the output unit 90, after which the learning processing routine ends.

Step S100 is described in detail below with reference to the rewrite parameter determination processing routine in FIG. 12.

In step S200 of FIG. 12, for each of the read pairs, dynamic programming is used to align, character by character, the characters in the character-less sentence with the characters in the character-present sentence of the pair, and the rewrite locations, that is, the character correspondences within the pair, are acquired. When the rewrite locations in a pair are acquired, consecutive replacement, deletion, or insertion locations are concatenated into a single rewrite location.
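The character-level alignment of step S200 can be illustrated with a minimal sketch. It assumes a plain unit-cost edit distance computed by dynamic programming; the specification does not state the cost function or how ties are broken, so those choices, together with the function name `rewrite_locations`, are assumptions of this sketch.

```python
def rewrite_locations(plain: str, styled: str):
    """Align two sentences character by character and return merged edits.

    Each returned item is (text in the character-less sentence,
    text in the character-present sentence).
    """
    n, m = len(plain), len(styled)
    # cost[i][j] = edit distance between plain[:i] and styled[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (plain[i - 1] != styled[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the end to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (plain[i - 1] != styled[j - 1])):
            ops.append((plain[i - 1], styled[j - 1]))  # match or replacement
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append((plain[i - 1], ""))             # deletion
            i -= 1
        else:
            ops.append(("", styled[j - 1]))            # insertion
            j -= 1
    ops.reverse()

    # Concatenate consecutive edits into a single rewrite location.
    locations = []
    src, dst = "", ""
    for a, b in ops:
        if a == b:  # an unchanged character closes any open rewrite location
            if src or dst:
                locations.append((src, dst))
                src, dst = "", ""
        else:
            src, dst = src + a, dst + b
    if src or dst:
        locations.append((src, dst))
    return locations


print(rewrite_locations("今日は雨ですけど気にしないよ", "今日は雨だけど気にしないよ"))
# → [('です', 'だ')]
```

Here the deletion of 「で」 and the replacement of 「す」 by 「だ」 are adjacent, so they are merged into one rewrite location, 「です」 → 「だ」.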

Next, in step S202, for each of the read pairs, morphological analysis is performed on the character-less sentence and the character-present sentence of the pair, and a part of speech is assigned to each rewrite location of the pair acquired in step S200. As described for the operation of the replacement character string alignment unit 32, the part of speech is assigned as follows: morphological analysis is performed on both the character-less sentence and the character-present sentence of the pair, and, for each rewrite location of the pair, the part of speech of the morpheme whose left edge lies within the range of the rewrite location is assigned as the part of speech of that rewrite location.
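The left-edge condition of step S202 might be sketched as below. The representation of morphemes as `(surface, part of speech)` tuples, the half-open character range, and the tag names are assumptions of this sketch; in practice the morphemes would come from a morphological analyzer.

```python
def pos_for_location(morphemes, start, end):
    """Return the parts of speech of morphemes whose left edge lies in [start, end).

    `morphemes` is a list of (surface, pos) tuples in sentence order;
    `start` and `end` are character offsets of a rewrite location.
    """
    tags = []
    offset = 0  # character offset of the current morpheme's left edge
    for surface, pos in morphemes:
        if start <= offset < end:
            tags.append(pos)
        offset += len(surface)
    return tags


analysis = [
    ("今日", "noun"),
    ("は", "particle"),
    ("雨", "noun"),
    ("です", "auxiliary verb"),
    ("けど", "particle"),
]
# The rewrite location 「です」 covers character offsets 4 to 6.
print(pos_for_location(analysis, 4, 6))  # → ['auxiliary verb']
```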

Next, in step S204, based on each of the part-of-speech-tagged pairs acquired in step S202, a rule is determined as to whether or not the auxiliary verbs 「です」 and 「ます」 are deleted when rewriting into a character-present sentence of the specific character, and the result is stored in the rewrite parameter set storage unit 44 as a rewrite parameter.

Next, in step S206, based on a corpus composed of pairs of a sentence that has the specific character-likeness and a sentence that does not, a rule is determined as to whether or not the auxiliary verbs 「です」 and 「ます」 deleted in step S204 are replaced with the auxiliary verb 「だ（んだ）」 when rewriting into a character-present sentence of the specific character, and the result is stored in the rewrite parameter set storage unit 44 as a rewrite parameter.

Next, in step S208, based on a corpus composed of pairs of a sentence that has the specific character-likeness and a sentence that does not, a rule identifying the final particles that tend to be used when rewriting into a character-present sentence of the specific character is determined, and the result is stored in the rewrite parameter set storage unit 44 as a rewrite parameter.

Next, in step S210, based on a corpus composed of pairs of a sentence that has the specific character-likeness and a sentence that does not, a rule is determined as to whether or not 「って」 is used when rewriting into a character-present sentence of the predetermined specific character, and the result is stored in the rewrite parameter set storage unit 44 as a rewrite parameter.

Next, in step S212, based on each of the part-of-speech-tagged pairs acquired in step S202, a rule is determined as to whether particles may or may not be omitted when rewriting into a character-present sentence of the specific character, and the result is stored in the rewrite parameter set storage unit 44 as a rewrite parameter.

Next, in step S214, the rewrite parameters acquired in steps S204 to S212 are combined with the character ID for the specific character stored in a memory (not shown), and the combination is stored in the rewrite parameter set storage unit 44 as the rewrite parameter set for the specific character.

Next, in step S216, the rewrite parameter set for the specific character acquired in step S214 is output to the output unit 90, and the rewrite parameter determination processing routine ends.

Step S102 is described in detail below with reference to the learning processing routine for the rewrite location estimation model in FIG. 13.

In step S300 of FIG. 13, for each of the part-of-speech-tagged sentence pairs acquired in step S100, a teacher label is assigned to each replaced morpheme, each deleted morpheme, and each morpheme on which no operation was performed, based on the surface forms of the character-less sentence and the character-present sentence in the pair.

Next, in step S302, a feature is extracted for each morpheme of the character-less sentence in each pair labeled in step S300. Specific examples of the features are those given in the description of the operation of the rewrite location estimation model creation unit 60.

Next, in step S304, the rewrite location estimation model for estimating rewrite locations is learned based on the teacher label of each morpheme contained in the character-less sentence of each part-of-speech-tagged pair acquired in step S300 and the feature of each such morpheme acquired in step S302, and the model is stored in the rewrite location estimation model storage unit 62.

Next, in step S306, the rewrite location estimation model acquired in step S304 is output to the output unit 90, and the learning processing routine for the rewrite location estimation model ends.

<Operation of the sentence rewriting processing device according to the first embodiment of the present invention>
Next, the operation of the sentence rewriting processing device 200 according to the first embodiment of the present invention will be described. First, the input unit 210 receives the rewrite location estimation model learned by the learning device 100 and stores it in the rewrite location estimation model storage unit 224. Next, the input unit 210 receives the rewrite parameter set determined by the learning device 100 and stores it in the rewrite parameter set storage unit 228. Then, when the input unit 210 receives a character-less sentence, the sentence rewriting processing device 200 executes the sentence rewriting processing routine shown in FIG. 14.

First, in step S400, the rewrite location estimation model stored in the rewrite location estimation model storage unit 224 is read.

Next, in step S402, the rewrite parameter set for the specific character stored in the rewrite parameter set storage unit 228 is read. The specific character is the character identified by the character ID.

Next, in step S404, morphological analysis is performed on the character-less sentence received by the input unit 210.

Next, in step S406, for each morpheme contained in the morphologically analyzed character-less sentence acquired in step S404, the same features as in step S302 are extracted. Based on these features and the rewrite location estimation model acquired in step S400, an estimated label is assigned to each morpheme, yielding the input sentence with rewrite location labels.

Next, in step S408, based on the rewrite parameters acquired in step S402, it is determined, for the input sentence with rewrite location labels acquired in step S406, whether each morpheme whose surface form is 「です」 or 「ます」 is to be deleted, and the morphemes determined to be deleted are deleted.

Next, in step S410, based on the rewrite parameters acquired in step S402, it is determined, for the input sentence with rewrite location labels processed in step S408, whether to replace with 「だ」 or 「んだ」 (that is, whether to insert one of them at a position deleted in step S408). If replacement is determined, the morpheme is replaced with the morpheme whose surface form is 「だ」 or 「んだ」.
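Steps S408 and S410 taken together might be sketched as follows. The tuple layout `(surface, pos, label)`, the parameter keys `delete_desu_masu` and `da_replacement`, and the decision to pass the replacement string (「だ」 or 「んだ」) directly as a parameter are all assumptions of this sketch; how the device chooses between 「だ」 and 「んだ」 is not covered in this passage.

```python
def rewrite_auxiliaries(morphemes, params):
    """Delete 「です」/「ます」 and optionally put a replacement in their place.

    `morphemes` is a list of (surface, pos, label) tuples, where label is the
    estimated label "SUB", "DEL", or "NONE". `params` is a hypothetical dict
    standing in for the rewrite parameter set.
    """
    out = []
    for surface, pos, label in morphemes:
        is_target = (
            pos == "auxiliary verb"
            and surface in ("です", "ます")
            and label in ("SUB", "DEL")
        )
        if is_target and params.get("delete_desu_masu", False):
            # Corresponds to step S408: the morpheme is dropped.
            replacement = params.get("da_replacement")
            if label == "SUB" and replacement:
                # Corresponds to step S410: fill the deleted position.
                out.append((replacement, pos, label))
            continue
        out.append((surface, pos, label))
    return out


labeled = [
    ("今日", "noun", "NONE"),
    ("は", "particle", "NONE"),
    ("雨", "noun", "NONE"),
    ("です", "auxiliary verb", "SUB"),
    ("けど", "particle", "NONE"),
]
params = {"delete_desu_masu": True, "da_replacement": "だ"}
print("".join(s for s, _, _ in rewrite_auxiliaries(labeled, params)))
# → 今日は雨だけど
```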

Next, in step S412, based on the rewrite parameters acquired in step S402, it is determined whether a final particle is to be inserted into the input sentence with rewrite location labels processed in steps S408 to S410. If insertion is determined, an appropriate final particle is inserted at a predetermined appropriate position.

Next, in step S414, based on the rewrite parameters acquired in step S402, it is determined whether particles other than final particles are to be deleted or replaced in the input sentence with rewrite location labels processed in steps S408 to S412. If deletion or replacement is determined, the particle other than a final particle is deleted or replaced with 「って」.

Next, in step S416, the input sentence with rewrite location labels processed in steps S408 to S414 is output from the output unit 290 as the result of rewriting the character-less sentence received by the input unit 210 into a character-present sentence of the specific character, and the sentence rewriting processing routine ends.

As described above, the learning device according to the first embodiment of the present invention can learn parameters or a model for giving a sentence character-likeness that is consistent across the sentence as a whole.

Furthermore, the sentence rewriting processing device according to the first embodiment of the present invention classifies, based on the rewrite location estimation model, the operation on each morpheme contained in the morphologically analyzed input sentence as replacement, deletion, or no operation. In accordance with the rewrite parameters, it then deletes morphemes whose part of speech is auxiliary verb or replaces them with a predetermined morpheme or morpheme sequence, inserts morphemes that are final particles, and deletes morphemes whose part of speech is a particle other than a final particle or replaces them with another particle. The sentence can thereby be rewritten into one that has consistent character-likeness as a whole.

It also becomes possible to eliminate mismatches between linguistic expressions appearing within a sentence and those used at the end of the sentence, which the method of Non-Patent Document 1 could not prevent (for example, 「今日は雨ですけど気にしないよ」 → 「今日は雨だけど気にしないよ」). As a result, character that is consistent across the whole sentence can be given to the utterances of a computer that converses with humans (a dialogue system).

In addition, applying the learning device and the sentence rewriting processing device according to the present embodiment to a system that converses with people (a dialogue system) makes it possible to give character to the system's utterances, making the dialogue system more human-like and approachable.

In addition, applying the learning device and the sentence rewriting processing device according to the present embodiment to summaries of word-of-mouth reviews or Q&A makes it possible to unify the character that appears in posts written by multiple people who differ in dialect, gender, and age, so that more natural summaries can be generated in which the reader does not notice that the text consists of sentences written by multiple people. If consistent character can be given across the whole sentence, the effect of unifying character can be enhanced further.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, although the first embodiment has been described for the case where the chi-square test is used as the statistical test, the present invention is not limited to this, and other tests of proportions, such as the binomial test or the log-likelihood ratio test, may be used.
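To make the interchangeability of these tests concrete, the following sketch computes the Pearson chi-square statistic and the log-likelihood ratio (G) statistic for the same 2x2 table of counts. The layout of the table (rows for character-less and character-present sentences, columns for whether an expression occurs) and the counts themselves are hypothetical and chosen only for illustration; the value 3.841 is the standard 5% critical value of the chi-square distribution with one degree of freedom.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom


def g_statistic_2x2(a, b, c, d):
    """Log-likelihood ratio statistic G = 2 * sum(O * ln(O / E))."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                g += o * math.log(o / expected)
    return 2.0 * g


CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, 1 degree of freedom

# Hypothetical counts: [occurs, does not occur] per sentence type.
table = (10, 20, 20, 10)
print(chi_square_2x2(*table) > CRITICAL_5PCT_DF1)
print(g_statistic_2x2(*table) > CRITICAL_5PCT_DF1)
```

Both statistics are compared against the same critical value here because, for large counts, both are approximately chi-square distributed with one degree of freedom.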

In the first embodiment, the case has been described in which, when deciding whether or not to replace with the auxiliary verb 「だ（んだ）」, the total number of occurrences in character-less sentences, E_1, is compared with the total number of occurrences in character-present sentences, E_2; however, the present invention is not limited to this. For example, the total number of times F_1 that 「です」 or 「ます」 was deleted when rewriting from a character-less sentence to a character-present sentence may be compared with the total number of times F_2 that a 「です」 or 「ます」 morpheme was replaced with 「だ（んだ）」 in that rewriting, and replacement with 「だ（んだ）」 may be decided when the total number of deletions F_1 is statistically significantly larger than the total number of replacements F_2.

Although the first embodiment has been described for the case where a CRF is used for machine learning, the present invention is not limited to this. For example, any machine learning technique may be used as long as it can model sequences.

In the first embodiment, the case has been described in which features are extracted from all five morphemes consisting of the labeled morpheme and the morphemes within two positions before and after it; however, the present invention is not limited to this. For example, of the five morphemes consisting of the labeled morpheme and the two morphemes on each side of it, two consecutive morphemes may be used as the target of feature extraction.
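The five-morpheme window can be illustrated with a short sketch. The concrete features used here (surface form and part of speech at each window position), the padding token `<PAD>`, and the feature-name format are assumptions of this sketch; the specification describes its actual features elsewhere.

```python
def window_features(morphemes, i, width=2):
    """Collect surface and part-of-speech features around position i.

    `morphemes` is a list of (surface, pos) tuples. Positions outside the
    sentence are filled with a padding token.
    """
    feats = {}
    for off in range(-width, width + 1):
        j = i + off
        if 0 <= j < len(morphemes):
            surface, pos = morphemes[j]
        else:
            surface, pos = "<PAD>", "<PAD>"
        feats[f"surface[{off:+d}]"] = surface
        feats[f"pos[{off:+d}]"] = pos
    return feats


analysis = [
    ("今日", "noun"),
    ("は", "particle"),
    ("雨", "noun"),
    ("です", "auxiliary verb"),
    ("けど", "particle"),
]
for name, value in window_features(analysis, 3).items():
    print(name, value)
```

A dictionary of this shape per morpheme is a common input format for sequence labelers; restricting the window to two consecutive morphemes, as in the variation above, would amount to iterating over a narrower range of offsets.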

In addition, although the first embodiment has been described for the case where a Conditional Random Field (CRF) is used to estimate rewrite locations, the present invention is not limited to this. For example, any sequence modeling technique may be used.

In addition, although the first embodiment has been described for the case where the final particle to be inserted is the one with the largest value representing statistical significance, the present invention is not limited to this. For example, the final particle to be inserted may be selected at random from the usable final particles, or rules for selecting the final particle to be inserted may be defined manually in advance and the final particle selected according to those rules.

In addition, although the first embodiment has been described for the case where all of the first to fifth rewrite parameters are used as the rewrite parameters, the present invention is not limited to this. For example, at least one of the first to fifth rewrite parameters may be used.

Next, a learning device and a sentence rewriting processing device according to the second embodiment will be described.

The second embodiment differs from the first embodiment in that the learning device determines a rewrite parameter set for each of a plurality of characters and in that, in the sentence rewriting processing device, the character to be imparted is specified by a character ID. Components and operations that are the same as those of the learning device 100 and the sentence rewriting processing device 200 according to the first embodiment are denoted by the same reference numerals, and their description is omitted.

<Configuration of the learning device according to the second embodiment of the present invention>
Next, the configuration of the learning device according to the second embodiment of the present invention will be described. As shown in FIG. 15, the learning device 300 according to the embodiment of the present invention can be configured as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing a learning processing routine described later. Functionally, as shown in FIG. 15, the learning device 300 includes an input unit 310, a calculation unit 320, and an output unit 90.

The input unit 310 receives each pair of a character-less sentence and a character-present sentence obtained by manually rewriting the character-less sentence so as to emphasize the likeness of a specific character. Because the second embodiment covers a plurality of target characters, each character-present sentence is assumed to be assigned a character ID representing the character of that sentence.

The calculation unit 320 includes a sentence pair storage unit 322, a rewrite parameter determination unit 330, a rewrite location estimation model creation unit 60, and a rewrite location estimation model storage unit 62.

The sentence pair storage unit 322 stores each of the pairs of a character-less sentence and a character-present sentence for the plurality of characters received by the input unit 10.

For each character, the rewrite parameter determination unit 330 determines rewrite parameters that define the rules for rewriting a character-less sentence into a character-present sentence of the specific character, and outputs the rewrite parameter sets, each being the collection of rewrite parameters for one character ID, to the output unit 90. Specifically, the sentence pairs for the plurality of characters stored in the sentence pair storage unit 322 are first classified by character, grouping together the pairs that have the same character ID, based on the character ID assigned to the character-present sentence in each pair. Then, in the same manner as the rewrite parameter determination unit 30 of the learning device 100 according to the first embodiment, the set of sentence pairs classified for each character is processed, the rewrite parameters that define the rules for rewriting a character-less sentence into a character-present sentence of the specific character are determined for each character, and the rewrite parameter set for each character ID is output to the output unit 90.

<Configuration of the sentence rewriting processing device according to the second embodiment of the present invention>
Next, the configuration of the sentence rewriting processing device according to the second embodiment of the present invention will be described. As shown in FIG. 16, the sentence rewriting processing device 400 according to the second embodiment of the present invention can be configured as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing a sentence rewriting processing routine described later. Functionally, as shown in FIG. 16, the sentence rewriting processing device 400 includes an input unit 410, a calculation unit 420, and an output unit 290.

The input unit 410 receives the character ID of the character to be imparted and a character-less sentence.

The calculation unit 420 includes a morphological analysis unit 222, a rewrite location estimation model storage unit 224, a rewrite location estimation unit 226, a rewrite parameter set storage unit 428, an auxiliary verb replacement unit 430, a final particle insertion unit 434, and a particle replacement unit 436.

The rewrite parameter set storage unit 428 stores rewrite parameter sets identical to the rewrite parameter sets for each character ID determined by the learning device 300.

The processing by the auxiliary verb replacement unit 430, the final particle insertion unit 434, and the particle replacement unit 436 is the same as in the first embodiment except that it uses the rewrite parameter set having the same character ID as the character ID of the character to be imparted, received by the input unit 410. A detailed description is therefore omitted.

<Operation of the learning device according to the second embodiment of the present invention>
Next, the operation of the learning device 300 according to the second embodiment of the present invention will be described. First, the input unit 310 receives each pair of a character-less sentence and a character-present sentence for the plurality of characters and stores it in the sentence pair storage unit 322. Then, when each pair of a character-less sentence and a character-present sentence for the plurality of characters, to which a plurality of character IDs are assigned, is read from the sentence pair storage unit 322, the learning device 300 executes the learning processing routine shown in FIG. 17.

First, in step S500, based on each of the read pairs of a character-less sentence and a character-present sentence of a specific character for the plurality of characters, rewrite parameters that define the rules for rewriting a character-less sentence into a character-present sentence of a predetermined specific character are determined for each character ID and output to the output unit 90 as the rewrite parameter set for the specific character.

Step S500 is described in detail below with reference to the rewrite parameter determination processing routine in FIG. 18.

In step S600 of FIG. 18, the read pairs are grouped so that pairs having the same character ID are collected together, based on the character ID assigned to the character-present sentence of each pair, and are thereby classified into pairs for each character ID. The subsequent steps S204 through S602 are performed on the set of pairs classified as belonging to the same character based on the character ID.

In step S602, the rewrite parameters acquired in steps S204 to S212 are combined with the character ID of the specific character being processed, and the combination is stored in the rewrite parameter set storage unit 44 as the rewrite parameter set for the specific character.

Next, in step S604, it is determined whether the processing of steps S204 through S602 has been completed for all characters assigned to the pairs read from the sentence pair storage unit 322. If it has been completed for all of these characters, the routine proceeds to step S216. If not, the character to be processed is changed and the processing of steps S204 through S602 is repeated.
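The per-character loop of steps S600 to S604 might be sketched as follows. The triple layout `(character-less sentence, character-present sentence, character ID)` and the callback `determine_params`, which stands in for steps S204 to S212, are assumptions of this sketch.

```python
from collections import defaultdict


def group_pairs_by_character(pairs):
    """Corresponds to step S600: collect sentence pairs by character ID."""
    groups = defaultdict(list)
    for plain, styled, character_id in pairs:
        groups[character_id].append((plain, styled))
    return dict(groups)


def build_parameter_sets(pairs, determine_params):
    """Build one rewrite parameter set per character ID.

    `determine_params` is a hypothetical callable that takes the sentence
    pairs of one character and returns a dict of rewrite parameters.
    """
    parameter_sets = {}
    for character_id, group in group_pairs_by_character(pairs).items():
        # Corresponds to step S602: combine the parameters with the ID.
        parameter_sets[character_id] = {
            "character_id": character_id,
            **determine_params(group),
        }
    return parameter_sets


pairs = [
    ("今日は雨ですけど", "今日は雨だけど", "char_A"),
    ("行きます", "行くよ", "char_A"),
    ("今日は雨です", "今日は雨ですわ", "char_B"),
]
# A placeholder rule that only counts the pairs available for each character.
sets = build_parameter_sets(pairs, lambda group: {"n_pairs": len(group)})
print(sets["char_A"]["n_pairs"], sets["char_B"]["n_pairs"])  # → 2 1
```

Keying the result by character ID also matches how the sentence rewriting processing device later selects a parameter set: it looks up the set whose character ID equals the one received with the input sentence.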

<Operation of the sentence rewriting processing device according to the second embodiment of the present invention>
Next, the operation of the sentence rewriting processing device 400 according to the second embodiment of the present invention will be described. First, the input unit 410 receives the rewrite location estimation model learned by the learning device 300 and stores it in the rewrite location estimation model storage unit 224. Next, the input unit 410 receives the rewrite parameter set for each character determined by the learning device 300 and stores it in the rewrite parameter set storage unit 428. Then, when the input unit 410 receives the character ID of the character to be imparted and a character-less sentence, the sentence rewriting processing device 400 executes the sentence rewriting processing routine shown in FIG. 19.

Next, in step S700, the rewrite parameter set that is stored in the rewrite parameter set storage unit 228 and has the same character ID as the character ID of the character to be imparted, received by the input unit 410, is read. Steps S404 to S416 are the same as in the first embodiment except that they use the rewrite parameter set read in step S700, and a detailed description is therefore omitted.

The other configurations and operations of the class classification device according to the second embodiment are the same as those of the first embodiment, and their description is therefore omitted.

As described above, the learning device according to the second embodiment of the present invention can learn parameters or a model for imparting character-likeness that is consistent across the sentence as a whole.

Further, in the sentence rewriting processing device according to the second embodiment of the present invention, each morpheme included in the morphologically analyzed input sentence is classified, based on the rewrite location estimation model, as requiring replacement, deletion, or no operation. Then, in accordance with the rewrite parameters, morphemes whose part of speech is auxiliary verb are deleted or replaced with a predetermined morpheme or morpheme sequence, morphemes that serve as final particles are inserted, and morphemes whose part of speech is a particle other than a final particle are deleted or replaced with a specific particle. In this way, the input sentence can be rewritten into a sentence that has consistent character-likeness as a whole.
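As a rough illustration of this flow, the Python sketch below applies the three kinds of rewriting to a sentence whose morphemes already carry an operation label. It is a simplification under stated assumptions: the Morpheme class, the rule dictionaries, the part-of-speech tags, and the toy sentence are all hypothetical, and the three units are collapsed into a single pass rather than run as separate stages.

```python
from dataclasses import dataclass


@dataclass
class Morpheme:
    surface: str       # notation of the morpheme
    pos: str           # hypothetical tags: "aux", "particle", "noun", ...
    op: str = "none"   # label from the estimation step: "replace", "delete", "none"


def rewrite(morphemes, aux_rules, final_particle_rules, omittable, particle_rules):
    """Apply auxiliary verb, final particle, and particle rewriting in one pass."""
    out = []
    for m in morphemes:
        if m.op == "none":
            out.append(m.surface)
        elif m.pos == "aux":
            if m.surface not in aux_rules:
                # No rule for this auxiliary verb: leave it untouched.
                out.append(m.surface)
                continue
            replacement = aux_rules[m.surface]  # "" means deletion in this sketch
            if replacement:
                out.append(replacement)
            # Insert a final particle right after the deleted or replaced position.
            final = final_particle_rules.get(m.surface)
            if final:
                out.append(final)
        elif m.pos == "particle":
            if m.op == "delete" and m.surface in omittable:
                continue  # particle omitted
            if m.op == "replace":
                out.append(particle_rules.get(m.surface, m.surface))
            else:
                out.append(m.surface)
        else:
            out.append(m.surface)
    return "".join(out)


# Toy input with invented labels and rules.
sentence = [
    Morpheme("私", "noun"),
    Morpheme("は", "particle", "delete"),
    Morpheme("学生", "noun"),
    Morpheme("です", "aux", "replace"),
]
result = rewrite(
    sentence,
    aux_rules={"です": "だ"},
    final_particle_rules={"です": "よ"},
    omittable={"は"},
    particle_rules={},
)
print(result)  # 私学生だよ
```

Because every rule comes from one character's parameter set, the auxiliary verb, final particle, and particle choices stay mutually consistent within the sentence.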

For example, the second embodiment described the case where, in learning the rewrite location estimation model, the sentence pairs for all characters are processed together and a single rewrite location estimation model is learned; however, the invention is not limited to this. For example, a rewrite location estimation model may be learned for each character. In that case, the sentence rewriting processing device determines which rewrite location estimation model the rewrite location estimation unit uses based on the label assigned to the character-less sentence to be rewritten. In addition, conversion from a character-less sentence to a sentence with character can be performed with only a rewrite parameter set, even when no rewrite location estimation model exists. In that case, a character-less sentence to which rewrite location labels have been assigned manually is input as the input sentence from the input unit 210 shown in FIG. 5, and the processing from the auxiliary verb replacement unit 230 onward is performed.
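Rewrite location labels (replacement, deletion, or no operation) are assigned per morpheme of the character-less sentence, and in the learning device they are derived from a character-by-character correspondence between the two sentences of a pair. The Python sketch below shows one way such labels could be produced. It is an assumption-laden illustration: the standard library's difflib.SequenceMatcher stands in for the replacement character string alignment, whose actual algorithm is not given here, and the function name label_morphemes and the toy pair are hypothetical.

```python
from difflib import SequenceMatcher


def label_morphemes(plain_morphemes, character_sentence):
    """Label each morpheme of the character-less sentence from a character alignment."""
    plain = "".join(plain_morphemes)
    # One tag per text character of the character-less sentence.
    tags = ["equal"] * len(plain)
    matcher = SequenceMatcher(None, plain, character_sentence, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            for i in range(i1, i2):
                tags[i] = tag
    labels = []
    pos = 0
    for surface in plain_morphemes:
        span = tags[pos:pos + len(surface)]
        pos += len(surface)
        if "replace" in span:
            labels.append("replace")
        elif "delete" in span:
            labels.append("delete")
        else:
            labels.append("none")
    return labels


# Toy pair (invented): character-less morphemes and the sentence with character.
labels = label_morphemes(["私", "は", "学生", "です"], "私学生だよ")
print(labels)  # ['none', 'delete', 'none', 'replace']
```

Labels produced this way could serve as training data for a classifier, or could be replaced by manually assigned labels when no model is available.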

Further, although the present specification describes embodiments in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium or provided via a network.

DESCRIPTION OF SYMBOLS
10 Input unit
20 Calculation unit
22 Sentence pair storage unit
30 Rewrite parameter determination unit
32 Replacement character string alignment unit
33 Rewrite parameter acquisition unit
34 Auxiliary verb deletion determination unit
36 Auxiliary verb replacement determination unit
38 Final particle selection unit
40 Particle replacement determination unit
42 Particle omission determination unit
44 Rewrite parameter set storage unit
60 Rewrite location estimation model creation unit
62 Rewrite location estimation model storage unit
90 Output unit
100 Learning device
200 Sentence rewriting processing device
210 Input unit
220 Calculation unit
222 Morphological analysis unit
224 Rewrite location estimation model storage unit
226 Rewrite location estimation unit
228 Rewrite parameter set storage unit
230 Auxiliary verb replacement unit
234 Final particle insertion unit
236 Particle replacement unit
290 Output unit
300 Learning device
310 Input unit
320 Calculation unit
322 Sentence pair storage unit
330 Rewrite parameter determination unit
400 Sentence rewriting processing device
410 Input unit
420 Calculation unit
428 Rewrite parameter set storage unit
430 Auxiliary verb replacement unit
434 Final particle insertion unit
436 Particle replacement unit

Claims

A sentence rewriting processing device that rewrites a morphologically analyzed input sentence into a sentence having character-likeness representing a specific person image, the device comprising:
a rewrite location estimation unit that, for each morpheme included in the morphologically analyzed input sentence, classifies the operation on the morpheme as replacement, deletion, or no operation, based on a rewrite location estimation model learned in advance for classifying operations on morphemes for imparting the character-likeness as replacement, deletion, or no operation;
an auxiliary verb replacement unit that, for each morpheme that is classified as replacement or deletion by the rewrite location estimation unit and is an auxiliary verb, determines whether to delete, replace, or not operate on the morpheme based on the notation of the morpheme and a rewrite parameter, learned in advance, representing a rule for rewriting morphemes to impart the character-likeness, and that, based on the notation of the morpheme and the rewrite parameter, deletes the morpheme determined to be deleted or replaces the morpheme determined to be replaced with a predetermined morpheme or morpheme sequence;
a final particle insertion unit that, for each position in the input sentence at which a morpheme has been deleted or replaced by the auxiliary verb replacement unit, inserts a predetermined final particle following that position, based on the notation of the morpheme deleted or replaced at that position and the rewrite parameter; and
a particle replacement unit that, for each morpheme that is classified as deletion by the rewrite location estimation unit and is a particle other than a final particle, deletes the morpheme from the input sentence based on the rewrite parameter, and, for each morpheme that is classified as replacement by the rewrite location estimation unit and is a particle other than a final particle, replaces the morpheme in the input sentence with a predetermined particle based on the rewrite parameter.

A learning device that learns rewrite parameters representing rules for rewriting morphemes of a sentence in order to impart to the sentence character-likeness representing a specific person image, the learning device comprising:
a replacement character string alignment unit that, for each pair of a character-less sentence not having the character-likeness and a sentence with character having the character-likeness, acquires the correspondence between each text character included in the character-less sentence of the pair and each text character included in the sentence with character of the pair; and
a rewrite parameter acquisition unit that acquires at least one of: a first rewrite parameter for determining whether to delete a specific auxiliary verb, acquired based on the correspondence acquired for each of the pairs by the replacement character string alignment unit; a second rewrite parameter for determining whether to replace a specific auxiliary verb, acquired from a corpus composed of pairs of a sentence having a specific character-likeness and a sentence not having it; a third rewrite parameter for determining usable final particles, acquired from a corpus composed of pairs of a sentence having a specific character-likeness and a sentence not having it; a fourth rewrite parameter for determining whether to use a specific particle, acquired from a corpus composed of pairs of a sentence having a specific character-likeness and a sentence not having it; and a fifth rewrite parameter for determining whether to omit a particle, acquired based on the correspondence acquired for each of the pairs by the replacement character string alignment unit.

A learning device that learns a rewrite location estimation model for classifying operations on morphemes of a sentence, performed to impart to the sentence character-likeness representing a specific person image, as replacement, deletion, or no operation, the learning device comprising:
a replacement character string alignment unit that, for each pair of a character-less sentence not having the character-likeness and a sentence with character having the character-likeness, acquires the correspondence between each text character included in the character-less sentence of the pair and each text character included in the sentence with character of the pair; and
a rewrite location estimation model creation unit that, for each of the pairs, based on the correspondence acquired by the replacement character string alignment unit, assigns to each morpheme included in the character-less sentence of the pair a label indicating replacement, deletion, or no operation as the operation on that morpheme, and learns the rewrite location estimation model based on the notation of each labeled morpheme.

A sentence rewriting processing method in a sentence rewriting processing device that includes a rewrite location estimation unit, an auxiliary verb replacement unit, a final particle insertion unit, and a particle replacement unit, and that rewrites a morphologically analyzed input sentence into a sentence having character-likeness representing a specific person image, the method comprising:
a step in which the rewrite location estimation unit, for each morpheme included in the morphologically analyzed input sentence, classifies the operation on the morpheme as replacement, deletion, or no operation, based on a feature amount of the morpheme and a rewrite location estimation model learned in advance for classifying operations on morphemes for imparting the character-likeness as replacement, deletion, or no operation;
a step in which the auxiliary verb replacement unit, for each morpheme that is classified as replacement or deletion by the rewrite location estimation unit and is an auxiliary verb, determines whether to delete, replace, or not operate on the morpheme based on the notation of the morpheme and a rewrite parameter, learned in advance, representing a rule for rewriting morphemes to impart the character-likeness, and, based on the notation of the morpheme and the rewrite parameter, deletes the morpheme determined to be deleted or replaces the morpheme determined to be replaced with a predetermined morpheme or morpheme sequence;
a step in which the final particle insertion unit, for each position in the input sentence at which a morpheme has been deleted or replaced by the auxiliary verb replacement unit, inserts a predetermined final particle following that position, based on the notation of the morpheme deleted or replaced at that position and the rewrite parameter; and
a step in which the particle replacement unit, for each morpheme that is classified as deletion by the rewrite location estimation unit and is a particle other than a final particle, deletes the morpheme from the input sentence based on the rewrite parameter, and, for each morpheme that is classified as replacement by the rewrite location estimation unit and is a particle other than a final particle, replaces the morpheme in the input sentence with a predetermined particle based on the rewrite parameter.

A learning method in a learning device that includes a replacement character string alignment unit and a rewrite parameter acquisition unit, and that learns rewrite parameters representing rules for rewriting morphemes of a sentence in order to impart to the sentence character-likeness representing a specific person image, the method comprising:
a step in which the replacement character string alignment unit, for each pair of a character-less sentence not having the character-likeness and a sentence with character having the character-likeness, acquires the correspondence between each text character included in the character-less sentence of the pair and each text character included in the sentence with character of the pair; and
a step in which the rewrite parameter acquisition unit acquires at least one of: a first rewrite parameter for determining whether to delete a specific auxiliary verb, acquired based on the correspondence acquired for each of the pairs by the replacement character string alignment unit; a second rewrite parameter for determining whether to replace a specific auxiliary verb, acquired from a corpus composed of pairs of a sentence having a specific character-likeness and a sentence not having it; a third rewrite parameter for determining usable final particles, acquired from a corpus composed of pairs of a sentence having a specific character-likeness and a sentence not having it; a fourth rewrite parameter for determining whether to use a specific particle, acquired from a corpus composed of pairs of a sentence having a specific character-likeness and a sentence not having it; and a fifth rewrite parameter for determining whether to omit a particle, acquired based on the correspondence acquired for each of the pairs by the replacement character string alignment unit.

A learning method in a learning device that includes a replacement character string alignment unit and a rewrite location estimation model creation unit, and that learns a rewrite location estimation model for classifying operations on morphemes of a sentence, performed to impart to the sentence character-likeness representing a specific person image, as replacement, deletion, or no operation, the method comprising:
a step in which the replacement character string alignment unit, for each pair of a character-less sentence not having the character-likeness and a sentence with character having the character-likeness, acquires the correspondence between each text character included in the character-less sentence of the pair and each text character included in the sentence with character of the pair; and
a step in which the rewrite location estimation model creation unit, for each of the pairs, based on the correspondence acquired by the replacement character string alignment unit, assigns to each morpheme included in the character-less sentence of the pair a label indicating replacement, deletion, or no operation as the operation on that morpheme when the character-less sentence is rewritten into the sentence with character, and learns the rewrite location estimation model based on the notation of each labeled morpheme.

A program for causing a computer to function as each unit constituting the sentence rewriting processing device according to claim 1.

A program for causing a computer to function as each unit constituting the learning device according to claim 2 or 3.