JP2014191484A

JP2014191484A - Sentence end expression conversion device, method and program

Info

Publication number: JP2014191484A
Application number: JP2013064961A
Authority: JP
Inventors: Chiaki Miyazaki (宮崎千明); Toru Hirano (平野徹); Ryuichiro Higashinaka (東中竜一郎); Toshiaki Makino (牧野俊朗); Yoshihiro Matsuo (松尾義博)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-03-26
Filing date: 2013-03-26
Publication date: 2014-10-06
Anticipated expiration: 2033-03-26
Also published as: JP5722375B2

Abstract

PROBLEM TO BE SOLVED: To convert a sentence end expression so that the converted sentence has a desired character, without manual work cost. SOLUTION: A sentence end expression extraction section 21 extracts, from a sentence, sentence end expression information including a sentence end expression, the part of speech of the sentence end expression, and the part of speech of the preceding morpheme. From the sentence end expression information in a sentence end expression list 25, which is created from a sentence set 41 and a sentence set 42 with writer information, a conversion candidate selection section 31 selects as conversion candidates the sentence end expressions whose part of speech and preceding-morpheme part of speech match those in the sentence end expression information extracted from the input sentence to be converted. Referring to the sentence end expressions that appear characteristically for each attribute in a characteristic sentence end expression list 26, which is created from the sentence set 42 with writer information, an attribute imparting section 32 imparts an attribute to each selected candidate. A conversion result output section 33 converts the sentence end expression of the input sentence into the conversion candidate whose imparted attribute matches the writer attribute input as the attribute of the writer of the converted sentence.

Description

The present invention relates to a sentence end expression conversion device, method, and program.

Techniques for converting the vocabulary and syntax of Japanese sentences have been proposed. For example, one technique converts a functional expression (a function word or a compound of function words) into another, semantically equivalent functional expression while taking style and difficulty into account (see, for example, Non-Patent Document 1). The technique described in Non-Patent Document 1 converts functional expressions using a "functional expression dictionary" in which detailed information is described for each functional expression. This dictionary contains a list of all variants (orthographic variations) of the functional expressions. Each functional expression is also classified into a three-level semantic hierarchy defined from the viewpoint of convertibility: "preserves the rough meaning", "conversion is not unnatural in many contexts", and "convertible in almost all contexts". In addition, each functional expression is annotated with style information (plain, polite, colloquial, and formal styles) and with difficulty information based on the question-setting criteria of the Japanese Language Proficiency Test (日本語能力試験出題基準). Using this dictionary, the technique changes only the style and difficulty of a functional expression while preserving its meaning.

Techniques that give the converted sentence a particular character by converting the sentence have also been proposed. For example, a technique for converting a sentence in the standard language into a sentence in a dialect has been proposed (see, for example, Non-Patent Document 2). In the technique described in Non-Patent Document 2, translations from the standard language into a dialect, written by dialect speakers, are used to create a dictionary that records the correspondence between words used in the standard language and words used in the dialect. Based on this dictionary, words in a standard-language sentence are replaced with the words used in the dialect.

Non-Patent Document 1: Suguru Matsuyoshi (松吉俊), Satoshi Sato (佐藤理史), "Conversion of Japanese functional expressions with controllable style and difficulty", Journal of Natural Language Processing (自然言語処理) 15(2), 75-99, 2008.
Non-Patent Document 2: Toshiyuki Ishibashi (石橋季之), Shinya Amano (天野真家), "Common language to dialect conversion", Proceedings of the 70th National Convention (2008) (2), Information Processing Society of Japan, 2-191 to 2-192, 2008-03-13.

However, creating a dictionary that contains detailed information, as in the technique described in Non-Patent Document 1, requires advanced linguistic knowledge and an enormous amount of work. It is therefore difficult to create separate dictionaries for regional dialects and for the turns of phrase associated with diverse personal attributes. Consequently, it is difficult to realize, with a method that depends solely on dictionaries, a conversion process that gives the converted sentence a particular character.

The technique described in Non-Patent Document 2, like that of Non-Patent Document 1, requires the information needed for conversion (convertibility or semantic equivalence) to be listed manually in advance. Realizing conversion for regional dialects and diverse personal attributes would therefore require an enormous amount of work.

Furthermore, when the target of conversion is the sentence end expression that appears at the end of a sentence, the vocabulary found in sentence end expressions in actual use differs depending on the dialect and on the personal attributes of the author (speaker). In addition, characters such as the geminate marker (っ), the long-vowel mark (ー), and small kana (ぁ, ぃ, ぅ, ぇ, ぉ) may be inserted. If all variants of a sentence end expression before conversion were enumerated, the number of orthographic variations would be enormous, so it is impossible to enumerate all of the diverse sentence end expressions by hand. Moreover, it is not realistic to manually create a dictionary that records the detailed information needed to convert sentence end expressions for every dialect and every author attribute, with the character of the converted sentence in mind.

The present invention has been made in view of the above circumstances. Its object is to provide a sentence end expression conversion device, method, and program that can convert a sentence end expression so that the converted sentence has a desired character, without incurring manual work costs.

To achieve the above object, the sentence end expression conversion device of the present invention comprises: sentence end expression extraction means for extracting sentence end expression information that includes a sentence end expression appearing at the end of a Japanese sentence and the part of speech of the morpheme immediately preceding that sentence end expression; conversion candidate selection means for selecting, as conversion candidates for the sentence end expression included in the sentence end expression information extracted from a Japanese sentence to be converted, the sentence end expressions included in those pieces of sentence end expression information, among a plurality of pieces extracted from a plurality of Japanese sentences, whose preceding-morpheme part of speech matches the preceding-morpheme part of speech in the sentence end expression information extracted from the sentence to be converted; attribute assignment means for assigning an attribute to each of the conversion candidates based on the sentence end expressions that appear characteristically for each attribute, which are obtained from the correspondence between author attributes and a plurality of sentence end expressions extracted from a plurality of Japanese sentences to which author information including author attributes is attached; and conversion means for selecting, from among the conversion candidates, a candidate whose assigned attribute matches an attribute set in advance as the attribute of the author of the converted sentence, and converting the sentence end expression included in the sentence end expression information extracted from the sentence to be converted into the selected candidate.

According to the sentence end expression conversion device of the present invention, the sentence end expression extraction means extracts sentence end expression information that includes a sentence end expression appearing at the end of a Japanese sentence and the part of speech of the morpheme immediately preceding it. The conversion candidate selection means then looks at a plurality of pieces of sentence end expression information extracted from a plurality of Japanese sentences and selects, as conversion candidates for the sentence end expression of the sentence to be converted, the sentence end expressions in those pieces whose preceding-morpheme part of speech matches the preceding-morpheme part of speech extracted from the sentence to be converted. The attribute assignment means assigns an attribute to each conversion candidate based on the sentence end expressions that appear characteristically for each attribute, which are obtained from the correspondence between author attributes and the sentence end expressions extracted from a plurality of Japanese sentences with author information. Finally, the conversion means selects, from among the conversion candidates, a candidate whose assigned attribute matches the attribute set in advance as the attribute of the author of the converted sentence, and converts the sentence end expression of the sentence to be converted into the selected candidate.
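The selection and conversion flow described above can be sketched in Python as follows. This is only an illustrative sketch under assumptions: the names `EndingInfo`, `select_candidates`, and `convert`, the sample endings, and the attribute labels are hypothetical and do not come from the specification, and the sketch assumes the input sentence literally ends with the extracted sentence end expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndingInfo:
    """Sentence end expression information (hypothetical structure)."""
    ending: str    # sentence end expression, e.g. "ですね"
    prev_pos: str  # part of speech of the immediately preceding morpheme


def select_candidates(target, corpus_infos):
    """Collect endings whose preceding-morpheme POS matches the target's."""
    candidates = []
    for info in corpus_infos:
        if info.prev_pos == target.prev_pos and info.ending not in candidates:
            candidates.append(info.ending)
    return candidates


def convert(sentence, target, candidates, candidate_attrs, wanted_attr):
    """Replace the ending with the first candidate carrying the wanted attribute."""
    for cand in candidates:
        if wanted_attr in candidate_attrs.get(cand, set()):
            stem = sentence[: len(sentence) - len(target.ending)]
            return stem + cand
    return sentence  # no suitable candidate: leave the sentence unchanged


# Made-up illustrative data, not taken from the specification.
corpus = [
    EndingInfo("だね", "名詞"),
    EndingInfo("やんか", "名詞"),
    EndingInfo("よ", "動詞"),
]
target = EndingInfo("ですね", "名詞")
cands = select_candidates(target, corpus)
attrs = {"やんか": {"residence: West Japan"}}
result = convert("今日はいい天気ですね", target, cands, attrs, "residence: West Japan")
```

In this sketch the attribute lookup is a plain dictionary; in the device it would be backed by the attributes assigned from the characteristic sentence end expressions.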

In this way, conversion candidates are selected based on sentence end expression information extracted from a plurality of sentences, and attributes are assigned to the candidates based on the sentence end expressions that appear characteristically for each attribute in sentences with author information. A candidate whose assigned attribute matches the set attribute is then selected. The sentence end expression can therefore be converted so that the converted sentence has a desired character, without incurring manual work costs.

The sentence end expressions that appear characteristically for each attribute can be extracted based on the occurrence ratios of the correspondences between sentence end expressions and author attributes. In this way, the characteristic sentence end expressions for each attribute can be extracted automatically from a plurality of sentences with author information.

The conversion means can also select, from among the conversion candidates whose assigned attribute matches the preset attribute, the candidate with the highest statistical index for the correspondence between the sentence end expression and the author attribute that corresponds to the assigned attribute. This makes it possible to convert the sentence into one that expresses the desired character more appropriately.
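A minimal sketch of this "highest statistical index" choice is shown below. The record layout (`ending`, `attribute`, `score`), the function name `pick_best`, and all sample values are assumptions made for illustration; `score` stands for whatever statistic was stored with the characteristic sentence end expression.

```python
def pick_best(candidates, wanted_attr):
    """Return the ending with the highest score among attribute matches."""
    matching = [c for c in candidates if c["attribute"] == wanted_attr]
    if not matching:
        return None  # no candidate carries the requested attribute
    return max(matching, key=lambda c: c["score"])["ending"]


# Made-up candidates with made-up scores, for illustration only.
candidates = [
    {"ending": "だね", "attribute": "gender: male", "score": 12.4},
    {"ending": "だわ", "attribute": "gender: female", "score": 30.1},
    {"ending": "よね", "attribute": "gender: female", "score": 9.8},
]
best = pick_best(candidates, "gender: female")
```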

The conversion candidate selection means can also select, from among the conversion candidates, those that express the same tense or modality as the sentence end expression included in the sentence end expression information extracted from the Japanese sentence to be converted. This prevents changes to elements that affect dialogue acts, for example in a dialogue system.

The sentence end expression extraction means can also extract sentence end expression information that further includes the part of speech of the sentence end expression. In that case, the conversion candidate selection means can select, as conversion candidates for the sentence end expression of the sentence to be converted, the sentence end expressions in those pieces of sentence end expression information, among the plurality extracted from a plurality of Japanese sentences, for which both the part of speech of the sentence end expression and the part of speech of the immediately preceding morpheme match those extracted from the sentence to be converted. When a sentence end expression contains a plurality of morphemes, the conversion candidate selection means can determine that the parts of speech of two sentence end expressions match if at least one of the parts of speech of the morphemes they contain matches. If the part of speech of the sentence end expression is not checked when selecting conversion candidates, more candidates can be selected. If it is checked, the selection of ungrammatical candidates can be suppressed.
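The "at least one morpheme part of speech matches" rule can be sketched as a set intersection. The sketch assumes that the part of speech of a multi-morpheme sentence end expression is stored as per-morpheme tags joined with "＿" (as in the tag "判定詞：終止＿終助詞" used in the embodiment); the function names and the dictionary layout are hypothetical.

```python
def pos_matches(ending_pos_a, ending_pos_b, sep="＿"):
    """True if the two endings share at least one per-morpheme POS tag."""
    tags_a = {t for t in ending_pos_a.split(sep) if t}
    tags_b = {t for t in ending_pos_b.split(sep) if t}
    return bool(tags_a & tags_b)


def is_candidate(target, other, check_ending_pos=True):
    """Apply the preceding-POS check, optionally with the ending-POS check."""
    if target["prev_pos"] != other["prev_pos"]:
        return False
    if not check_ending_pos:
        return True  # looser matching: more candidates are accepted
    return pos_matches(target["ending_pos"], other["ending_pos"])


# Illustrative records; the second and third are made up.
target = {"ending": "ですね", "ending_pos": "判定詞：終止＿終助詞", "prev_pos": "名詞"}
same_particle = {"ending": "ね", "ending_pos": "終助詞", "prev_pos": "名詞"}
other_kind = {"ending": "かも", "ending_pos": "副助詞", "prev_pos": "名詞"}
```

With `check_ending_pos=False` the third record would also be accepted, which illustrates the trade-off between candidate coverage and grammaticality noted above.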

The sentence end expression conversion method of the present invention is a method performed in a sentence end expression conversion device that includes sentence end expression extraction means, conversion candidate selection means, attribute assignment means, and conversion means. In this method, the sentence end expression extraction means extracts sentence end expression information that includes a sentence end expression appearing at the end of a Japanese sentence and the part of speech of the morpheme immediately preceding it. The conversion candidate selection means selects, as conversion candidates for the sentence end expression of the Japanese sentence to be converted, the sentence end expressions in those pieces of sentence end expression information, among a plurality extracted from a plurality of Japanese sentences, whose preceding-morpheme part of speech matches the preceding-morpheme part of speech extracted from the sentence to be converted. The attribute assignment means assigns an attribute to each conversion candidate based on the sentence end expressions that appear characteristically for each attribute, which are obtained from the correspondence between author attributes and the sentence end expressions extracted from a plurality of Japanese sentences to which author information including author attributes is attached. The conversion means selects, from among the conversion candidates, a candidate whose assigned attribute matches an attribute set in advance as the attribute of the author of the converted sentence, and converts the sentence end expression of the sentence to be converted into the selected candidate.

The sentence end expression conversion program of the present invention is a program for causing a computer to function as each of the means constituting the sentence end expression conversion device described above.

As described above, the sentence end expression conversion device, method, and program of the present invention select conversion candidates based on sentence end expression information extracted from a plurality of sentences, assign attributes to the candidates based on the sentence end expressions that appear characteristically for each attribute in sentences with author information, and select a candidate whose assigned attribute matches the set attribute. As a result, the sentence end expression can be converted so that the converted sentence has a desired character, without incurring manual work costs.

A block diagram showing an example of the functional configuration of the sentence end expression conversion device according to the present embodiment.
A conceptual diagram showing an example of the author attribute list.
A conceptual diagram showing an example of the sentence end expression list.
A conceptual diagram showing an example of the sentence end expression author list.
A conceptual diagram showing an example of the characteristic sentence end expression list.
A conceptual diagram showing an example of the list of tense, modality, and the like.
A conceptual diagram showing an example of the conversion candidate list.
A conceptual diagram showing an example of the conversion candidate list with attributes and chi-square values assigned.
A flowchart showing the list creation processing in the present embodiment.
A flowchart showing the conversion processing in the present embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

The sentence end expression conversion device 10 according to the present embodiment is implemented by a computer that includes a CPU, a RAM, and a ROM storing a program for executing sentence end expression conversion processing, which comprises the list creation processing and the conversion processing described later. Functionally, as shown in FIG. 1, this computer can be represented as a configuration including a list creation unit 20 and a conversion unit 30. The list creation unit 20 can in turn be represented as including a sentence end expression extraction unit 21 and a characteristic sentence end expression extraction unit 22. The conversion unit 30 can be represented as including the sentence end expression extraction unit 21, a conversion candidate selection unit 31, an attribute assignment unit 32, and a conversion result output unit 33. The sentence end expression extraction unit 21 is a functional unit shared by the list creation unit 20 and the conversion unit 30. Each unit is described in detail below.

The sentence end expression extraction unit 21 receives a sentence (text data) as input and extracts from it sentence end expression information that includes the sentence end expression, the part of speech of the sentence end expression, and the part of speech of the morpheme immediately preceding the sentence end expression.

Specifically, the sentence end expression extraction unit 21 performs morphological analysis on the input sentence and extracts, as the sentence end expression, the morpheme string that follows the first content word encountered when scanning from the end of the sentence. A word (morpheme) whose part of speech is, for example, a noun, adjective, or verb and that carries a concrete meaning on its own can be identified as a content word. For example, when the sentence "今日はいい天気ですね" ("Nice weather today, isn't it?") is input, the first content word from the end of the sentence is the noun "天気" ("weather"), and the morpheme string after it, "ですね", is extracted as the sentence end expression. The sentence end expression extraction unit 21 also extracts the part of speech of the sentence end expression from the morphological analysis result. In this example, the part of speech "判定詞：終止" (copula, terminal form) of the morpheme "です" and the part of speech "終助詞" (sentence-final particle) of "ね" are joined to give "判定詞：終止＿終助詞", which is extracted as the part of speech of the sentence end expression. The unit further extracts the part of speech of the morpheme immediately preceding the sentence end expression from the morphological analysis result. In this example, "名詞" (noun), the part of speech of the morpheme "天気" immediately preceding the sentence end expression "ですね", is extracted.
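The extraction step can be sketched as a backward scan over morphemes that have already been analyzed. The sketch below is illustrative only: it assumes the morphological analyzer's output is available as a list of (surface, part of speech) pairs, it treats only nouns, adjectives, and verbs as content words, and the function name and dictionary keys are hypothetical.

```python
# Parts of speech treated as content words in this sketch (a simplification).
CONTENT_POS = {"名詞", "形容詞", "動詞"}


def extract_ending_info(morphemes):
    """Scan from the sentence end to the first content word.

    `morphemes` is a list of (surface, pos) pairs from a morphological
    analyzer; the analysis itself is assumed to happen elsewhere.
    """
    for i in range(len(morphemes) - 1, -1, -1):
        surface, pos = morphemes[i]
        # Compare only the main tag, ignoring any "：" suffix such as "：終止".
        if pos.split("：")[0] in CONTENT_POS:
            tail = morphemes[i + 1:]
            return {
                "ending": "".join(s for s, _ in tail),
                "ending_pos": "＿".join(p for _, p in tail),
                "prev_pos": pos,
            }
    return None  # no content word found


# The example sentence from the description, pre-segmented by hand.
sentence = [
    ("今日", "名詞"), ("は", "助詞"), ("いい", "形容詞"),
    ("天気", "名詞"), ("です", "判定詞：終止"), ("ね", "終助詞"),
]
info = extract_ending_info(sentence)
```

If the sentence ends with a content word, the sketch returns an empty ending; how such sentences should be handled is not specified here.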

Symbols and alphabetic characters may be excluded from the sentence end expression. Alternatively, they may be included as needed, for example when emoticons are treated as a kind of sentence end expression.

When the sentence end expression extraction unit 21 functions as part of the list creation unit 20, it receives a sentence set 41 and a sentence set 42 with author information as input.

The sentence set 41 is a collection (corpus) of a large number of sentences (text data) gathered from the Web or similar sources. The sentence set 42 with author information is, like the sentence set 41, a collection of a large number of sentences (text data) gathered from the Web or similar sources, but each sentence is associated with information identifying its author. The sentence set 42 with author information also includes an author attribute list that gives the attributes of each author associated with the sentences. FIG. 2 shows an example of the author attribute list. In the example of FIG. 2, each author name is associated with the attributes gender (male / female), age group (under 20, 20s, 30s, 40 and over), and place of residence (eastern Japan / western Japan). The attributes are not limited to these examples; other attributes such as blood type, occupation, or birthplace may be used. Whether the author is a user of a specific Web service, or a specific interest of the author (a fan of a certain celebrity, a railway enthusiast, and so on), may also be used as an attribute. The "age group" and "place of residence" attributes in the example of FIG. 2 may also be divided more finely.

The sentence end expression extraction unit 21 extracts the sentence end expression information described above from each sentence in the sentence set 41 and the sentence set 42 with author information, and stores it in a predetermined storage area as a sentence end expression list 25. FIG. 3 shows an example of the sentence end expression list 25. In the example of FIG. 3, the "part of speech of the immediately preceding morpheme" column lists together the preceding-morpheme parts of speech from all pieces of sentence end expression information that share the same sentence end expression and the same part of speech of the sentence end expression.
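One way to picture the grouping in the sentence end expression list 25 is a mapping from (sentence end expression, its part of speech) to the set of preceding-morpheme parts of speech observed with it. The sketch below is an assumption about the data layout, not the layout of FIG. 3; the function name, dictionary keys, and sample records are hypothetical.

```python
from collections import defaultdict


def build_ending_list(infos):
    """Group preceding-morpheme POS tags by (ending, ending POS)."""
    table = defaultdict(set)
    for info in infos:
        key = (info["ending"], info["ending_pos"])
        table[key].add(info["prev_pos"])
    return dict(table)


# Made-up extraction results for illustration.
infos = [
    {"ending": "ですね", "ending_pos": "判定詞：終止＿終助詞", "prev_pos": "名詞"},
    {"ending": "ですね", "ending_pos": "判定詞：終止＿終助詞", "prev_pos": "形容詞"},
    {"ending": "ですね", "ending_pos": "判定詞：終止＿終助詞", "prev_pos": "名詞"},
    {"ending": "よ", "ending_pos": "終助詞", "prev_pos": "動詞"},
]
ending_list = build_ending_list(infos)
```

Using a set per key removes duplicate observations, so each preceding part of speech is listed once per sentence end expression.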

The sentence end expression extraction unit 21 also creates a sentence end expression author list, in which each sentence end expression extracted from a sentence in the sentence set 42 with author information is associated with the name of that sentence's author as an author who used the expression. FIG. 4 shows an example of the sentence end expression author list. In the example of FIG. 4, the "authors who used the sentence end expression" column lists together the names of the authors who used the same sentence end expression. The author names correspond to the author names in the author attribute list shown in FIG. 2.

When the sentence end expression extraction unit 21 functions as part of the conversion unit 30, it receives the input sentence (text data) whose sentence end expression is to be converted and extracts the sentence end expression information described above. Hereinafter, the sentence end expression information extracted from the input sentence to be converted is called "conversion target sentence end expression information", and the sentence end expression it contains is called the "conversion target sentence end expression".

The characteristic sentence end expression extraction unit 22 receives as input the sentence end expression author list created by the sentence end expression extraction unit 21 and the author attribute list included in the sentence set 42 with author information. It extracts the sentence end expressions that are used disproportionately often by authors with a given attribute as the characteristic sentence end expressions for that attribute.

Specifically, based on the sentence end expression author list and the author attribute list, the characteristic sentence end expression extraction unit 22 determines which attributes the users of each sentence end expression have. For example, for the sentence end expression "あんの" in the sentence end expression author list shown in FIG. 4, the attributes of the author "000_kitsune", who used that expression, are obtained from the author attribute list of FIG. 2. Here, the attributes "gender: female", "age group: under 20", and "place of residence: western Japan" are obtained. Associating each of these attributes with the sentence end expression "あんの" creates pairs of a sentence end expression and an attribute: "あんの - gender: female", "あんの - age group: under 20", and "あんの - place of residence: western Japan". These pairs are created for every author who used each sentence end expression, and the number of occurrences of each identical pair is then counted.
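The pairing and counting step can be sketched with a counter keyed by (sentence end expression, attribute). In the sketch below, the author "000_kitsune" and its attributes follow the example in the description; the second author, the function name, and the dictionary layouts are made up for illustration.

```python
from collections import Counter


def count_pairs(ending_authors, author_attrs):
    """Count (ending, attribute) pairs over every author who used each ending."""
    counts = Counter()
    for ending, authors in ending_authors.items():
        for author in authors:
            for name, value in author_attrs[author].items():
                counts[(ending, f"{name}: {value}")] += 1
    return counts


# Sentence end expression author list (second author is hypothetical).
ending_authors = {"あんの": ["000_kitsune", "111_tanuki"]}

# Author attribute list (second entry is hypothetical).
author_attrs = {
    "000_kitsune": {"gender": "female", "age group": "under 20",
                    "place of residence": "western Japan"},
    "111_tanuki": {"gender": "female", "age group": "20s",
                   "place of residence": "western Japan"},
}
pair_counts = count_pairs(ending_authors, author_attrs)
```

This sketch counts one pair per author and attribute; whether repeated uses by the same author should be counted more than once is left open here.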

The description here creates the sentence-end expression author list, which associates sentence-end expressions with author names, before associating sentence-end expressions with attributes. Alternatively, the author attributes may be associated directly with the sentence-end expressions extracted by the sentence-end expression extraction unit 21.

Furthermore, the characteristic sentence-end expression extraction unit 22 uses the pair counts to extract the characteristic sentence-end expressions of each attribute, for example by a chi-square test with a significance level of 1%. Consider, for example, a sentence-end expression "A" for which the pair "A - gender: male" occurs x times and the pair "A - gender: female" occurs y times (y > x). If the chi-square value calculated with an expected value of (x + y)/2 exceeds the critical value for the significance level, "A" can be said to be used disproportionately by either men or women. "A" is therefore extracted as a characteristic sentence-end expression of the attribute in the more frequent pair (here, female).
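
The test in this example can be sketched as below, assuming a two-valued attribute so that the test has one degree of freedom; the critical value 6.635 is the standard chi-square critical value for one degree of freedom at the 1% level. The function names and the sample counts are illustrative assumptions.

```python
# Chi-square critical value for 1 degree of freedom at the 1% significance level.
CRITICAL_VALUE_1PCT_DF1 = 6.635

def chi_square(x, y):
    """Chi-square statistic for two counts with expected value (x + y) / 2 each."""
    total = x + y
    if total == 0:
        return 0.0
    expected = total / 2
    return ((x - expected) ** 2 + (y - expected) ** 2) / expected

def characteristic_attribute(counts, critical=CRITICAL_VALUE_1PCT_DF1):
    """Return the over-represented attribute value, or None if not significant.

    counts maps exactly two attribute values to their pair counts,
    e.g. {"male": x, "female": y}.
    """
    (attr_a, x), (attr_b, y) = counts.items()
    if chi_square(x, y) <= critical:
        return None
    # A significant statistic implies x != y, so the larger count decides.
    return attr_a if x > y else attr_b

print(characteristic_attribute({"male": 10, "female": 30}))  # female
print(characteristic_attribute({"male": 18, "female": 22}))  # None
```

With x = 10 and y = 30 the expected value is 20 and the statistic is (100 + 100) / 20 = 10.0, which exceeds 6.635, so the expression is treated as characteristic of "female". With 18 and 22 the statistic is 0.4 and nothing is extracted.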

The significance level can be changed according to the data and the purpose. Extraction of the characteristic sentence-end expressions of each attribute is not limited to the chi-square test; any index used to compare appearance ratios, such as a t-score or a log-likelihood ratio, may be used. The characteristic sentence-end expression extraction unit 22 compiles the extracted characteristic sentence-end expressions of each attribute into the characteristic sentence-end expression list 26 and stores it in a predetermined storage area. FIG. 5 shows an example of the characteristic sentence-end expression list 26. In the example of FIG. 5, the list contains each extracted characteristic sentence-end expression and its attribute, together with the chi-square value calculated in the chi-square test.

The conversion candidate selection unit 31 receives as input the conversion target sentence-end expression information that the sentence-end expression extraction unit 21 extracted from the input sentence, and selects conversion candidates for the conversion target sentence-end expression included in that information.

Specifically, the conversion candidate selection unit 31 compares the conversion target sentence-end expression information with each piece of sentence-end expression information in the sentence-end expression list 25. It then selects, as conversion candidates for the conversion target sentence-end expression, the sentence-end expressions in the sentence-end expression list 25 whose part of speech and whose immediately preceding morpheme's part of speech both match those of the conversion target sentence-end expression.

For example, when the input sentence is "明日はいい天気になるかな" ("I wonder if the weather will be nice tomorrow"), the sentence-end expression extraction unit 21 extracts "るかな", the part from the inflectional ending of the verb "なる" onward, as the conversion target sentence-end expression. It extracts "inflectional ending_sentence-final particle" as the part of speech of the sentence-end expression and "verb stem" as the part of speech of the immediately preceding morpheme. Accordingly, from the sentence-end expression information in the sentence-end expression list 25, the sentence-end expressions whose part of speech is "inflectional ending_sentence-final particle" and whose immediately preceding morpheme's part of speech is "verb stem" are selected as conversion candidates.
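
The selection in this example can be sketched as a filter over the sentence-end expression list. The `EndingInfo` record, the `select_candidates` function, and every list entry other than "るかな" are assumptions made for illustration; the part-of-speech labels are English renderings of the labels in the text.

```python
from typing import List, NamedTuple, Tuple

class EndingInfo(NamedTuple):
    ending: str              # sentence-end expression
    pos: Tuple[str, ...]     # part of speech of each morpheme in the expression
    prev_pos: str            # part of speech of the immediately preceding morpheme

def select_candidates(target: EndingInfo, ending_list: List[EndingInfo]) -> List[str]:
    """Keep expressions whose own POS and preceding-morpheme POS both match the target."""
    return [
        info.ending
        for info in ending_list
        if info.pos == target.pos and info.prev_pos == target.prev_pos
    ]

VERB_END = ("inflectional ending", "sentence-final particle")

# Hypothetical stand-in for sentence-end expression list 25.
ending_list = [
    EndingInfo("るかな", VERB_END, "verb stem"),
    EndingInfo("るかしら", VERB_END, "verb stem"),
    EndingInfo("るね", VERB_END, "verb stem"),
    EndingInfo("だよ", ("copula", "sentence-final particle"), "noun"),
]

target = EndingInfo("るかな", VERB_END, "verb stem")
candidates = select_candidates(target, ending_list)
```

Here "だよ" is rejected because neither its part of speech nor that of its preceding morpheme matches the target.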

When determining whether the parts of speech of two sentence-end expressions match, a match may be declared only when the parts of speech of all morphemes in the sentence-end expressions match. Alternatively, a match may be declared when the part of speech of at least one morpheme matches, such as that of the first morpheme in the sentence-end expression. Increasing the number of parts of speech that must match reduces the selection of ungrammatical conversion candidates. Depending on the dialect or the author's character, however, the part of speech of a sentence-end expression may be analyzed incorrectly. When handling sentence-end expressions that are difficult to analyze morphologically, an expression may be selected as a conversion candidate even if its part of speech does not match. That is, conversion candidates may be selected solely on the basis of a match in the part of speech of the morpheme immediately preceding the sentence-end expression. This allows more conversion candidates to be selected even for such expressions.
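
The three matching policies described above could be expressed as one predicate with a mode switch. The mode names `"all"`, `"first"`, and `"ignore"` are hypothetical labels, and `"first"` implements only the first-morpheme variant of the "at least one morpheme" policy.

```python
def pos_matches(target_pos, candidate_pos, mode="all"):
    """Decide whether the parts of speech of two sentence-end expressions match.

    target_pos and candidate_pos are sequences with one POS label per morpheme.
    """
    if mode == "all":
        # Strict: every morpheme's part of speech must match.
        return tuple(target_pos) == tuple(candidate_pos)
    if mode == "first":
        # Relaxed: only the first morpheme's part of speech must match.
        return bool(target_pos) and bool(candidate_pos) and target_pos[0] == candidate_pos[0]
    if mode == "ignore":
        # Rely solely on the preceding morpheme's POS, which is checked elsewhere.
        return True
    raise ValueError(f"unknown mode: {mode}")
```

The stricter the mode, the fewer ungrammatical candidates pass; the looser modes trade that safety for coverage of expressions that the morphological analyzer handles poorly.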

The conversion candidate selection unit 31 may also select conversion candidates based on the tense (linguistic expressions of time, such as completion or continuation) and modality (linguistic expressions of the author's judgment or attitude, such as question, conjecture, negation, experience, request, or invitation) that appear in the sentence-end expression. Specifically, referring to a list of tense, modality, and similar morphemes such as the one shown in FIG. 6, the unit checks whether the conversion target sentence-end expression and each sentence-end expression in the sentence-end expression list 25 contain morphemes that express tense, modality, or the like. If the conversion target sentence-end expression contains such morphemes, only the sentence-end expressions that contain the same tense or modality morphemes are selected as conversion candidates.

For example, when the input sentence is "明日はいい天気になるかな" as above, the morpheme "かな" in the conversion target sentence-end expression "るかな" expresses the modality "question". Therefore, from the sentence-end expression information in the sentence-end expression list 25, the sentence-end expressions whose part of speech is "inflectional ending_sentence-final particle", whose immediately preceding morpheme's part of speech is "verb stem", and which contain a morpheme expressing the modality "question" are selected as conversion candidates.
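
A possible sketch of this filter follows. The contents of FIG. 6 are not reproduced in the text, so the `TENSE_MODALITY` table is a hypothetical stand-in. Following the "question" example, the sketch compares tense and modality categories rather than exact morphemes, which is an interpretation made for this illustration.

```python
# Hypothetical stand-in for the tense/modality list of FIG. 6:
# morpheme -> tense or modality category it expresses.
TENSE_MODALITY = {
    "かな": "question",
    "かしら": "question",
    "か": "question",
    "た": "completion",
    "だろう": "conjecture",
}

def categories(morphemes):
    """Tense/modality categories expressed by a sequence of morphemes."""
    return {TENSE_MODALITY[m] for m in morphemes if m in TENSE_MODALITY}

def filter_by_tense_modality(target, candidates):
    """Keep candidates that express every category the target expresses."""
    wanted = categories(target)
    if not wanted:
        # The target carries no tense/modality morpheme: nothing to enforce.
        return list(candidates)
    return [c for c in candidates if wanted <= categories(c)]

target = ("る", "かな")
candidates = [("る", "かしら"), ("る", "ね"), ("る", "かな")]
kept = filter_by_tense_modality(target, candidates)
```

In this run "るね" is dropped because it does not express "question", so a question in the input cannot silently become a statement in the output.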

Selecting conversion candidates based on tense and modality prevents changes to elements that affect the dialogue act in, for example, a dialogue system that performs dialogue acts such as answering questions and providing information. When the sentence-end expression conversion device 10 according to this embodiment is applied to a system in which changes to the dialogue act need not be considered, the selection of conversion candidates based on tense and modality may be omitted.

The conversion candidate selection unit 31 outputs a conversion candidate list of the selected conversion candidates, for example as shown in FIG. 7. In the example of FIG. 7, the conversion candidates are listed together with the morpheme string that precedes the conversion target sentence-end expression in the input sentence ("before the sentence-end expression of the input sentence" in FIG. 7).

The attribute assignment unit 32 receives as input the conversion candidate list output from the conversion candidate selection unit 31, finds the sentence-end expressions in the characteristic sentence-end expression list 26 that match each conversion candidate, and assigns the attributes associated with those expressions to the conversion candidate. If the characteristic sentence-end expression list 26 contains several sentence-end expressions that match a conversion candidate, all of the corresponding attributes are assigned to that candidate. FIG. 8 shows an example in which attributes have been assigned to each conversion candidate in the conversion candidate list. In the example of FIG. 8, the chi-square values from the characteristic sentence-end expression list 26 are assigned as well.

If the characteristic sentence-end expression list 26 contains no sentence-end expression identical to a conversion candidate, the conversion candidate may be split into morphemes to form 1- to N-grams of morphemes. If the characteristic sentence-end expression list 26 contains a sentence-end expression that matches one of these 1- to N-grams, the attribute corresponding to that sentence-end expression may be assigned to the conversion candidate.
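
Attribute assignment with the n-gram fallback might look like the sketch below. The dictionary layout, the entries in `characteristic`, the chi-square values, and the default `max_n=3` are all assumptions made for illustration; the text does not fix N or the exact n-gram matching rule.

```python
def ngrams(morphemes, max_n):
    """All 1- to max_n-grams of a morpheme sequence, each joined into a string."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(morphemes) - n + 1):
            grams.append("".join(morphemes[i:i + n]))
    return grams

def assign_attributes(candidate, characteristic, max_n=3):
    """Return (attribute, chi-square) entries for a candidate.

    candidate is a tuple of morphemes; characteristic maps a sentence-end
    expression to a list of (attribute, chi-square value) entries.
    """
    whole = "".join(candidate)
    if whole in characteristic:
        # Exact match against the characteristic list.
        return list(characteristic[whole])
    # Fallback: look up every 1- to N-gram of the candidate's morphemes.
    attributes = []
    for gram in ngrams(candidate, max_n):
        attributes.extend(characteristic.get(gram, []))
    return attributes

# Hypothetical stand-in for characteristic sentence-end expression list 26.
characteristic = {
    "るかしら": [("gender: female", 25.0)],
    "やん": [("residence: West Japan", 12.0)],
}
```

For instance, `assign_attributes(("る", "やん", "か"), characteristic)` finds no exact entry for "るやんか" but picks up the West Japan attribute through the 1-gram "やん".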

The conversion result output unit 33 receives as input the input sentence, an author attribute indicating the attribute of the author of the converted sentence, and the conversion candidate list with assigned attributes. It determines, as the conversion result, the conversion candidate whose assigned attribute matches the input author attribute. If several conversion candidates have a matching attribute, the one with the highest chi-square value is determined as the conversion result. The conversion result output unit 33 then generates and outputs an output sentence by joining the morpheme string that precedes the conversion target sentence-end expression in the input sentence with the determined conversion result.

For example, when the input sentence "明日はいい天気になるかな" and the author attribute "female" are input, in the example of FIG. 8 the conversion candidate "る_かしら", which has the attribute "female" and the highest chi-square value among the conversion candidates, is determined as the conversion result, and the converted output sentence "明日はいい天気になるかしら" is output.
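
The final selection and joining step can be sketched as follows. The candidate tuples and chi-square values below are invented placeholders rather than the values of FIG. 8, and returning the original ending when no candidate matches is an assumption, since the text does not describe that case.

```python
def convert(prefix, original_ending, candidates, author_attribute):
    """Replace the sentence-end expression with the best-matching candidate.

    candidates is a list of (ending, attribute, chi-square value) tuples.
    """
    matching = [c for c in candidates if c[1] == author_attribute]
    if not matching:
        # Assumption: leave the sentence unchanged when nothing matches.
        return prefix + original_ending
    # Among matching candidates, take the one with the highest chi-square value.
    best = max(matching, key=lambda c: c[2])
    return prefix + best[0]

# Hypothetical attributed candidate list.
candidates = [
    ("るかしら", "gender: female", 31.2),
    ("るのかしら", "gender: female", 12.4),
    ("るんかな", "gender: male", 9.8),
]

output = convert("明日はいい天気にな", "るかな", candidates, "gender: female")
print(output)  # 明日はいい天気になるかしら
```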

When several conversion candidates have a matching attribute, determination of the conversion result is not limited to an index of appearance ratio such as the chi-square value. The result may be determined at random or based on a statistical index other than an index of appearance ratio. One such index is the frequency with which the pair of a conversion candidate and its assigned attribute appears in the sentence set with author information 42. Selecting a conversion candidate with a high appearance frequency yields a more common sentence-end expression. Another such index is the number of distinct morphemes that immediately precede, in the sentence set with author information 42, occurrences of the same sentence-end expression and attribute pair as the conversion candidate and its assigned attribute. Selecting a conversion candidate with a large number of distinct preceding morphemes yields a sentence-end expression that can follow a wide variety of expressions, that is, one likely to fit any context.
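
The last index mentioned above, the number of distinct preceding morphemes per sentence-end expression and attribute pair, can be computed with a set per pair. The input layout and the function name are assumptions made for this sketch.

```python
from collections import defaultdict

def distinct_preceding_counts(occurrences):
    """Count distinct preceding morphemes for each (ending, attribute) pair.

    occurrences is an iterable of (preceding morpheme, ending, attribute)
    tuples, one per occurrence in the author-annotated sentence set.
    """
    seen = defaultdict(set)
    for preceding, ending, attribute in occurrences:
        seen[(ending, attribute)].add(preceding)
    return {pair: len(morphemes) for pair, morphemes in seen.items()}
```

A pair seen after many different morphemes gets a high count, which is the signal the text uses for "can follow a wide variety of expressions".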

The author attribute is not limited to one received as input. A preset author attribute may be used, or an author attribute may be selected from several prepared in advance, either at random or according to a predetermined rule.

Next, the operation of the sentence-end expression conversion device 10 according to this embodiment is described. When the sentence set 41 and the sentence set with author information 42 are input to the sentence-end expression conversion device 10, the list creation unit 20 executes the list creation process shown in FIG. 9. When an input sentence and an author attribute are input to the sentence-end expression conversion device 10, the conversion unit 30 executes the conversion process shown in FIG. 10. Each process is described in detail below.

First, in step 100 of the list creation process, the sentence-end expression extraction unit 21 performs morphological analysis on each sentence in the sentence set 41 and the sentence set with author information 42. It extracts, as the sentence-end expression, the morpheme string that follows the first content word encountered when scanning from the end of the sentence; as the part of speech of the sentence-end expression, the sequence of parts of speech of the morphemes in the sentence-end expression; and the part of speech of the morpheme immediately preceding the sentence-end expression. The sentence-end expression extraction unit 21 stores the sentence-end expression information, which comprises the extracted sentence-end expression, its part of speech, and the part of speech of the immediately preceding morpheme, in a predetermined storage area as the sentence-end expression list 25, for example as shown in FIG. 3.
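
The extraction rule in step 100 can be sketched on an already analyzed sentence. The morphological analyzer itself is outside this sketch: the input is assumed to be a list of (surface, part of speech) pairs, and the `CONTENT_POS` set and the English part-of-speech labels are assumptions made for illustration.

```python
# Assumed set of content-word part-of-speech labels.
CONTENT_POS = {"noun", "verb stem", "adjective", "adverb"}

def extract_ending_info(morphemes):
    """Extract sentence-end expression information from analyzed morphemes.

    morphemes is a list of (surface, pos) pairs for one sentence.
    Returns None if the sentence contains no content word.
    """
    # Scan from the end of the sentence for the first content word.
    for i in range(len(morphemes) - 1, -1, -1):
        if morphemes[i][1] in CONTENT_POS:
            tail = morphemes[i + 1:]
            return {
                "prefix": "".join(surface for surface, _ in morphemes[:i + 1]),
                "ending": "".join(surface for surface, _ in tail),
                "pos": "_".join(pos for _, pos in tail),
                "prev_pos": morphemes[i][1],
            }
    return None

sentence = [
    ("明日", "noun"), ("は", "particle"), ("いい", "adjective"),
    ("天気", "noun"), ("に", "particle"), ("な", "verb stem"),
    ("る", "inflectional ending"), ("かな", "sentence-final particle"),
]
info = extract_ending_info(sentence)
```

For this sentence the scan stops at the verb stem "な", so the sentence-end expression is "るかな" and the prefix "明日はいい天気にな" is what a converted ending would later be joined to.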

Next, in step 102, the sentence-end expression extraction unit 21 associates each sentence-end expression extracted from a sentence in the sentence set with author information 42 with the author name of that sentence, as the name of the author who used the expression, and creates a sentence-end expression author list, for example as shown in FIG. 4.

Next, in step 104, based on the sentence-end expression author list created in step 102 and the author attribute list included in the sentence set with author information 42 (for example, as shown in FIG. 2), the characteristic sentence-end expression extraction unit 22 creates sentence-end expression and attribute pairs by associating attributes with each sentence-end expression in the sentence-end expression author list.

Next, in step 106, the characteristic sentence-end expression extraction unit 22 counts the sentence-end expression and attribute pairs created in step 104 and extracts, for example by a chi-square test, the characteristic sentence-end expressions that are used disproportionately often for each attribute. The characteristic sentence-end expression extraction unit 22 attaches, for example, the chi-square value calculated in the chi-square test to each extracted characteristic sentence-end expression, compiles the result into the characteristic sentence-end expression list 26 (for example, as shown in FIG. 5), stores it in a predetermined storage area, and ends the list creation process.

Next, in step 110 of the conversion process, the sentence-end expression extraction unit 21 extracts the conversion target sentence-end expression information from the input sentence.

Next, in step 112, the conversion candidate selection unit 31 compares the conversion target sentence-end expression information extracted in step 110 with each piece of sentence-end expression information in the sentence-end expression list 25. It then selects, as conversion candidates for the conversion target sentence-end expression, the sentence-end expressions in the sentence-end expression list 25 whose part of speech and whose immediately preceding morpheme's part of speech both match those of the conversion target sentence-end expression.

Next, in step 114, the conversion candidate selection unit 31 refers to a list of tense, modality, and similar morphemes (for example, as shown in FIG. 6) and selects, from the conversion candidates selected in step 112, those that contain the same tense or modality morphemes as the conversion target sentence-end expression. The conversion candidate selection unit 31 outputs a conversion candidate list of the selected conversion candidates, for example as shown in FIG. 7.

Next, in step 116, the attribute assignment unit 32 refers to the characteristic sentence-end expression list 26 and assigns an attribute and a chi-square value to each conversion candidate in the conversion candidate list output in step 114.

Next, in step 118, the conversion result output unit 33 determines, as the conversion result, the conversion candidate with the highest chi-square value among those whose assigned attribute matches the input author attribute. The conversion result output unit 33 then generates and outputs an output sentence by joining the morpheme string that precedes the conversion target sentence-end expression in the input sentence with the determined conversion result, and ends the conversion process.

As described above, the sentence-end expression conversion device according to this embodiment selects, from the sentence-end expressions extracted from a sentence set consisting of a large amount of text data collected from the Web or elsewhere, conversion candidates for which at least the part of speech immediately preceding the sentence-end expression matches. It assigns attributes to those candidates based on the characteristic sentence-end expressions that are used disproportionately often for each author attribute, and selects the conversion candidate whose assigned attribute matches the desired author attribute. The characteristic sentence-end expressions can be extracted automatically in advance from a sentence set with author information, which is likewise a large amount of text data collected from the Web or elsewhere, so the approach can be applied to a wide variety of attributes. The sentence-end expression can therefore be converted so that the converted sentence has the desired character, without incurring the cost of manual work.

Applying the present invention to a dialogue system makes it possible to give character to the response sentences that the system generates, making the dialogue system more human-like and approachable. Applying the present invention to the summarization of text data such as reviews and Q&A posts on Web pages makes it possible to unify the character expressed in posts written by multiple people who differ in dialect, gender, and age. This allows the generation of more natural summaries that do not reveal that they are composed of sentences written by several different people.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, the above embodiment describes the case where the input sentences are text data collected from the Web or elsewhere. However, a "sentence" here may be any sentence written in Japanese, including a speaker's utterance converted to text by transcription or speech recognition. In that case, the speaker's attributes may be used as the author attributes of the above embodiment. Speech data may also be input instead of text data, in which case a speech recognition unit may be provided upstream of the sentence-end expression extraction unit. Likewise, the output sentence is not limited to text data and may be speech-synthesized and output as speech data.

The above embodiment describes the case where the list creation unit and the conversion unit are implemented on the same computer, but they may be implemented on separate computers. In that case, the computer implementing the conversion unit may read the sentence-end expression list and the characteristic sentence-end expression list created by the computer implementing the list creation unit and then execute the conversion process described above.

Although this specification describes an embodiment in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium.

DESCRIPTION OF SYMBOLS
10 Sentence-end expression conversion device
20 List creation unit
21 Sentence-end expression extraction unit
22 Characteristic sentence-end expression extraction unit
25 Sentence-end expression list
26 Characteristic sentence-end expression list
30 Conversion unit
31 Conversion candidate selection unit
32 Attribute assignment unit
33 Conversion result output unit
41 Sentence set
42 Sentence set with author information

Claims

A sentence-end expression conversion device comprising:
sentence-end expression extraction means for extracting sentence-end expression information that includes a sentence-end expression appearing at the end of a Japanese sentence and the part of speech of the morpheme immediately preceding the sentence-end expression;
conversion candidate selection means for selecting, from among a plurality of pieces of sentence-end expression information extracted from each of a plurality of Japanese sentences, the sentence-end expression included in any piece whose part of speech of the morpheme immediately preceding the sentence-end expression matches that included in the sentence-end expression information extracted from a Japanese sentence to be converted, as a conversion candidate for the sentence-end expression included in the sentence-end expression information extracted from the Japanese sentence to be converted;
attribute assignment means for assigning an attribute to each of the conversion candidates based on sentence-end expressions that appear characteristically for each attribute, obtained from the correspondence between a plurality of sentence-end expressions extracted from each of a plurality of Japanese sentences to which author information including author attributes is attached and the attributes of the authors; and
conversion means for selecting, from among the conversion candidates, a conversion candidate whose assigned attribute matches an attribute set in advance as the attribute of the author of the converted sentence, and converting the sentence-end expression included in the sentence-end expression information extracted from the Japanese sentence to be converted into the selected conversion candidate.

The sentence-end expression conversion device according to claim 1, wherein the sentence-end expressions that appear characteristically for each attribute are extracted based on the appearance ratio of the correspondence between sentence-end expressions and author attributes.

The sentence-end expression conversion device according to claim 2, wherein the conversion means selects, from among the conversion candidates whose assigned attribute matches the preset attribute, the conversion candidate for which a statistical index indicating the correspondence between the sentence-end expression and the author attribute corresponding to the assigned attribute is highest.

The sentence-end expression conversion device according to any one of claims 1 to 3, wherein the conversion candidate selection means selects, from among the conversion candidates, a conversion candidate expressing a tense or modality that matches the tense or modality expressed by the sentence-end expression included in the sentence-end expression information extracted from the Japanese sentence to be converted.

The sentence-end expression conversion device according to claim 1, wherein
the sentence-end expression extraction means extracts sentence-end expression information further including the part of speech of the sentence-end expression, and
the conversion candidate selection means selects, from among the plurality of pieces of sentence-end expression information extracted from each of the plurality of Japanese sentences, the sentence-end expression included in any piece whose part of speech of the sentence-end expression and part of speech of the morpheme immediately preceding the sentence-end expression match those included in the sentence-end expression information extracted from the Japanese sentence to be converted, as a conversion candidate for the sentence-end expression included in the sentence-end expression information extracted from the Japanese sentence to be converted.

6. The sentence-end expression conversion device described, wherein, when the sentence-end expression includes a plurality of morphemes, the conversion candidate selection means determines that the parts of speech of the sentence-end expressions match when the part of speech of at least one of the morphemes included in the sentence-end expression matches.

A sentence end expression conversion method in a sentence end expression conversion device including sentence end expression extraction means, conversion candidate selection means, attribute assignment means, and conversion means, wherein
the sentence end expression extraction means extracts sentence end expression information that includes a sentence end expression appearing at the end of a Japanese sentence and the part of speech of the morpheme immediately before the sentence end expression,
the conversion candidate selection means selects, from among a plurality of pieces of sentence end expression information extracted from each of a plurality of Japanese sentences, the sentence end expression included in sentence end expression information in which the part of speech of the morpheme immediately before the sentence end expression matches the part of speech of the morpheme immediately before the sentence end expression included in the sentence end expression information extracted from the Japanese sentence to be converted, as a conversion candidate for the sentence end expression included in the sentence end expression information extracted from the Japanese sentence to be converted,
the attribute assignment means assigns an attribute to each of the conversion candidates based on sentence end expressions that appear characteristically for each attribute, the characteristic sentence end expressions being obtained from a correspondence relationship between the attributes of authors and a plurality of sentence end expressions extracted from each of a plurality of Japanese sentences to which author information including the attribute of the author is assigned, and
the conversion means selects, from the conversion candidates, a conversion candidate whose assigned attribute matches an attribute set in advance as the attribute of the author of the converted sentence, and converts the sentence end expression included in the sentence end expression information extracted from the Japanese sentence to be converted into the selected conversion candidate.
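The four steps of the method can be sketched end to end as below. This is a simplified illustration under several assumptions that are not in the claim: sentences arrive already split into (surface, part of speech) pairs, the sentence end expression is taken to be the trailing run of morphemes whose tag is in the hypothetical set `END_POS`, and the characteristic sentence end expression list is supplied as a ready-made dictionary rather than derived from author-attributed sentences. All function and variable names are invented for the example.

```python
from typing import Dict, List, Tuple

Morpheme = Tuple[str, str]  # (surface form, part of speech); tokenization is assumed done elsewhere
END_POS = {"auxiliary verb", "particle"}  # assumption: tags treated as sentence-end material


def extract(morphemes: List[Morpheme]) -> Tuple[str, str, int]:
    """Return (end expression, POS of the preceding morpheme, start index of the expression)."""
    start = len(morphemes)
    while start > 0 and morphemes[start - 1][1] in END_POS:
        start -= 1
    expression = "".join(surface for surface, _ in morphemes[start:])
    preceding_pos = morphemes[start - 1][1] if start > 0 else ""
    return expression, preceding_pos, start


def convert(morphemes: List[Morpheme],
            corpus: List[List[Morpheme]],
            characteristic: Dict[str, str],
            target_attribute: str) -> str:
    original = "".join(surface for surface, _ in morphemes)
    expression, preceding_pos, start = extract(morphemes)
    if not expression:
        return original  # nothing to convert

    # Candidate selection: corpus expressions whose preceding POS matches the input's.
    candidates: List[str] = []
    for sentence in corpus:
        cand, cand_preceding_pos, _ = extract(sentence)
        if cand and cand_preceding_pos == preceding_pos and cand not in candidates:
            candidates.append(cand)

    # Attribute assignment and conversion: use the first candidate that is
    # characteristic of the target attribute.
    stem = "".join(surface for surface, _ in morphemes[:start])
    for cand in candidates:
        if characteristic.get(cand) == target_attribute:
            return stem + cand
    return original  # no suitable candidate; leave the sentence unchanged


# Illustrative data; tags and attribute labels are hypothetical.
sentence = [("明日", "noun"), ("は", "particle"), ("雨", "noun"), ("です", "auxiliary verb")]
corpus = [
    [("猫", "noun"), ("だ", "auxiliary verb"), ("ぜ", "particle")],
    [("行く", "verb"), ("よ", "particle")],
]
characteristic = {"だぜ": "attr_x", "よ": "attr_y"}
converted = convert(sentence, corpus, characteristic, "attr_x")
```

In this example only `だぜ` follows a noun, so it is the sole candidate; `よ` follows a verb and is filtered out before attributes are considered.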

A sentence end expression conversion program for causing a computer to function as each of the means constituting the sentence end expression conversion device according to any one of claims 1 to 6.