JP6674411B2 - Utterance generation device, utterance generation method, and utterance generation program

Info

Publication number: JP6674411B2
Application number: JP2017091927A
Authority: JP
Inventors: 光田 航 (Wataru Mitsuda); 東中 竜一郎 (Ryuichiro Higashinaka); 松尾 義博 (Yoshihiro Matsuo)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-05-02
Filing date: 2017-05-02
Publication date: 2020-04-01
Anticipated expiration: 2037-05-02
Also published as: JP2018190170A

Description

The present invention relates to an utterance generation device, an utterance generation method, and an utterance generation program.

Dialogue systems fall broadly into two types: task-oriented dialogue systems and non-task-oriented dialogue systems (chat dialogue systems). In recent years, chat dialogue systems have been actively studied, in part because of growing interest in their entertainment value and in everyday conversation with robots.

To keep a conversation moving, response utterances that convey the system's understanding to the user are important. For example, a task-oriented dialogue system for hotel reservations can report its understanding of the user utterance as "You want to stay at ... on ..., is that right?", which lets the user confirm what the system has understood. A chat dialogue system can show that it understands what the user is saying by repeating the content of the user's utterance.

Chat dialogue systems often generate a response utterance by extracting a keyword from the user utterance and fitting it into a template. For example, the keyword "富士山" (Mt. Fuji) is extracted from the utterance "富士山に行きました" ("I went to Mt. Fuji") and fitted into the pattern "〜ですね" ("..., I see"), which yields the response utterance "富士山ですね" ("Mt. Fuji, I see").
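The keyword-and-template approach described above can be sketched as follows. This is only an illustration, not the method of Non-Patent Document 1: the function name `keyword_response`, the fixed keyword list, and plain substring matching are assumptions made for this example.

```python
def keyword_response(utterance, keywords, template="{keyword}ですね"):
    """Return a templated response for the first known keyword found, else None.

    The keyword list and substring matching are assumptions for illustration;
    the source does not say how keywords are extracted.
    """
    for keyword in keywords:
        if keyword in utterance:
            return template.format(keyword=keyword)
    return None


# Example from the text: "富士山に行きました" with the pattern "〜ですね".
response = keyword_response("富士山に行きました", ["富士山"])
```

Because the response is built only from words already present in the utterance, a sketch like this can do no more than echo the user, which is the limitation the invention addresses.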

This technique is described in detail in Non-Patent Document 1.

More recently, as disclosed in Non-Patent Document 2, methods have also been studied that extract from the user utterance a semantic structure called a predicate-argument structure (a structure consisting of a predicate and its arguments) and generate a response utterance from the extracted structure. For example, the predicate-argument structure "ユーザガ富士山ニ行く" ("user-ga Mt. Fuji-ni go") is extracted from the user utterance "富士山に行きました" ("I went to Mt. Fuji"), and changing the sentence-final expression yields the response utterance "富士山に行ったんですね" ("So you went to Mt. Fuji").

[Non-Patent Document 1] J. Weizenbaum, "ELIZA - a computer program for the study of natural language communication between man and machine", Communications of the ACM, vol. 9, pp. 36-45, 1966.
[Non-Patent Document 2] Ryuichiro Higashinaka, Kenji Imamura, Toyomi Meguro, Chiaki Miyazaki, Nozomi Kobayashi, Hiroaki Sugiyama, Toru Hirano, Toshiro Makino, Yoshihiro Matsuo, "Towards an open domain conversational system fully based on natural language processing", In Proc. COLING, pp. 928-939, 2014.

Because the response utterances that existing chat dialogue systems can generate are built from the words, predicate-argument structures, and the like in the user utterance, such systems can only produce content that appears on the surface of the user utterance. This gives the user the impression that the system is merely repeating the user's utterance, and the system cannot show that it is listening more deeply.

An object of the present invention is to provide an utterance generation device, an utterance generation method, and an utterance generation program that, in a chat dialogue system, can obtain unspoken information that is not directly contained in the user utterance.

To achieve the above object, an utterance generation device of the present invention includes: an input unit that inputs a user utterance; a search unit that, based on example data that is a set of pairs of a user utterance and unspoken information, or on pairs of predicate-argument structures that co-occur in corpus documents, generates candidates for the unspoken information from the unspoken information corresponding to the user utterance input by the input unit, or from the predicate-argument structure corresponding to the user utterance; and an unspoken information ranking unit that calculates a score for each of the candidates generated by the search unit, based on the user utterance input by the input unit and a ranking model for calculating a score representing the likelihood of unspoken information for a user utterance.

The device may further include a type identification unit that, based on a type identification model for identifying the type of unspoken information, identifies and assigns a type to each item of unspoken information, and a type filter unit that outputs, from the unspoken information to which types have been assigned, the unspoken information that satisfies a predetermined condition on the type.

The device may further include a type identification model creation unit that creates the type identification model based on data containing a set of unspoken information to which types have been assigned in advance, and the type identification unit may identify and assign the type to each item of unspoken information based on the type identification model created by the type identification model creation unit.

The device may further include an output unit that outputs, as a response utterance, the unspoken information whose score satisfies a predetermined condition from among the candidates for which the score has been calculated.

The device may further include an expression conversion unit that converts each item of unspoken information output by the output unit into an utterance sentence.

The device may further include a ranking model creation unit that, based on pairs of a user utterance and positive-example unspoken information and pairs of a user utterance and negative-example unspoken information, performs morphological analysis on each user utterance and each item of unspoken information, and creates the ranking model based on the combinations of stems among the resulting morphemes and on whether each combination of stems was obtained from a positive-example pair or a negative-example pair. In that case, the unspoken information ranking unit may perform morphological analysis on the user utterance input by the input unit and on each candidate generated by the search unit, and calculate the score for each candidate based on the combinations of stems among the resulting morphemes and the ranking model created by the ranking model creation unit.

To achieve the above object, an utterance generation method of the present invention is an utterance generation method in an utterance generation device that includes an input unit, a search unit, and an unspoken information ranking unit. The method includes: a step in which the input unit inputs a user utterance; a step in which the search unit, based on example data that is a set of pairs of a user utterance and unspoken information, or on pairs of predicate-argument structures that co-occur in corpus documents, generates candidates for the unspoken information from the unspoken information corresponding to the user utterance input by the input unit, or from the predicate-argument structure corresponding to the user utterance; and a step in which the unspoken information ranking unit calculates a score for each of the candidates generated by the search unit, based on the user utterance input by the input unit and a ranking model for calculating a score representing the likelihood of unspoken information for a user utterance.

To achieve the above object, an utterance generation program of the present invention is a program for causing a computer to function as each unit of the above utterance generation device.

According to the present invention, it is possible, in a chat dialogue system, to obtain unspoken information that is not directly contained in the user utterance.

The drawings are: a block diagram showing the overall configuration of the utterance generation device according to the embodiment; a block diagram showing the functional configuration of the learning unit according to the embodiment; a block diagram showing the functional configuration of the utterance generation unit according to the embodiment; and a flowchart showing the flow of the utterance generation processing according to the embodiment.

The present embodiment is described below with reference to the drawings.

The utterance generation device according to the present embodiment generates response utterances based on information that can be inferred from the user utterance, and thereby generates utterances that are not limited to the content of the user utterance. In the present embodiment, information that can be inferred from a user utterance is called "unspoken information" (言外の情報) and is defined as "information that humans can generally understand even if it does not appear explicitly in the utterance".

Specifically, the utterance generation device according to the present embodiment takes a user utterance as input, estimates unspoken information, and then generates a response utterance using that unspoken information. In particular, it generates response utterances that confirm or ask about the unspoken information. To estimate the unspoken information, the device enumerates candidates using (Method 1) and (Method 2) below, calculates a score representing the likelihood of each candidate pair of user utterance and unspoken information, and ranks the candidates.

(Method 1) Data containing a set of pairs of a user utterance and unspoken information is prepared, and candidates for the unspoken information are enumerated by using the input user utterance to search the user utterances in the data.

(Method 2) Data containing a set of pairs of predicate-argument structures that co-occur within a document (e.g., <it rains, the laundry gets wet>) is prepared, and a predicate-argument structure similar to the user utterance is searched for. Candidates for the unspoken information are enumerated by treating the predicate-argument structure paired in the data with the retrieved structure as unspoken information.
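The two enumeration methods can be sketched roughly as below. The text does not specify how stored utterances or similar predicate-argument structures are searched, so exact string matching is used here purely as a simplifying assumption; the function names and the example data are also hypothetical, apart from the <it rains, the laundry gets wet> pair.

```python
def candidates_from_examples(utterance, example_data):
    """Method 1: collect unspoken information paired with a matching stored utterance.

    Exact matching is an assumption; a real system would use a similarity search.
    """
    return [info for stored_utterance, info in example_data if stored_utterance == utterance]


def candidates_from_cooccurrence(pas, cooccurrence_pairs):
    """Method 2: return the partner of every co-occurrence pair that contains `pas`."""
    candidates = []
    for first, second in cooccurrence_pairs:
        if first == pas:
            candidates.append(second)
        elif second == pas:
            candidates.append(first)
    return candidates


# Illustrative data only.
example_data = [("I came first in the test", "the user wants to be praised")]
cooccurrence_pairs = [("it rains", "the laundry gets wet")]

method1 = candidates_from_examples("I came first in the test", example_data)
method2 = candidates_from_cooccurrence("it rains", cooccurrence_pairs)
```

Both functions only enumerate candidates; deciding which candidates are plausible is left to the ranking step.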

Candidates for the unspoken information for the user utterance are enumerated by (Method 1) and (Method 2), and the unspoken information is estimated by ranking the enumerated candidates. Specifically, a regression model that estimates the likelihood of unspoken information for a user utterance is trained on data containing pairs of a user utterance and correct unspoken information (positive examples) and pairs of a user utterance and incorrect unspoken information (negative examples). Using the trained regression model, the likelihood of each candidate for the utterance is estimated as a score, the candidates are ranked, and the candidates whose score is at or above a predetermined threshold are taken as the estimated unspoken information for the user utterance.

The unspoken information estimated for a user utterance can include various kinds of information, so outputting all of it as response utterances may make the user uncomfortable. For example, if the unspoken information "the user wants to be praised" is estimated for the user utterance "I came first in the test", responding "You want to be praised, don't you?" could offend the user. Therefore, the type of each estimated item of unspoken information is identified based on predefined types of unspoken information, and only unspoken information of specific types is output as response utterances.
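The selection logic described above (keep candidates whose score reaches a threshold, then keep only certain types) might look like the following sketch. The dictionary layout, the scores, and the type labels "desire", "fact", and "emotion" are hypothetical placeholders; the patent's actual type inventory is defined elsewhere in the document.

```python
def select_for_output(candidates, score_threshold, allowed_types):
    """Keep candidates that pass both the score threshold and the type filter,
    ordered from highest to lowest score."""
    kept = [
        c for c in candidates
        if c["score"] >= score_threshold and c["type"] in allowed_types
    ]
    return sorted(kept, key=lambda c: c["score"], reverse=True)


# Hypothetical scored and typed candidates for "I came first in the test".
candidates = [
    {"info": "the user wants to be praised", "score": 0.9, "type": "desire"},
    {"info": "the user studied hard", "score": 0.8, "type": "fact"},
    {"info": "the user is sad", "score": 0.1, "type": "emotion"},
]

selected = select_for_output(candidates, score_threshold=0.5, allowed_types={"fact"})
```

Here the highest-scoring candidate is still suppressed because its type is not allowed, which mirrors the "wants to be praised" example in the text.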

A response utterance is output by using the estimated unspoken information as an utterance, either as it is (for example, as a declarative sentence) or after converting its ending into a confirmation form (e.g., "〜なんですね", "..., isn't it?"), a question form (e.g., "〜なんですか？", "Is it ...?"), or the like.
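As a rough illustration of the ending conversion, the sketch below rewrites a plain-form Japanese sentence into the confirmation or question form. The function name and the single rewriting rule (a final copula だ becomes な before んです) are assumptions made for this example; a real expression conversion unit would need morphological analysis, since this rule would mishandle past-tense verbs that happen to end in だ (e.g., 読んだ).

```python
def convert_expression(info, form="declarative"):
    """Convert plain-form unspoken information into a response utterance.

    form: "declarative" (unchanged), "confirm" (〜んですね), or "question" (〜んですか？).
    Simplified rule, assumed for illustration only.
    """
    if form == "declarative":
        return info
    # A final copula だ becomes な before んです; other plain forms attach directly.
    stem = (info[:-1] + "な") if info.endswith("だ") else info
    if form == "confirm":
        return stem + "んですね"
    if form == "question":
        return stem + "んですか？"
    raise ValueError(f"unknown form: {form}")


confirmation = convert_expression("富士山に行った", "confirm")
question = convert_expression("富士山が好きだ", "question")
```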

FIG. 1 is a block diagram showing the overall configuration of the utterance generation device 10 according to the present embodiment. As shown in FIG. 1, the utterance generation device 10 includes a learning unit 12 and an utterance generation unit 14. The learning unit 12 creates the data and models required by the utterance generation unit 14. The utterance generation unit 14 uses the data and models created by the learning unit 12 to generate a set of multiple response utterances for a user utterance, each with a likelihood score attached.

FIG. 2 is a block diagram showing the functional configuration of the learning unit 12 of the utterance generation device 10 according to the present embodiment. The learning unit 12 includes a corpus document co-occurrence data creation unit 20, a corpus document co-occurrence data storage unit 22, a ranking model creation unit 24, a ranking model storage unit 26, a type identification model creation unit 28, and a type identification model storage unit 30.

FIG. 3 is a block diagram showing the functional configuration of the utterance generation unit 14 of the utterance generation device 10 according to the present embodiment. The utterance generation unit 14 includes an input unit 40, an example data storage unit 42, a search unit 44, an unspoken information ranking unit 46, a type identification unit 48, a type filter unit 50, an expression conversion unit 52, and an output unit 54. The search unit 44 includes an example search unit 44a and a co-occurrence search unit 44b.

The arrows in FIGS. 1 and 2 indicate the input/output relationships between the units. Dashed lines indicate that a unit uses the corresponding model or data.

Based on pairs of predicate-argument structures that co-occur in corpus documents, the corpus document co-occurrence data creation unit 20 generates candidates for the unspoken information from the unspoken information corresponding to the user utterance input by the input unit 40, and stores them in the corpus document co-occurrence data storage unit 22.

Specifically, predicate-argument structure analysis is first performed on each document in order to extract events from the input document corpus. Any tool that can extract predicates and arguments may be used for this analysis; one example is JDEP, developed by the applicant and disclosed in Non-Patent Document 3 below.

[Non-Patent Document 3] Kenji Imamura, Genichiro Kikui, and Norihito Yasuda, "Japanese dependency parsing using sequential labeling for semi-spoken language", In Proc. ACL, 2007.

For example, suppose a blog document corpus containing the following three documents is received as input.
Document A: "I drove a car and went to Mt. Fuji. I realized that I do like mountains after all."
Document B: "I went to Mt. Fuji and climbed the mountain. The scenery was beautiful."
Document C: "I like mountains, so I went to Mt. Fuji. I drove a car to get around."

The result of predicate-argument structure analysis on these documents is shown below. "I" (a symbol for the first person) denotes the author of the blog. Predicates are represented by verbs and arguments by nouns. The case particles ガ (ga), ヲ (wo), and ニ (ni) indicate the type of each argument: subject, direct object, and indirect object, respectively.
Document A: "I-ga car-wo drive", "I-ga Mt. Fuji-ni go", "I-ga mountain-ga like", "I-ga think"
Document B: "I-ga Mt. Fuji-ni go", "I-ga mountain-ni climb", "scenery-ga beautiful"
Document C: "I-ga mountain-ga like", "I-ga Mt. Fuji-ni go", "I-ga move", "I-ga car-wo drive"

Next, every combination of predicate-argument structures extracted from the same document is extracted as a pair, and the number of times each pair co-occurs within a document is counted, which gives the results shown in Table 1 below.

Finally, based on the co-occurrence count assigned to each pair of predicate-argument structures, the pairs whose co-occurrence count is at or above a predetermined threshold are output to the corpus document co-occurrence data storage unit 22 as co-occurrence data.

In the example above, if the threshold is set to 2, the pairs of predicate-argument structures that co-occur two or more times are extracted, and the pairs shown in Table 2 below are output as co-occurrence data.
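The counting and thresholding steps can be reproduced on the three example documents with a short sketch. The function names are hypothetical, and treating each pair as unordered (and counting it at most once per document) is an assumption; the text does not state whether pair order matters.

```python
from collections import Counter
from itertools import combinations

# Predicate-argument structures per document, as listed in the text.
DOCS = {
    "A": ["I-ga car-wo drive", "I-ga Mt. Fuji-ni go", "I-ga mountain-ga like", "I-ga think"],
    "B": ["I-ga Mt. Fuji-ni go", "I-ga mountain-ni climb", "scenery-ga beautiful"],
    "C": ["I-ga mountain-ga like", "I-ga Mt. Fuji-ni go", "I-ga move", "I-ga car-wo drive"],
}


def count_cooccurrences(docs):
    """Count, for each unordered pair, the number of documents containing both members."""
    counts = Counter()
    for structures in docs.values():
        # Sorting makes the pair key independent of the order within a document.
        for pair in combinations(sorted(set(structures)), 2):
            counts[pair] += 1
    return counts


def filter_by_threshold(counts, threshold):
    """Keep only pairs whose co-occurrence count is at least `threshold`."""
    return {pair: n for pair, n in counts.items() if n >= threshold}


cooccurrence_data = filter_by_threshold(count_cooccurrences(DOCS), threshold=2)
for pair, n in cooccurrence_data.items():
    print(pair, n)
```

With a threshold of 2, only pairs drawn from "drive", "go", and "like" survive, because those three structures appear together in both Document A and Document C.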

The corpus document co-occurrence data storage unit 22 stores the corpus document co-occurrence data created by the corpus document co-occurrence data creation unit 20.

Based on pairs of a user utterance and positive-example unspoken information and pairs of a user utterance and negative-example unspoken information, the ranking model creation unit 24 performs morphological analysis on each user utterance and each item of unspoken information, creates a ranking model based on the combinations of stems among the resulting morphemes and on whether each combination was obtained from a positive-example pair or a negative-example pair, and stores the model in the ranking model storage unit 26. The ranking model is a model for calculating a score that represents the likelihood of unspoken information for a user utterance.

The ranking model is created from manually prepared data, such as that shown in Table 3 below, consisting of pairs of a user utterance and positive-example unspoken information and pairs of a user utterance and negative-example unspoken information. In Table 3, a flag of 1 denotes a positive example and a flag of 0 denotes a negative example. "I" denotes the speaker of the user utterance to which the unspoken information was assigned.

To train the ranking model on such data, pairs of a morpheme contained in the user utterance and a morpheme contained in the unspoken information are used as features.

Here, the user utterance and unspoken information shown in Table 4 below are used as an example. S is a symbol denoting the speaker.

Morphological analysis of the user utterance and the unspoken information is performed using a morphological analyzer such as JTAG, developed by the applicant and disclosed in Non-Patent Document 4 below.

[Non-Patent Document 4] Takeshi Fuchi and Shinichiro Takagi, "Japanese morphological analyzer using word co-occurrence: JTAG", Proceedings of the 17th International Conference on Computational Linguistics, Volume 1, Association for Computational Linguistics, 1998.

The morphological analysis gives results such as those shown in Table 5 below. In Table 5, the left column is the surface form, the center column is the part of speech, and the right column is the stem.

Next, the results of the morphological analysis are used to extract content words (words that carry lexical meaning rather than a grammatical function), and pairs are formed from a content word extracted from the user utterance and a content word extracted from the unspoken information. Specifically, morphemes whose part of speech is noun, verb stem, or adjective stem are extracted from the user utterance and from the unspoken information, and each morpheme extracted from the user utterance is combined with each morpheme extracted from the unspoken information to create content-word pairs such as those shown in Table 6 below. In Table 6, each pair is written as (morpheme extracted from the user utterance, morpheme extracted from the unspoken information).
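The pairing step amounts to a Cartesian product over the content-word stems of the two sides. The sketch below assumes morphemes arrive as (surface, part of speech, stem) tuples; the part-of-speech labels and the sample analyses are hypothetical and are not actual JTAG output.

```python
from itertools import product

# Parts of speech treated as content words, following the text.
CONTENT_POS = {"noun", "verb stem", "adjective stem"}


def content_stems(morphemes):
    """Return the stems of morphemes whose part of speech marks a content word."""
    return [stem for _surface, pos, stem in morphemes if pos in CONTENT_POS]


def content_word_pairs(utterance_morphemes, info_morphemes):
    """Pair every content-word stem of the utterance with every one of the unspoken information."""
    return list(product(content_stems(utterance_morphemes), content_stems(info_morphemes)))


# Hypothetical analyses of "富士山に行きました" and "山が好きだ".
utterance = [("富士山", "noun", "富士山"), ("に", "particle", "に"),
             ("行き", "verb stem", "行く"), ("ました", "auxiliary", "ます")]
info = [("山", "noun", "山"), ("が", "particle", "が"),
        ("好き", "adjective stem", "好き"), ("だ", "auxiliary", "だ")]

pairs = content_word_pairs(utterance, info)
```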

Content-word pairs are created in this way, and a ranking model for calculating a score representing the likelihood of unspoken information for a user utterance is trained based on the created pairs and on whether each pair was obtained from a positive-example pair or a negative-example pair.

Logistic regression, for example, may be used as the learning algorithm. Logistic regression is described in detail in Non-Patent Document 5 below. Algorithms other than logistic regression include, for example, ranking SVM (Support Vector Machine).

[Non-Patent Document 5] Hiroya Takamura, 言語処理のための機械学習入門 (Introduction to Machine Learning for Language Processing), Corona Publishing, 2010.

Finally, the weight information obtained from training, which represents for each content-word pair the likelihood of unspoken information for a user utterance, is stored in the ranking model storage unit 26 as the ranking model.
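To make the idea of per-pair weights concrete, here is a small logistic regression trained by stochastic gradient ascent over content-word pair features, written in plain Python. It is a sketch under stated assumptions rather than the patented implementation: the training examples, stems, learning rate, and epoch count are all invented for illustration.

```python
import math
from itertools import product


def pair_features(utterance_stems, info_stems):
    """Binary features: one per (utterance stem, unspoken-information stem) pair."""
    return {f"{u}|{i}" for u, i in product(utterance_stems, info_stems)}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, learning_rate=0.5):
    """Fit one weight per pair feature (plus a bias) on labelled examples."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for utterance_stems, info_stems, label in examples:
            features = pair_features(utterance_stems, info_stems)
            z = bias + sum(weights.get(f, 0.0) for f in features)
            error = label - sigmoid(z)  # gradient of the log-likelihood
            bias += learning_rate * error
            for f in features:
                weights[f] = weights.get(f, 0.0) + learning_rate * error
    return weights, bias


def score(weights, bias, utterance_stems, info_stems):
    """Likelihood score of the unspoken information for the utterance."""
    features = pair_features(utterance_stems, info_stems)
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in features))


# Hypothetical training data: label 1 = positive example, 0 = negative example.
examples = [
    (["test", "first"], ["praise", "want"], 1),
    (["test", "first"], ["sad"], 0),
    (["rain", "fall"], ["laundry", "wet"], 1),
    (["rain", "fall"], ["praise", "want"], 0),
]
weights, bias = train(examples)
positive_score = score(weights, bias, ["test", "first"], ["praise", "want"])
negative_score = score(weights, bias, ["test", "first"], ["sad"])
```

The learned `weights` dictionary plays the role of the stored ranking model: each key is a content-word pair and each value indicates how strongly that pair supports the unspoken information.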

The ranking model storage unit 26 stores the ranking model created by the ranking model creation unit 24.

The type identification model creation unit 28 creates a type identification model that identifies the type of unspoken information, based on data containing a set of unspoken information to which types have been assigned in advance, and stores the model in the type identification model storage unit 30.

There are nine types, shown in Table 7 below.

These nine types were defined from the viewpoint of what kind of information the unspoken information represents. How each type was defined, together with a detailed description, is disclosed in Non-Patent Document 6 below.

[Non-Patent Document 6] Wataru Mitsuda, Ryuichiro Higashinaka, Yoshihiro Matsuo, 「複数の作業者グループを用いた対話における言外の情報の類型化」 (Classification of unspoken information in dialogue using multiple worker groups), IEICE Technical Report, Vol. 116, No. 379, Natural Language Understanding and Models of Communication, pp. 13-18, 2016.

In the present embodiment, a type identification model for identifying the type of unspoken information is trained using data in which a type has been assigned in advance to each item of unspoken information.

The feature extraction method is explained using the unspoken information shown in Table 8 below as an example. The unspoken information in Table 8 is classified as the type "belief 1" (信念１).

Performing morphological analysis on the unspoken information gives results such as those shown in Table 9 below. Table 9 shows the output when the analysis is performed with a morphological analyzer such as JTAG, developed by the applicant. In Table 9, the leftmost column is the surface form, the second column is the part of speech, the third column is the stem, and the rightmost column is the category number in the Nihongo Goi-Taikei (日本語語彙大系) lexicon. The category numbers [ ][ ][ ] represent, from left to right, the general noun semantic attribute, the proper noun semantic attribute, and the verbal semantic attribute. These attributes are described in Non-Patent Document 7. S is a symbol denoting the speaker.

[Non-Patent Document 7] Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai, Akio Yokoo, Hiromi Nakaiwa, Kentaro Ogura, Yoshifumi Oyama, Yoshihiko Hayashi, 日本語語彙大系 (Nihongo Goi-Taikei), Iwanami Shoten, 1997.

このような形態素解析の結果から、以下のような素性が得られる。なお、ユニグラムは系列の要素１つずつのことを表し、バイグラムは系列の隣接する要素２つずつを順序付きで組にしたものを表す。また、形態素は、形態素解析の結果として得られた各形態素の語幹を表す。 The following features are obtained from the results of such morphological analysis. Note that a unigram represents one element of a sequence, and a bigram represents a set of two adjacent elements in a sequence ordered. The morpheme indicates the stem of each morpheme obtained as a result of the morphological analysis.

形態素ユニグラム：「Ｓ」、「は」、「富士山」、「が」、「好き」、「だ」
形態素バイグラム：「Ｓ-は」、「は-富士山」、「富士山-が」、「が-好き」、「好き-だ」
一般名詞意味属性ユニグラム：「４７１」、「１３００」
一般名詞意味属性バイグラム：「４７１−１３００」
用言意味属性ユニグラム：「１１」
用言意味属性バイグラム：なし

Morpheme unigrams: "S", "wa", "Mt. Fuji", "ga", "like", "da"
Morpheme bigrams: "S-wa", "wa-Mt. Fuji", "Mt. Fuji-ga", "ga-like", "like-da"
General noun semantic attribute unigrams: "471", "1300"
General noun semantic attribute bigrams: "471-1300"
Verbal semantic attribute unigrams: "11"
Verbal semantic attribute bigrams: none
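The unigram and bigram features listed above can be reproduced with a short sketch. The function names `ngrams` and `extract_features` and the feature prefixes (`morph`, `noun`, `verb`) are hypothetical labels chosen for illustration; the stems and category numbers are the ones from the example.

```python
from typing import List


def ngrams(seq: List[str], n: int, sep: str = "-") -> List[str]:
    """Return the ordered n-grams of adjacent elements, joined by sep."""
    return [sep.join(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def extract_features(stems: List[str],
                     noun_cats: List[str],
                     verb_cats: List[str]) -> List[str]:
    """Build unigram and bigram features for the three sequences.

    The prefixes are illustrative only; they keep features from
    different sequences from colliding.
    """
    features: List[str] = []
    for prefix, seq in (("morph", stems), ("noun", noun_cats), ("verb", verb_cats)):
        for n in (1, 2):
            features += [f"{prefix}{n}:{gram}" for gram in ngrams(seq, n)]
    return features


# Stems and category numbers taken from the example in the text.
stems = ["Ｓ", "は", "富士山", "が", "好き", "だ"]
features = extract_features(stems, ["471", "1300"], ["11"])
```

A sequence with a single element, such as the verbal semantic attribute here, yields no bigram, which matches the "none" entry in the list.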

一般名詞意味属性と用言意味属性とのカテゴリ番号が複数存在する場合には、最も左側のカテゴリ番号が、上記ＪＴＡＧが最も適切と判定したカテゴリ番号であることを考慮し、最も左側のカテゴリ番号のみを用いる。 When a general noun semantic attribute or a verbal semantic attribute has a plurality of category numbers, only the leftmost category number is used, because the leftmost one is the category number that JTAG judged to be most appropriate.

これらの素性を用いて、言外の情報を入力として、そのタイプを出力するモデルを学習する。学習アルゴリズムには、多クラス分類が可能なＳＶＭを利用すれば良い。多クラス分類については、上記非特許文献５に詳しく開示されている。 Using these features, a model that takes unspoken information as input and outputs its type is trained. An SVM capable of multi-class classification may be used as the learning algorithm. Multi-class classification is described in detail in Non-Patent Document 5 above.

タイプ識別モデル記憶部３０には、タイプ識別モデル作成部２８により作成されたタイプ識別モデルが記憶される。 The type identification model storage unit 30 stores the type identification model created by the type identification model creation unit 28.

入力部４０は、１つのユーザ発話を入力し、入力したユーザ発話を、用例検索部４４ａ、共起検索部４４ｂ、及び言外の情報ランキング部４６に出力する。例えば、ユーザ発話として下記表１０に示すユーザ発話が入力され、入力されたユーザ発話が、用例検索部４４ａ、共起検索部４４ｂ、及び言外の情報ランキング部４６に出力される。 The input unit 40 receives one user utterance and outputs it to the example search unit 44a, the co-occurrence search unit 44b, and the unspoken information ranking unit 46. For example, the user utterance shown in Table 10 below is input and is output to the example search unit 44a, the co-occurrence search unit 44b, and the unspoken information ranking unit 46.

なお、入力部４０は、ユーザ発話の他に、当該ユーザ発話の前までの対話文脈（発話が系列になったもの）を入力として受け付けても良い。 Note that, in addition to the user utterance, the input unit 40 may receive, as an input, a conversation context (a series of utterances) up to the user utterance.

用例データ記憶部４２には、ユーザ発話と言外の情報との組の集合である用例データが記憶されている。 The example data storage unit 42 stores example data, which is a set of pairs of a user utterance and unspoken information.

用例検索部４４ａは、用例データ記憶部４２に記憶されている用例データに基づいて、入力部４０により入力されたユーザ発話に対応する言外の情報から、言外の情報の候補を生成する。 The example search unit 44a generates a candidate for unspoken information from unspoken information corresponding to the user utterance input by the input unit 40, based on the example data stored in the example data storage unit 42.

具体的には、まず、入力部４０により入力されたユーザ発話、及び、用例データ中のユーザ発話を、ｗｏｒｄ２ｖｅｃを用いてベクトルに変換する。ｗｏｒｄ２ｖｅｃは、テキストコーパスを用いて学習を行うことで、任意のテキストを固定長のベクトルに変換する一般的な手法である。 Specifically, first, the user utterance input by the input unit 40 and the user utterance in the example data are converted into vectors using word2vec. word2vec is a general method of converting an arbitrary text into a fixed-length vector by performing learning using a text corpus.

ｗｏｒｄ２ｖｅｃの学習には、Ｗｉｋｉｐｅｄｉａ（登録商標）等のコーパス文書を用いる。この際、コーパス文書共起データ作成部で用いたコーパス文書と同じコーパス文書を用いても良い。具体的な学習方法は、下記非特許文献８に開示されている。 For learning word2vec, a corpus document such as Wikipedia (registered trademark) is used. At this time, the same corpus document as that used in the corpus document co-occurrence data creation unit may be used. A specific learning method is disclosed in Non-Patent Document 8 below.

［非特許文献８］Tomas Mikolov, Kai Chen, and Jeffrey Dean, "Efficient estimation of word representation in vector space", CoRR, Vol. abs/1301.3781, 2013. [Non-Patent Document 8] Tomas Mikolov, Kai Chen, and Jeffrey Dean, "Efficient estimation of word representation in vector space", CoRR, Vol. Abs / 1301.3781, 2013.

ｗｏｒｄ２ｖｅｃを用いてベクトルに変換した、入力部４０により入力されたユーザ発話と、用例データ中のユーザ発話との類似度の計算には、コサイン類似度を用いれば良い。コサイン類似度は、ベクトル間の類似度を測るために用いられる一般的な尺度であり、下記（１）式で表される。なお、下記（１）式における $\vec{a}$ と $\vec{b}$ とはベクトルを表し、$\cos(\vec{a}, \vec{b})$ はコサイン類似度を表す。 The cosine similarity may be used to calculate the similarity between the user utterance input by the input unit 40 and each user utterance in the example data, both converted into vectors using word2vec. The cosine similarity is a general measure of the similarity between vectors and is given by equation (1) below. In equation (1), $\vec{a}$ and $\vec{b}$ represent vectors, and $\cos(\vec{a}, \vec{b})$ represents the cosine similarity.

$$\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert} \qquad \cdots (1)$$
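As a minimal sketch, the cosine similarity between two utterance vectors could be computed as follows. The function name `cosine_similarity` and the choice to return 0.0 for a zero vector are assumptions made for illustration; the text does not specify how degenerate vectors are handled.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)
```

Vectors that point in the same direction score close to 1 regardless of their length, and orthogonal vectors score 0.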

入力部４０により入力されたユーザ発話と、用例データ中のユーザ発話との類似度を計算し、用例データ中で、コサイン類似度が閾値以上であるユーザ発話と組になっている言外の情報を、言外の情報の候補として出力する。 The similarity between the user utterance input by the input unit 40 and each user utterance in the example data is calculated, and the unspoken information paired with any user utterance in the example data whose cosine similarity is equal to or greater than a threshold is output as an unspoken information candidate.

ユーザ発話として「紅葉を見に富士山に行きました」というユーザ発話に対する処理の一例を以下に示す。下記表１１に、用例データの一例を示す。 An example of a process for a user utterance “I went to Mt. Fuji to see autumn leaves” as a user utterance is shown below. Table 11 below shows an example of the example data.

下記表１２は、入力部４０により入力されたユーザ発話と、用例データ中のユーザ発話とをベクトル化し、類似度を計算した結果を示す。 Table 12 below shows the result of vectorizing the user utterance input by the input unit 40 and the user utterance in the example data and calculating the similarity.

上記表１２において、類似度の閾値を０．８とすると、下記表１３に示すように、類似度が閾値以上であるユーザ発話と組になっている言外の情報を、言外の情報の候補として言外の情報ランキング部４６に出力する。 In Table 12 above, if the similarity threshold is set to 0.8, the unspoken information paired with each user utterance whose similarity is equal to or greater than the threshold is output to the unspoken information ranking unit 46 as an unspoken information candidate, as shown in Table 13 below.
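The threshold-based selection performed by the example search unit can be sketched as follows. The example pairs and similarity values are hypothetical placeholders and are not the contents of Tables 11 to 13; the function name `retrieve_candidates` is also an assumption.

```python
from typing import List, Tuple


def retrieve_candidates(examples: List[Tuple[str, str]],
                        similarities: List[float],
                        threshold: float = 0.8) -> List[str]:
    """Return the unspoken information paired with each example utterance
    whose similarity to the input utterance is at or above the threshold."""
    if len(examples) != len(similarities):
        raise ValueError("one similarity per example is required")
    return [info
            for (_utterance, info), sim in zip(examples, similarities)
            if sim >= threshold]


# Hypothetical example data: (user utterance, unspoken information).
examples = [
    ("I went to Mt. Fuji", "S likes Mt. Fuji"),
    ("I went to see autumn leaves", "S likes autumn leaves"),
    ("I ate ramen", "S was hungry"),
]
# Hypothetical similarities to the input utterance.
similarities = [0.9, 0.8, 0.3]
candidates = retrieve_candidates(examples, similarities)
```

Because the comparison is "equal to or greater than", an example whose similarity is exactly 0.8 is kept.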

共起検索部４４ｂは、コーパス文書共起データ記憶部２２にコーパス文書共起データとして記憶されている、コーパス文書内で共起する述語項構造の組に基づいて、ユーザ発話に対応する述語項構造から、言外の情報の候補を生成する。 The co-occurrence search unit 44b generates unspoken information candidates from the predicate-argument structure corresponding to the user utterance, based on the pairs of predicate-argument structures that co-occur in the corpus documents and are stored as corpus document co-occurrence data in the corpus document co-occurrence data storage unit 22.

言外の情報の候補を検索する際には、コーパス文書共起データ記憶部２２に記憶されているコーパス文書共起データを読み出し、読み出したコーパス文書共起データに対して、入力部４０により入力されたユーザ発話と類似した述語項構造を検索する。そして、述語項構造の組において、検索された述語項構造と組み合わせている述語項構造を言外の情報の候補として言外の情報ランキング部４６に出力する。 When searching for unspoken information candidates, the corpus document co-occurrence data stored in the corpus document co-occurrence data storage unit 22 is read out and searched for predicate-argument structures similar to the user utterance input by the input unit 40. Then, in each pair of predicate-argument structures, the predicate-argument structure paired with the retrieved one is output to the unspoken information ranking unit 46 as an unspoken information candidate.

なお、入力部４０により入力されたユーザ発話と類似した述語項構造を検索する方法としては、上述した用例検索部４４ａと同様にベクトル間の類似度を用いた方法が挙げられる。 In addition, as a method of searching for a predicate term structure similar to the user utterance input by the input unit 40, a method using the similarity between vectors as in the above-described example search unit 44a is exemplified.
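The co-occurrence lookup can be sketched as below. To keep the example self-contained, token-overlap (Jaccard) similarity is swapped in for the vector similarity mentioned in the text; the pair data, the threshold of 0.5, and the decision to match against either member of a pair are all assumptions made for illustration.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, a stand-in for vector similarity."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cooccurrence_candidates(query: str,
                            pairs: List[Tuple[str, str]],
                            threshold: float = 0.5) -> List[str]:
    """For each co-occurring pair, if one member is similar to the query,
    return the other member as an unspoken information candidate."""
    candidates: List[str] = []
    for left, right in pairs:
        if jaccard(query, left) >= threshold:
            candidates.append(right)
        elif jaccard(query, right) >= threshold:
            candidates.append(left)
    return candidates


# Hypothetical co-occurring predicate-argument structures.
pairs = [
    ("go to Mt. Fuji", "see autumn leaves"),
    ("eat ramen", "drink water"),
]
found = cooccurrence_candidates("go to Mt. Fuji", pairs)
```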

言外の情報ランキング部４６は、入力部４０により入力されたユーザ発話と、ランキングモデル記憶部２６に記憶されているランキングモデルとに基づいて、用例検索部４４ａ又は共起検索部４４ｂにより生成された言外の情報の候補の各々に対して、ユーザ発話に対する言外の情報の尤もらしさを表すスコアを算出する。 The unspoken information ranking unit 46 calculates, for each unspoken information candidate generated by the example search unit 44a or the co-occurrence search unit 44b, a score representing the likelihood of the unspoken information for the user utterance, based on the user utterance input by the input unit 40 and the ranking model stored in the ranking model storage unit 26.

下記表１４に、入力部４０により入力されたユーザ発話を示し、下記表１５に、言外の情報の候補を示す。 Table 14 below shows user utterances input by the input unit 40, and Table 15 below shows candidates for unspoken information.

下記表１６に、このような例において算出された、ユーザ発話に対する言外の情報の尤もらしさを表すスコアを示す。 Table 16 below shows the scores calculated in this example, each representing the likelihood of the unspoken information for the user utterance.

下記表１７に示すように、算出したスコアが閾値（例えば、０．２）以上の言外の情報を、スコアを昇順に並べたランキング形式で、タイプ識別部４８に出力する。 As shown in Table 17 below, the unspoken information whose calculated score is equal to or greater than a threshold (for example, 0.2) is output to the type identification unit 48 in a ranking format in which the scores are arranged in ascending order.
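The thresholding and ordering step can be sketched as follows. The candidate labels and scores are hypothetical and are not the values of Tables 16 and 17; the function name `rank_candidates` is an assumption. The ascending order follows the wording of the text.

```python
from typing import List, Tuple


def rank_candidates(scored: List[Tuple[str, float]],
                    threshold: float = 0.2) -> List[Tuple[str, float]]:
    """Keep candidates scoring at or above the threshold and
    sort them by score in ascending order."""
    kept = [(info, score) for info, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1])


# Hypothetical (unspoken information, score) pairs.
scored = [("A", 0.7), ("B", 0.1), ("C", 0.3)]
ranking = rank_candidates(scored)
```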

タイプ識別部４８は、タイプ識別モデル記憶部３０に記憶されているタイプ識別モデルに基づいて、言外の情報の各々に対してタイプを識別して付与する。 The type identification unit 48 identifies and assigns a type to each piece of unspoken information based on the type identification model stored in the type identification model storage unit 30.

例えば、下記表１８に示される言外の情報を入力したとする。 For example, assume that the unspoken information shown in Table 18 below is input.

このような場合には、下記表１９に示されるように、各々の言外の情報にタイプが付与される。 In such a case, as shown in Table 19 below, a type is assigned to each implicit information.

タイプフィルタ部５０は、タイプが付与された言外の情報から、タイプに関する予め定めた条件を満たす言外の情報を、表現変換部５２に出力する。 The type filter unit 50 outputs, to the expression conversion unit 52, the unspoken information that satisfies the predetermined condition regarding the type from the unspoken information to which the type is assigned.

予め定めた条件を満たす言外の情報の選択方法としては、例えば、出力する言外の情報のタイプを予め列挙した設定ファイルを記憶させておき、この設定ファイルに含まれるタイプの言外の情報のみを選択する方法が挙げられる。 As a method of selecting the unspoken information that satisfies the predetermined condition, for example, a setting file in which the types of the unspoken information to be output are listed in advance is stored, and the unspoken information of the type included in the setting file is stored. There is a method of selecting only.

例えば、設定ファイルに「事実１」のタイプ、及び「事実２」のタイプのみが列挙されているとする。この場合に、下記表２０に示す、タイプが付与された言外の情報が入力された場合、上記設定ファイルを用いて、下記表２１に示す言外の情報が選択される。 For example, assume that only the types "Fact 1" and "Fact 2" are listed in the setting file. In this case, when the typed unspoken information shown in Table 20 below is input, the unspoken information shown in Table 21 below is selected using the setting file.

このように、タイプが付与された言外の情報を、タイプに基づいてフィルタリングすることで、例えば「Ｓは褒められたい」というように、ユーザに発話すべきでない可能性がある言外の情報を、出力対象の言外の情報から除外することができる。なお、予め定めた条件を満たす言外の情報の選択方法としては、例えば、出力すべきでない言外の情報のタイプを予め列挙した設定ファイルを記憶させておいてもよい。 By filtering the typed unspoken information based on its type in this way, unspoken information that possibly should not be uttered to the user, such as "S wants to be praised", can be excluded from the unspoken information to be output. As another way of selecting unspoken information that satisfies the predetermined condition, a setting file that lists in advance the types of unspoken information that should not be output may be stored instead.
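Both selection styles described above, a list of types to output and a list of types to suppress, can be sketched with one function. The function name `filter_by_type`, the in-memory sets standing in for the setting file, and the type label "Desire 1" are hypothetical; only "Fact 1", "Fact 2", and "Belief 1" appear in the text.

```python
from typing import Iterable, List, Optional, Tuple


def filter_by_type(items: List[Tuple[str, str]],
                   allowed: Optional[Iterable[str]] = None,
                   denied: Optional[Iterable[str]] = None) -> List[Tuple[str, str]]:
    """Filter (unspoken information, type) pairs.

    Exactly one of `allowed` (types to output) or `denied`
    (types not to output) must be given.
    """
    if (allowed is None) == (denied is None):
        raise ValueError("specify exactly one of allowed or denied")
    if allowed is not None:
        allowed_set = set(allowed)
        return [item for item in items if item[1] in allowed_set]
    denied_set = set(denied)
    return [item for item in items if item[1] not in denied_set]


# Hypothetical typed unspoken information.
typed = [
    ("S went to Mt. Fuji", "Fact 1"),
    ("S likes Mt. Fuji", "Belief 1"),
    ("S wants to be praised", "Desire 1"),  # hypothetical type label
]
by_allow = filter_by_type(typed, allowed=["Fact 1", "Fact 2"])
by_deny = filter_by_type(typed, denied=["Desire 1"])
```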

表現変換部５２は、タイプフィルタ部５０により出力された言外の情報の各々を、発話文に変換し、出力部５４に出力する。 The expression conversion unit 52 converts each of the unspoken information output by the type filter unit 50 into an utterance sentence, and outputs the utterance sentence to the output unit 54.

言外の情報の表現を変換する際には、言外の情報の末尾を「〜ですね」と変換することで、確認の形式に変換する。変換方法はこれに限らず、言外の情報の末尾を「〜なんですか？」と変換することで質問の形式にしても良い。また、変換を行わず、言外の情報をそのまま応答発話として出力しても良い。 When converting the expression of unspoken information, the ending of the unspoken information is replaced with "~desu ne" ("..., isn't it?") to convert it into a confirmation form. The conversion method is not limited to this; the ending may instead be replaced with "~nan desu ka?" ("Is it ...?") to form a question. Alternatively, the unspoken information may be output as the response utterance as it is, without conversion.

下記表２２に示すような言外の情報を入力した場合には、各々の言外の情報は、下記表２３に示すような表現に変換される。なお、下記表２２では、言外の情報のみを示している。 When the unspoken information shown in Table 22 below is input, each piece of unspoken information is converted into the expression shown in Table 23 below. Note that Table 22 shows only the unspoken information.
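The ending replacement can be illustrated with a deliberately simple sketch. It only handles unspoken information ending in the copula 「だ」, which is an assumption; the actual conversion rules behind Tables 22 and 23 are not given here, and other endings (for example, past-tense verbs) would need morphological handling. The function name `convert_expression` and the `style` values are hypothetical.

```python
def convert_expression(info: str, style: str = "confirm") -> str:
    """Convert unspoken information into an utterance sentence.

    style: "confirm"  -> ending replaced with 「ですね」
           "question" -> ending replaced with 「なんですか？」
           "none"     -> returned unchanged
    """
    if style == "none":
        return info
    suffixes = {"confirm": "ですね", "question": "なんですか？"}
    if style not in suffixes:
        raise ValueError(f"unknown style: {style}")
    # Assumption: only a trailing copula 「だ」 is stripped.
    body = info[:-1] if info.endswith("だ") else info
    return body + suffixes[style]


confirmation = convert_expression("Ｓは富士山が好きだ")
question = convert_expression("Ｓは富士山が好きだ", style="question")
```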

出力部５４は、スコアが算出された言外の情報のうち、スコアが予め定めた条件を満たす言外の情報を、スコアを付与した状態で、応答発話として出力する。出力方法としては、応答発話を示すデータをディスプレイ等の表示手段に表示させたり、応答発話を示すデータを外部装置に送信したり、応答発話を示す音声を音声出力手段により出力させたりする方法が挙げられる。 The output unit 54 outputs, as a response utterance, the unspoken information whose score satisfies a predetermined condition among the scored unspoken information, with the score attached. Output methods include displaying data indicating the response utterance on display means such as a display, transmitting data indicating the response utterance to an external device, and outputting speech representing the response utterance through audio output means.

下記表２４に示すような、スコアが付与された言外の情報を入力した場合には、スコアに基づき、下記表２５に示すような、スコアが予め定めた閾値（例えば、０．５）以上の言外の情報が、スコアが付与された状態で、応答発話として出力される。 When scored unspoken information such as that shown in Table 24 below is input, the unspoken information whose score is equal to or greater than a predetermined threshold (for example, 0.5) is output as a response utterance with its score attached, as shown in Table 25 below.

なお、本実施形態に係る発話生成装置１０は、例えば、ＣＰＵ（Central Processing Unit）、ＲＡＭ（Random Access Memory）、各種プログラムを記憶するＲＯＭ（Read Only Memory）を備えたコンピュータ装置で構成される。また、発話生成装置１０を構成するコンピュータは、ハードディスクドライブ、不揮発性メモリ等の記憶部を備えていても良い。本実施形態では、ＣＰＵがＲＯＭ、ハードディスク等の記憶部に記憶されているプログラムを読み出して実行することにより、上記のハードウェア資源とプログラムとが協働し、上述した機能が実現される。 The utterance generation device 10 according to the present embodiment is configured by, for example, a computer device including a CPU (Central Processing Unit), a RAM (Random Access Memory), and a ROM (Read Only Memory) for storing various programs. Further, the computer configuring the utterance generation device 10 may include a storage unit such as a hard disk drive and a nonvolatile memory. In the present embodiment, the CPU reads out and executes a program stored in a storage unit such as a ROM and a hard disk, so that the hardware resources and the program cooperate to realize the above-described functions.

次に、本実施形態に係る発話生成装置１０による発話生成処理の流れを、図４に示すフローチャートを用いて説明する。本実施形態では、発話生成装置１０に、発話生成処理の実行を開始するための予め定めたデータが入力されたタイミングで発話生成処理が開始されるが、発話生成処理が開始されるタイミングはこれに限らず、例えば、対話ルールを示すデータが入力されたタイミングで発話生成処理が開始されても良い。 Next, a flow of an utterance generation process performed by the utterance generation device 10 according to the present embodiment will be described with reference to a flowchart illustrated in FIG. In the present embodiment, the utterance generation process is started at a timing at which predetermined data for starting the execution of the utterance generation process is input to the utterance generation device 10. However, the present invention is not limited to this. For example, the utterance generation processing may be started at a timing when data indicating the interaction rule is input.

ステップＳ１０１では、入力部４０が、ユーザ発話を入力する。 In step S101, the input unit 40 inputs a user utterance.

ステップＳ１０３では、用例検索部４４ａが、ユーザ発話と言外の情報との組の集合である用例データに基づいて、入力部４０により入力されたユーザ発話に対応する言外の情報から、言外の情報の候補を生成する。 In step S103, the example search unit 44a generates unspoken information candidates from the unspoken information corresponding to the user utterance input by the input unit 40, based on the example data, which is a set of pairs of a user utterance and unspoken information.

ステップＳ１０５では、共起検索部４４ｂが、コーパス文書共起データである、コーパス文書内で共起する述語項構造の組に基づいて、入力部４０により入力されたユーザ発話に対応する述語項構造から、言外の情報の候補を生成する。 In step S105, the co-occurrence search unit 44b generates unspoken information candidates from the predicate-argument structure corresponding to the user utterance input by the input unit 40, based on the pairs of predicate-argument structures co-occurring in the corpus documents, which constitute the corpus document co-occurrence data.

ステップＳ１０７では、言外の情報ランキング部４６が、入力部４０により入力されたユーザ発話と、ユーザ発話に対する言外の情報の尤もらしさを表すスコアを算出するためのランキングモデルとに基づいて、用例検索部４４ａ又は共起検索部４４ｂにより生成された言外の情報の候補の各々に対して、スコアを算出する。 In step S107, the unspoken information ranking unit 46 calculates a score for each unspoken information candidate generated by the example search unit 44a or the co-occurrence search unit 44b, based on the user utterance input by the input unit 40 and a ranking model for calculating a score representing the likelihood of unspoken information for a user utterance.

ステップＳ１０９では、タイプ識別部４８が、言外の情報のタイプを識別するためのタイプ識別モデルに基づいて、言外の情報の各々に対してタイプを識別して付与する。 In step S109, the type identification unit 48 identifies and assigns a type to each of the unspoken information based on a type identification model for identifying the type of the unspoken information.

ステップＳ１１１では、タイプフィルタ部５０が、タイプが付与された言外の情報から、タイプに基づいてフィルタリングし、タイプに関する予め定めた条件を満たす言外の情報のみを出力する。 In step S111, the type filter unit 50 performs filtering based on the type from the unspoken information to which the type is assigned, and outputs only unspoken information that satisfies a predetermined condition regarding the type.

ステップＳ１１３では、表現変換部５２が、言外の情報の各々を、表現を変換することにより発話文に変換する。 In step S113, the expression conversion unit 52 converts each of the unspoken information into an utterance sentence by converting the expression.

ステップＳ１１５では、出力部５４が、スコアが算出された言外の情報の候補のうち、スコアが予め定めた条件を満たす言外の情報を応答発話として出力し、本発話生成処理のプログラムの実行を終了する。 In step S115, the output unit 54 outputs, as a response utterance, the unspoken information whose score satisfies the predetermined condition among the scored unspoken information candidates, and the execution of the program for this utterance generation process ends.
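The flow of steps S101 to S115 can be summarized in a single sketch in which each unit is passed in as a function. All names, the stub components, and the removal of duplicate candidates are assumptions made for illustration; the thresholds 0.2 and 0.5 are the example values given in the description.

```python
from typing import Callable, List, Set, Tuple


def generate_responses(utterance: str,
                       search_examples: Callable[[str], List[str]],
                       search_cooccurrence: Callable[[str], List[str]],
                       score: Callable[[str, str], float],
                       identify_type: Callable[[str], str],
                       allowed_types: Set[str],
                       convert: Callable[[str], str],
                       rank_threshold: float = 0.2,
                       output_threshold: float = 0.5) -> List[Tuple[str, float]]:
    # S103, S105: collect candidates from both search units.
    # Assumption: duplicates are dropped, keeping first-seen order.
    found = search_examples(utterance) + search_cooccurrence(utterance)
    candidates = list(dict.fromkeys(found))
    # S107: score each candidate and drop low-scoring ones.
    scored = [(c, score(utterance, c)) for c in candidates]
    scored = [(c, s) for c, s in scored if s >= rank_threshold]
    # S109, S111: assign a type and keep only the allowed types.
    kept = [(c, s) for c, s in scored if identify_type(c) in allowed_types]
    # S113: convert each piece of unspoken information into an utterance.
    converted = [(convert(c), s) for c, s in kept]
    # S115: output utterances whose score meets the output condition.
    return [(u, s) for u, s in converted if s >= output_threshold]


# Hypothetical stub components.
scores = {"S likes Mt. Fuji": 0.9, "S wants to be praised": 0.8,
          "S saw autumn leaves": 0.3}
responses = generate_responses(
    "I went to Mt. Fuji to see autumn leaves",
    search_examples=lambda u: ["S likes Mt. Fuji", "S wants to be praised"],
    search_cooccurrence=lambda u: ["S likes Mt. Fuji", "S saw autumn leaves"],
    score=lambda u, c: scores[c],
    identify_type=lambda c: "Desire 1" if "wants" in c else "Fact 1",
    allowed_types={"Fact 1", "Fact 2"},
    convert=lambda c: c + ", right?",
)
```

Passing the units as functions keeps the sketch independent of any particular search, ranking, or type identification model.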

このように、本実施形態では、入力されたユーザ発話と言外の情報との組の集合である用例データ、又はコーパス文書内で共起する述語項構造の組に基づいて、入力されたユーザ発話に対応する言外の情報、又はユーザ発話に対応する述語項構造から、言外の情報の候補を生成する。また、入力されたユーザ発話と、ユーザ発話に対する言外の情報の尤もらしさを表すスコアを算出するためのランキングモデルとに基づいて、生成された言外の情報の候補の各々に対して、スコアを算出する。また、スコアが算出された言外の情報の候補のうち、スコアが予め定めた条件を満たす言外の情報を応答発話として出力する。 As described above, in the present embodiment, unspoken information candidates are generated from the unspoken information corresponding to the input user utterance, or from the predicate-argument structure corresponding to the user utterance, based on example data, which is a set of pairs of a user utterance and unspoken information, or on pairs of predicate-argument structures co-occurring in corpus documents. A score is then calculated for each generated unspoken information candidate, based on the input user utterance and a ranking model for calculating a score representing the likelihood of unspoken information for a user utterance. Finally, among the scored unspoken information candidates, the unspoken information whose score satisfies a predetermined condition is output as a response utterance.

これにより、言外の情報を用いた応答発話を生成することにより、雑談対話システムがユーザの発話内容に限定されない、様々な内容を応答することができる。話をしっかり理解しているとユーザに伝えることができるため、より長く使ってもらえる対話システムが実現される。 As a result, by generating response utterances that use unspoken information, the chat dialogue system can respond with a variety of content that is not limited to what the user said. Because the system can convey to the user that it has properly understood what was said, a dialogue system that users will keep using for longer is realized.

なお、本実施形態では、図１乃至図３に示す機能の構成要素の動作をプログラムとして構築し、発話生成装置１０として利用されるコンピュータにインストールして実行させるが、これに限らず、ネットワークを介して流通させても良い。 In the present embodiment, the operations of the functional components shown in FIGS. 1 to 3 are built as a program, which is installed and executed on a computer used as the utterance generation device 10; however, the present invention is not limited to this, and the program may be distributed via a network.

また、構築されたプログラムをハードディスク、ＣＤ−ＲＯＭ等の可搬記憶媒体に格納し、コンピュータにインストールしたり、配布したりしても良い。 Further, the constructed program may be stored in a portable storage medium such as a hard disk or a CD-ROM, and may be installed on a computer or distributed.

１０発話生成装置
１２学習部
１４発話生成部
２０コーパス文書共起データ作成部
２２コーパス文書共起データ記憶部
２４ランキングモデル作成部
２６ランキングモデル記憶部
２８タイプ識別モデル作成部
３０タイプ識別モデル記憶部
４０入力部
４２用例データ記憶部
４４検索部
４４ａ用例検索部
４４ｂ共起検索部
４６言外の情報ランキング部
４８タイプ識別部
５０タイプフィルタ部
５２表現変換部
５４出力部

Reference Signs List
10 utterance generation device
12 learning unit
14 utterance generation unit
20 corpus document co-occurrence data creation unit
22 corpus document co-occurrence data storage unit
24 ranking model creation unit
26 ranking model storage unit
28 type identification model creation unit
30 type identification model storage unit
40 input unit
42 example data storage unit
44 search unit
44a example search unit
44b co-occurrence search unit
46 unspoken information ranking unit
48 type identification unit
50 type filter unit
52 expression conversion unit
54 output unit

Claims

An utterance generation device comprising:
an input unit that receives a user utterance;
a search unit that generates unspoken information candidates from unspoken information corresponding to the user utterance input by the input unit, or from a predicate-argument structure corresponding to the user utterance, based on example data, which is a set of pairs of a user utterance and unspoken information, or on pairs of predicate-argument structures co-occurring in corpus documents; and
an unspoken information ranking unit that calculates a score for each of the unspoken information candidates generated by the search unit, based on the user utterance input by the input unit and a ranking model for calculating the score, the score representing the likelihood of unspoken information for a user utterance.

The utterance generation device according to claim 1, further comprising:
a type identification unit that identifies and assigns a type to each piece of the unspoken information based on a type identification model for identifying the type of unspoken information; and
a type filter unit that outputs, from the unspoken information to which the type has been assigned, the unspoken information that satisfies a predetermined condition regarding the type.

The utterance generation device according to claim 2, further comprising a type identification model creation unit that creates the type identification model based on data comprising a set of unspoken information to which the type has been assigned in advance,
wherein the type identification unit identifies and assigns the type to each piece of the unspoken information based on the type identification model created by the type identification model creation unit.

The output unit that outputs, as a response utterance, the unspoken information whose score satisfies a predetermined condition, among the candidates of the unspoken information for which the score is calculated, further comprising an output unit. The utterance generation device described in the section.

The utterance generation device according to claim 4, further comprising an expression conversion unit configured to convert each of the unspoken information output from the output unit into an utterance sentence.

The utterance generation device according to any one of claims 1 to 5, further comprising a ranking model creation unit that, based on positive-example pairs of a user utterance and unspoken information and negative-example pairs of a user utterance and unspoken information, morphologically analyzes each user utterance and each piece of unspoken information, and creates the ranking model based on combinations of stems of the obtained morphemes and on whether each combination was obtained from a positive-example pair or a negative-example pair,
wherein the unspoken information ranking unit morphologically analyzes the user utterance input by the input unit and each unspoken information candidate generated by the search unit, and calculates the score for the unspoken information candidate based on combinations of stems of the obtained morphemes and the ranking model created by the ranking model creation unit.

An utterance generation method in an utterance generation device including an input unit, a search unit, and an unspoken information ranking unit, the method comprising:
a step in which the input unit receives a user utterance;
a step in which the search unit generates unspoken information candidates from unspoken information corresponding to the user utterance input by the input unit, or from a predicate-argument structure corresponding to the user utterance, based on example data, which is a set of pairs of a user utterance and unspoken information, or on pairs of predicate-argument structures co-occurring in corpus documents; and
a step in which the unspoken information ranking unit calculates a score for each of the unspoken information candidates generated by the search unit, based on the user utterance input by the input unit and a ranking model for calculating the score, the score representing the likelihood of unspoken information for a user utterance.

An utterance generation program for causing a computer to function as each unit of the utterance generation device according to any one of claims 1 to 6.