JP6704976B2

JP6704976B2 - Information processing apparatus, information processing method, and program

Info

Publication number: JP6704976B2
Application number: JP2018219727A
Authority: JP
Inventors: 力橋本; 颯々野　学
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2018-11-22
Filing date: 2018-11-22
Publication date: 2020-06-03
Anticipated expiration: 2037-09-07
Also published as: JP2019050037A

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

Conventionally, a device has been disclosed in which words used to ask the other party to repeat or to confirm what was said are registered in advance, and which determines that a conversation is not being conducted effectively when the user's conversation contains these registered words (see Patent Document 1).

[Patent Document 1] Japanese Unexamined Patent Application Publication No. 2007-43356 (JP 2007-43356 A)

However, because the above device relies on pre-registered words to judge the effectiveness of a conversation, it may be unable to make a judgment for words other than those registered.

The present invention has been made in view of these circumstances, and one of its objects is to provide an information processing apparatus, an information processing method, and a program capable of automatically acquiring clues for determining whether an unknown conversation is of a predetermined type.

One aspect of the present invention is an information processing apparatus including: an acquisition unit that acquires a feedback utterance estimated to indicate a predetermined reaction of a second utterance subject to an utterance made by a first utterance subject; and a pre-stage generation unit that generates a pre-stage classifier that derives an index of the degree to which a conversation of a predetermined type is estimated to appear immediately before the feedback utterance acquired by the acquisition unit.

According to one aspect of the present invention, clues for determining whether an unknown conversation is of a predetermined type can be acquired automatically.

A diagram showing part of the configuration of the information processing system 1.
A diagram showing an example of the labels given to feedback utterances.
A diagram showing an example of the labels given to the conversations immediately preceding feedback utterances.
A diagram showing an example of feedback utterances input to the utterance classifier 24 and the utterance scores output by the utterance classifier 24.
A diagram showing another configuration of the information processing system 1.
A diagram showing an example of feedback utterances biased toward the first type or the second type.
A diagram showing an example of conversations stored in the conversation learning data storage device 52.
A diagram conceptually showing the learning process.
A diagram showing an example of a conversation input to the conversation classifier 66 and the information output by the conversation classifier 66.
A flowchart showing the flow of processing by which the information processing system 1 generates the utterance classifier 24.
A flowchart showing the flow of processing by which the information processing system 1 generates the conversation classifier 66.
A diagram showing the functional configurations of Comparative Example 1 and Comparative Example 2.
A diagram showing an example of the processing results of the information processing system 1, Comparative Example 1, and Comparative Example 2.
A diagram showing an example of the functional configuration of the information processing system 1A of Modification 1.
A diagram showing an example of the functional configuration of the information processing system 1B of Modification 2.
A diagram showing an example of the functional configuration of the information processing system 1C of Modification 3.
A diagram showing an example of the processing results of the information processing system 1, Modification 1, Modification 2, and Modification 3.

Hereinafter, embodiments of the information processing apparatus, information processing method, and program of the present invention will be described with reference to the drawings. In the following description, words uttered by the automatic response device or the user are called an "utterance", a set of utterances is called a "conversation", and an utterance estimated to indicate a predetermined reaction of a second utterance subject to an utterance made by a first utterance subject is called a "feedback utterance". An automatic response device is an example of the first utterance subject, and a user (a person) is an example of the second utterance subject.

The information processing apparatus is realized by one or more processors. The information processing apparatus derives an index indicating the type of a conversation, for example a conversation between a user and an automatic response device. The type of a conversation is, for example, whether the conversation is unnatural (or, conversely, whether it is natural). A conversation is unnatural when, for example, the conversation fails because an appropriate automatic response was not given. The type of a conversation is not limited to whether the conversation is unnatural and may be defined arbitrarily.

In the course of its processing, the information processing apparatus also generates an utterance classifier and a conversation classifier. The utterance classifier derives an index given to a feedback utterance (the utterance score described later) that indicates the degree to which an unnatural conversation or a natural conversation is estimated to appear immediately before the feedback utterance. In the embodiment described below, the utterance score is an index indicating the degree to which an unnatural conversation is estimated to appear immediately before the feedback utterance. Hereinafter, the "conversation appearing immediately before" (or the "immediately preceding conversation") is the combination of a user's utterance and the automatic response device's utterance in reply to it. The conversation classifier derives an index given to a conversation (the conversation score described later) that indicates the degree to which the conversation is unnatural.

[Configuration]
FIG. 1 is a diagram showing part of the configuration of the information processing system 1. The information processing system 1 includes, for example, a conversation log storage device 10, a feedback utterance storage device 12, an utterance learning data storage device 14, an acquisition unit 20, an utterance classifier generation unit (pre-stage generation unit) 22, and an utterance classifier 24. The functional configuration described above may be configured as an apparatus.

The acquisition unit 20, the utterance classifier generation unit 22, and the utterance classifier 24 are realized, for example, by a hardware processor such as a CPU (Central Processing Unit) executing a program (software). Some or all of these components may be realized by hardware (a circuit unit; including circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit), or by cooperation of software and hardware.

Each storage device included in the information processing system 1 is realized by, for example, a ROM (Read Only Memory), an HDD (Hard Disk Drive), a flash memory, an SD card, a RAM (Random Access Memory), a register, or the like.

The conversation log storage device 10 stores conversation log information. This log information is, for example, text information of conversations between a user and an automatic response device operated by artificial intelligence (AI). The text information may have been converted from spoken utterances by speech recognition.

The feedback utterance storage device 12 stores feedback utterances. The feedback utterances stored in the feedback utterance storage device 12 are extracted from the conversation log storage device 10. A feedback utterance is, for example, a feedback utterance set in advance. Feedback utterances are extracted (acquired), for example, by an operator extracting them from the conversation log information stored in the conversation log storage device 10, or by a predetermined device (or another system) automatically extracting them from that log information based on preset feedback utterance wording.

As described above, the feedback utterance storage device 12 stores feedback utterances acquired from the conversation log storage device 10. FIG. 2 is a diagram showing an example of feedback utterances. Feedback utterances include, for example, utterances such as (1) "No, no, what do you mean" and (2) "I see, thank you". Whether the immediately preceding conversation holds cannot be determined accurately merely from whether the feedback utterance negates or affirms that conversation. The information processing system 1 of the present embodiment therefore generates an utterance classifier 24 that outputs, from the feedback utterance alone, a score (probability) indicating, for example, whether the immediately preceding conversation holds.

First, conversations containing feedback utterances are acquired from the conversation log storage device 10, and, as shown in FIG. 3, the conversation immediately preceding each feedback utterance is given a label of natural (a label representing the first type) or unnatural (a label representing the second type). Learning data is then generated in which the label given to the conversation immediately preceding the feedback utterance serves as the teacher label of the feedback utterance, and this learning data is stored in the utterance learning data storage device 14.

In the example of FIG. 3, the conversation immediately preceding "Thank you" is natural, so the feedback utterance "Thank you" is given the teacher label "natural"; the conversation immediately preceding "What does that mean" is unnatural, so the feedback utterance "What does that mean" is given the teacher label "unnatural". Learning data with these teacher labels is generated.
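The label transfer described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the record type `LabeledTurn`, the function name, and the two sample conversations are assumptions introduced here for illustration.

```python
from typing import NamedTuple


class LabeledTurn(NamedTuple):
    """Hypothetical record: one labeled conversation plus the feedback that followed."""
    user: str      # user's utterance
    system: str    # automatic response device's reply
    feedback: str  # feedback utterance that followed the (user, system) pair
    label: str     # "natural" or "unnatural", given to the (user, system) pair


def build_utterance_training_data(turns):
    """Use the label of the preceding conversation as the feedback utterance's teacher label."""
    return [(t.feedback, t.label) for t in turns]


# Illustrative sample conversations (assumed, not the contents of FIG. 3).
turns = [
    LabeledTurn("What is 426 + 129", "The answer is 555", "Thank you", "natural"),
    LabeledTurn("Show me my browsing history", "Hehehe", "What does that mean", "unnatural"),
]
training_data = build_utterance_training_data(turns)
```

Only the feedback utterance and the transferred label are kept; the preceding conversation itself is not part of the utterance learning data in this sketch.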

The utterance classifier generation unit 22 learns from the above learning data and generates the utterance classifier 24. The utterance classifier generation unit 22 performs this learning using a deep learning technique based on a neural network or the like, or a method such as an SVM (Support Vector Machine).

When given an unknown or known feedback utterance, the utterance classifier 24 derives an utterance score representing the probability that the conversation appearing immediately before it is unnatural. The feedback utterance given to the utterance classifier 24 is, for example, a feedback utterance acquired by the acquisition unit 20 from the feedback utterance storage device 12. The utterance score is an index indicating the degree to which the utterance made by the automatic response device immediately before the feedback utterance is estimated to be unnatural with respect to the utterance made by the person immediately before that. That is, the higher the utterance score, the higher the probability that the conversation between the user and the automatic response device immediately preceding the feedback utterance is unnatural.
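To make the idea of an utterance score concrete, the following is a minimal sketch of a classifier that maps a feedback utterance to a probability that the preceding conversation was unnatural. The description names neural networks and SVMs; this sketch substitutes a small naive Bayes model over character bigrams purely so that it runs with the standard library. The class name, feature choice, and training sentences are all assumptions.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Character bigrams with start/end markers (an assumed feature choice)."""
    padded = f"^{text}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class UtteranceClassifier:
    """Naive Bayes stand-in for the SVM / neural model named in the description."""

    def fit(self, samples):
        # samples: (feedback_utterance, label) pairs; both labels must occur at least once.
        self.counts = {"natural": Counter(), "unnatural": Counter()}
        self.docs = Counter()
        for text, label in samples:
            self.docs[label] += 1
            self.counts[label].update(char_bigrams(text))
        self.vocab = set(self.counts["natural"]) | set(self.counts["unnatural"])
        return self

    def _log_joint(self, text, label):
        # Add-one smoothing, with one extra bucket reserved for unseen bigrams.
        total = sum(self.counts[label].values()) + len(self.vocab) + 1
        logp = math.log(self.docs[label] / sum(self.docs.values()))
        for gram in char_bigrams(text):
            logp += math.log((self.counts[label][gram] + 1) / total)
        return logp

    def utterance_score(self, text):
        """Estimated probability that the conversation just before `text` was unnatural."""
        ln = self._log_joint(text, "natural")
        lu = self._log_joint(text, "unnatural")
        m = max(ln, lu)  # subtract the maximum for numerical stability
        return math.exp(lu - m) / (math.exp(ln - m) + math.exp(lu - m))


clf = UtteranceClassifier().fit([
    ("Thank you", "natural"),
    ("I see, thank you", "natural"),
    ("You're smart", "natural"),
    ("What does that mean", "unnatural"),
    ("No, no, what do you mean", "unnatural"),
    ("This isn't a conversation, though", "unnatural"),
])
fb1 = clf.utterance_score("No, what do you mean")  # unseen, resembles unnatural examples
fb2 = clf.utterance_score("Thanks, I see")         # unseen, resembles natural examples
```

With this toy training set, `fb1` comes out higher than `fb2`, mirroring the ordering the description expects between a puzzled reply and an approving one.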

FIG. 4 is a diagram showing an example of feedback utterances input to the utterance classifier 24 and the utterance scores output by the utterance classifier 24. For example, the utterance score derived for the utterance "No, no, what do you mean" (FB1 in the figure) is higher than the utterance score derived for the utterance "Quite obedient, very good" (FB2 in the figure).

With the score derived by the utterance classifier 24, whether a conversation is natural or unnatural can be judged from the feedback utterance alone, without scrutinizing the content of the conversation. In the present embodiment, the label given is a binary label: either a natural label indicating that the immediately preceding conversation is natural, or an unnatural label indicating that it is unnatural. However, the labels indicating the first type and the second type are not limited to whether the immediately preceding conversation holds (or is natural) or fails (or is unnatural), and may be given to arbitrary feedback utterances. For example, the label indicating the first type may be given to feedback utterances expressing praise, acceptance, understanding, gratitude, amusement, and the like, and the label indicating the second type may be given to feedback utterances expressing disappointment, miscommunication, bewilderment, contempt, boredom, and the like.

The utterance classifier 24 stores correspondence information, in which each feedback utterance is associated with its utterance score, in the scored feedback utterance storage device 50 of the information processing system 1, described later.

FIG. 5 is a diagram showing another configuration of the information processing system 1. In addition to the configuration shown in FIG. 1, the information processing system 1 further includes a conversation log storage device 40, a scored feedback utterance storage device 50, a conversation learning data storage device 52, an extraction unit 62, a learning data generation unit 63, a conversation classifier generation unit (post-stage generation unit) 64, and a conversation classifier 66. These functional configurations may be configured as an apparatus. Any of the functional configurations included in the information processing system 1 may also be configured as an apparatus.

For example, some or all of the extraction unit 62, the learning data generation unit 63, the conversation classifier generation unit 64, and the conversation classifier 66 are realized by a hardware processor such as a CPU executing a program (software). Some or all of these components may be realized by hardware (a circuit unit; including circuitry) such as an LSI, an ASIC, an FPGA, or a GPU, or by cooperation of software and hardware.

The components shown in FIG. 1 and FIG. 5 communicate, for example, by inter-software communication or via a hardware network. The hardware network may include, for example, a WAN (Wide Area Network), a LAN (Local Area Network), the Internet, a dedicated line, a wireless base station, a provider, and the like.

The conversation log storage device 40 stores, for example, conversation log information. This log information may be the same as, or different from, the information stored in the conversation log storage device 10.

The scored feedback utterance storage device 50 stores the feedback utterances for which the utterance classifier 24 has derived utterance scores, together with the utterance score for each feedback utterance.

The extraction unit 62 acquires feedback utterances and their corresponding scores from the scored feedback utterance storage device 50, and acquires conversations containing those feedback utterances (each feedback utterance and the conversation immediately preceding it) from the conversation log storage device 40. The feedback utterances extracted from the scored feedback utterance storage device 50 are used when acquiring conversations containing feedback utterances from the conversation log storage device 40.

The extraction unit 62 acquires the conversations containing feedback utterances extracted from the conversation log storage device 40 and, based on the score attached to each feedback utterance extracted from the scored feedback utterance storage device 50, gives a score to the conversation immediately preceding that feedback utterance.
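The extraction and score transfer performed by the extraction unit 62 might look like the following sketch. The log representation (a list of `(speaker, text)` tuples), exact string matching against stored feedback utterances, and the score values are all assumptions made for illustration; the description does not specify them.

```python
def extract_scored_conversations(log, scored_feedback):
    """Find each stored feedback utterance in the log and pass its score to the
    (user utterance, system reply) pair that immediately precedes it."""
    rows = []
    for i in range(2, len(log)):
        speaker, text = log[i]
        if speaker != "user" or text not in scored_feedback:
            continue
        (s_user, user_utt), (s_sys, sys_utt) = log[i - 2], log[i - 1]
        # Keep only the pattern: user utterance -> system reply -> feedback utterance.
        if s_user == "user" and s_sys == "system":
            rows.append((user_utt, sys_utt, text, scored_feedback[text]))
    return rows


# Hypothetical log and made-up scores, for illustration only.
log = [
    ("user", "What is 426 + 129"),
    ("system", "The answer is 555"),
    ("user", "You're smart"),
    ("user", "Show me my browsing history"),
    ("system", "Hehehe"),
    ("user", "No, no, what do you mean"),
]
scored_feedback = {"You're smart": 0.1, "No, no, what do you mean": 0.9}
rows = extract_scored_conversations(log, scored_feedback)
```

Each returned row carries the preceding conversation together with the score inherited from its feedback utterance, which is the input needed for the labeling step.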

When a score is to be given to a conversation containing a feedback utterance that is not stored in the scored feedback utterance storage device 50, the feedback utterance of that conversation is given to the utterance classifier 24 to obtain the score.
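This fallback can be sketched as a simple lookup. The function name `lookup_score`, the dictionary standing in for the scored feedback utterance storage device 50, and the callable standing in for the utterance classifier 24 are assumptions for illustration.

```python
def lookup_score(feedback, stored_scores, classify):
    """Return the stored utterance score if present; otherwise ask the classifier."""
    if feedback in stored_scores:
        return stored_scores[feedback]
    return classify(feedback)


stored_scores = {"Thank you": 0.2}      # made-up stored score
classify = lambda text: 0.99            # placeholder for the utterance classifier 24
known = lookup_score("Thank you", stored_scores, classify)
unknown = lookup_score("Huh?", stored_scores, classify)
```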

Based on the score given to the conversation immediately preceding a feedback utterance, the learning data generation unit 63 gives that conversation a label representing its type, generates learning data consisting of the conversation immediately preceding the feedback utterance with its type as the teacher label, and stores the learning data in the conversation learning data storage device 52.

For example, a conversation immediately preceding a feedback utterance with a score equal to or lower than a first threshold (for example, 0.3) is given the first-type label, and a conversation immediately preceding a feedback utterance with a score equal to or higher than a second threshold (for example, 0.7) is given the second-type label. Instead of using score thresholds as described above, the feedback utterances may be sorted in descending order of score, with the second-type label given to the conversations immediately preceding the top predetermined proportion (for example, 20%) of feedback utterances and the first-type label given to the other immediately preceding conversations.
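The two labeling rules above can be sketched as follows. The function names are hypothetical. Two details are assumptions not stated in the description: conversations whose score falls strictly between the thresholds are left out of the learning data, and the top-ratio cutoff is rounded down.

```python
def label_by_threshold(scored, low=0.3, high=0.7):
    """First type at or below `low`, second type at or above `high`."""
    labeled = []
    for conversation, score in scored:
        if score <= low:
            labeled.append((conversation, "first"))
        elif score >= high:
            labeled.append((conversation, "second"))
        # Assumption: scores strictly between the thresholds get no label.
    return labeled


def label_by_top_ratio(scored, ratio=0.2):
    """Second type for the top `ratio` of scores, first type for the rest."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    cutoff = int(len(ranked) * ratio)  # assumption: round down
    return [
        (conversation, "second" if rank < cutoff else "first")
        for rank, (conversation, _) in enumerate(ranked)
    ]


# Placeholder conversations "a".."e" with made-up scores.
scored = [("a", 0.05), ("b", 0.3), ("c", 0.5), ("d", 0.7), ("e", 0.95)]
by_threshold = label_by_threshold(scored)
by_ratio = label_by_top_ratio(scored)
```

The threshold rule keeps only conversations whose feedback score is clearly biased toward one side, while the ratio rule labels every conversation and fixes the share of second-type examples in advance.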

FIG. 6 is a diagram showing an example of feedback utterances to which scores have been given. For example, when the first-type label is given to utterances with a score of 0.3 or less and the second-type label to utterances with a score of 0.7 or more, "You're smart" and "Quite obedient, very good" are examples of feedback utterances whose utterance scores are biased toward the first type, and "No, no, what do you mean" and "This isn't a conversation, though" are examples of feedback utterances whose utterance scores are biased toward the second type.

FIG. 7 is a diagram showing an example of conversations extracted from the conversation log storage device 40. The conversation immediately preceding the feedback utterance "No, no, what do you mean" is labeled based on the score of that feedback utterance. Likewise, the conversation immediately preceding the feedback utterance "You're smart" is labeled based on the score of that feedback utterance. For example, when the first-type label is given to conversations immediately preceding feedback utterances with a score of 0.3 or less and the second-type label to conversations immediately preceding feedback utterances with a score of 0.7 or more, learning data is generated in which the conversation "What is 426 + 129" / "The answer is 555" is given the first-type label and the conversation "Show me my browsing history" / "Hehehe" is given the second-type label.

The conversation learning data storage device 52 stores the learning data generated by the learning data generation unit 63, in which each conversation immediately preceding a feedback utterance (not including the feedback utterance itself) is given a label indicating its type (for example, the first type or the second type).

Based on the conversations immediately preceding the feedback utterances extracted by the extraction unit 62, the conversation classifier generation unit 64 generates a conversation classifier 66 that derives a conversation score, an index indicating the type of an unknown conversation. The conversation classifier generation unit 64 performs learning based on the conversations immediately preceding feedback utterances biased toward the first type, the conversations immediately preceding feedback utterances biased toward the second type, and the labels (first type or second type) given to these conversations. The learning is performed by, for example, machine learning. The conversation classifier generation unit 64 generates the conversation classifier 66 by machine learning that uses the immediately preceding conversations extracted by the extraction unit 62 and, as teacher labels, the labels given to those conversations. The conversation classifier generation unit 64 may perform this learning using a deep learning technique based on a neural network or the like, or a method such as an SVM.

FIG. 8 is a diagram conceptually showing the learning process. For example, when the feedback utterance biased toward the second type is "No, no, what do you mean", conversations a to c that appeared immediately before "No, no, what do you mean" are extracted. When the feedback utterance biased toward the first type is "You're smart", conversations d to f that appeared immediately before "You're smart" are extracted. In this way, the conversation classifier 66 is trained on conversations between users and the automatic response device that have a high probability of being natural or unnatural.

When given an unknown or known conversation, the conversation classifier 66 derives a conversation score (post-stage index) indicating the type of that conversation. The conversation score is an index indicating the degree to which the utterance made by the automatic response device is estimated to be unnatural with respect to the utterance made by the person immediately before it. That is, the higher the conversation score, the higher the probability that the conversation between the user and the automatic response device is unnatural.

FIG. 9 is a diagram showing an example of an (unknown) conversation input to the conversation classifier 66 and the information output by the conversation classifier 66. For example, when the user's utterance "I can't win at pachinko" and the automatic response device's reply "How about making a donation?" are input to the conversation classifier 66, the conversation classifier 66 outputs, for example, that the probability of this conversation being unnatural is 95 percent. In this way, the conversation classifier 66 can judge the naturalness or unnaturalness of a conversation even for an unknown conversation that is not followed by a feedback utterance.

Suppose also that, in the above processing, it has been learned that conversation A, "Show me my browsing history" / "Hehehe", has a high probability of being unnatural. When, for example, the unknown conversation "Show me my history" / "Hehehe" is input to the conversation classifier 66, the conversation classifier 66 derives for it, as for conversation A, a conversation score indicating a high probability of being unnatural. This is because "browsing history" and "history" are semantically close expressions.

[Processing for generating the utterance classifier]
FIG. 10 is a flowchart showing the flow of processing by which the information processing system 1 generates the utterance classifier 24. First, the utterance classifier generation unit 22 acquires, from the utterance learning data storage device 14, the learning data, namely the feedback utterances and the teacher labels given to those feedback utterances (S100).

Next, based on the learning data acquired in S100, the utterance classifier generation unit 22 learns the probability that an unnatural conversation or a natural conversation appears immediately before a feedback utterance (S102). The utterance classifier generation unit 22 then generates the utterance classifier 24 based on the result of the learning in S102 (S104).

Next, the acquisition unit 20 acquires the feedback utterances to which utterance scores are to be assigned and inputs them to the utterance classifier 24. The utterance classifier 24 assigns an utterance score to each input feedback utterance and stores correspondence information, which associates each feedback utterance with its utterance score, in the scored feedback utterance storage device 50 of the information processing system 1 (S106). The processing of this flowchart then ends.

Through the processing described above, the utterance classifier 24 is generated, which derives for a feedback utterance an utterance score indicating the degree to which the immediately preceding conversation is unnatural, and the generated utterance classifier 24 assigns scores to given feedback utterances.
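The flow of S100 to S106 can be illustrated with a minimal sketch. The embodiment does not fix a particular learning algorithm in this passage, so the sketch below assumes a simple smoothed frequency estimate per feedback utterance; the function name `train_utterance_classifier`, the smoothing parameter `alpha`, and the sample data are all hypothetical.

```python
from collections import Counter

def train_utterance_classifier(labeled_feedback, alpha=1.0):
    """Return a scoring function for feedback utterances (S100 to S104).

    labeled_feedback: iterable of (feedback_utterance, label) pairs, where
    label is "natural" or "unnatural" and describes the conversation that
    immediately preceded the feedback utterance.
    alpha: additive smoothing constant (an assumption of this sketch).
    """
    unnatural = Counter()
    total = Counter()
    for utterance, label in labeled_feedback:
        total[utterance] += 1
        if label == "unnatural":
            unnatural[utterance] += 1

    def utterance_score(utterance):
        # Smoothed estimate of P(preceding conversation is unnatural).
        return (unnatural[utterance] + alpha) / (total[utterance] + 2 * alpha)

    return utterance_score

# Hypothetical learning data (S100).
training = [
    ("what?", "unnatural"),
    ("what?", "unnatural"),
    ("what?", "natural"),
    ("thanks", "natural"),
]
classifier = train_utterance_classifier(training)

# S106: assign a score to each target feedback utterance and keep the
# correspondence between utterance and score.
scored_feedback = {u: classifier(u) for u in ["what?", "thanks", "hmm"]}
```

A frequency table cannot generalize to feedback utterances that share no surface form with the learning data (the unseen "hmm" simply receives the neutral value), so a practical system would presumably use a feature-based or neural model instead.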

[Process for generating the conversation classifier]
FIG. 11 is a flowchart showing the flow of processing in which the information processing system 1 generates the conversation classifier 66. First, the extraction unit 62 acquires the correspondence information stored in the scored feedback utterance storage device 50 (S200). Next, the extraction unit 62 automatically extracts, from the correspondence information acquired in S200, feedback utterances to which utterance scores have been assigned (S202).

Next, the extraction unit 62 extracts, from the log information stored in the conversation log storage device 40, the conversations that contain each feedback utterance extracted in S202 (that is, the feedback utterance and the conversation immediately preceding it), and assigns a score to the conversation immediately preceding each extracted feedback utterance based on the score assigned to that feedback utterance in the scored feedback utterance storage device 50 (S204). Next, the learning data generation unit 63 assigns a label indicating a type to the conversation immediately preceding each feedback utterance based on the score assigned in step S204, generates learning data that contains the conversation immediately preceding the feedback utterance together with its type as a teacher label, and stores the learning data in the conversation learning data storage device 52 (S206).

Next, the conversation classifier generation unit 64 performs learning based on the learning data generated in S206 and stored in the conversation learning data storage device 52 (S208). The conversation classifier generation unit 64 then generates the conversation classifier 66 based on the result of the learning in S208 (S210). The processing of this flowchart then ends.

Through the processing described above, the conversation classifier 66, which derives a conversation score indicating the unnaturalness of a conversation, is generated.
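Steps S200 to S206 can be sketched as follows. This is only an illustration under stated assumptions: a conversation is taken to be the two utterances (user utterance and system reply) that precede a feedback utterance in a flat log, and the thresholds `low` and `high`, the function name `build_training_data`, and the sample log are hypothetical.

```python
def build_training_data(log, scored_feedback, low=0.2, high=0.8):
    """Create (conversation, label) pairs from a conversation log.

    log: list of utterance strings in chronological order.
    scored_feedback: dict mapping a feedback utterance to its utterance
        score (higher means the preceding conversation is more likely
        to be unnatural).
    low, high: assumed thresholds; feedback utterances with scores in
        between are treated as too ambiguous and are skipped.
    """
    examples = []
    for i in range(2, len(log)):
        score = scored_feedback.get(log[i])
        if score is None:
            continue  # log[i] is not a scored feedback utterance
        if score >= high:
            label = "unnatural"
        elif score <= low:
            label = "natural"
        else:
            continue
        # The conversation immediately preceding the feedback utterance.
        examples.append(((log[i - 2], log[i - 1]), label))
    return examples

# Hypothetical log and scored feedback utterances.
log = [
    "I can't win at pachinko", "How about making a donation?", "what?",
    "Show me the weather", "It is sunny today", "thanks",
    "hello", "hi", "hmm",
]
scored_feedback = {"what?": 0.9, "thanks": 0.1, "hmm": 0.5}
learning_data = build_training_data(log, scored_feedback)
```

The resulting pairs would then be passed to whichever supervised learner plays the role of the conversation classifier generation unit 64 (S208 to S210); that learner is left out of the sketch.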

In the above example, the process for generating the utterance classifier 24 and the process for generating the conversation classifier 66 are described as separate processes, but they may be performed as a single continuous process.

[Summary]
The conversation immediately preceding a feedback utterance indicating the first type may be unnatural, and the conversation immediately preceding a feedback utterance indicating the second type may be natural. The type of a feedback utterance does not necessarily indicate whether the conversation between the automatic response device and the user is natural or unnatural, and other factors may be involved. For example, when the automatic response device utters kind words, the user may make a feedback utterance indicating the first type. Likewise, when the automatic response device makes an utterance that angers the user, the user may make a feedback utterance indicating the second type. For this reason, it is not appropriate to judge simply that the conversation immediately preceding a feedback utterance indicating the first type is natural and that the conversation immediately preceding a feedback utterance indicating the second type is unnatural.

In addition, because feedback utterances indicating the first type or the second type do not appear frequently in conversations, it has sometimes been difficult, unless machine learning is applied to labeled conversations, to judge over a broad range the naturalness or unnaturalness of conversations that are not followed by a feedback utterance.

In contrast, the information processing system 1 of the present embodiment generates the conversation classifier 66 by performing machine learning on the conversations immediately preceding scored feedback utterances, extracted from the conversation log information. Compared with a method that simply treats the conversation immediately preceding a feedback utterance indicating the first type as natural and the conversation immediately preceding a feedback utterance indicating the second type as unnatural, it can therefore judge the naturalness or unnaturalness of conversations over a broad range. As a result, the conversation classifier 66 improves the coverage of conversations that can be judged and can judge the naturalness or unnaturalness even of unknown conversations.

The information processing system 1 of the present embodiment also generates the conversation classifier 66 by performing machine learning on the conversations immediately preceding feedback utterances whose utterance scores are biased toward the first type or toward the second type. The conversation classifier 66 can therefore judge more accurately whether a conversation is natural or unnatural.

The information processing system 1 of the present embodiment can also easily generate a conversation classifier 66 suited to a task or domain. For example, when a system of a comparative example generates a conversation classifier 66 suited to a task or domain, log information of conversations that occurred in that task or domain is collected, and labels are assigned to the collected conversations. The system of the comparative example then generates the conversation classifier 66 by performing machine learning on the labeled conversations. In this case, the conversation classifier 66 must be created manually for each task or domain, which increases cost.

In contrast, once the information processing system 1 of the present embodiment has generated the utterance classifier 24 based on the log information of certain conversations, it can easily generate a conversation classifier 66 by applying that utterance classifier 24 to various tasks and domains. For example, the information processing system 1 extracts, from the log information of conversations that occurred in the target task or domain, the conversations immediately preceding feedback utterances to which utterance scores have been assigned, and performs machine learning on the extracted conversations and utterance scores, thereby generating a conversation classifier 66 adapted to the target task or domain. In this way, the information processing system 1 can generate the conversation classifier 66 by applying the utterance classifier 24, even when no labels have been assigned to the conversations that occurred in the target task or domain. In other words, with the method of the present embodiment, once a database of scored feedback utterances has been built, the conversation classifier 66 for a new task or domain can be learned automatically, without manual effort and therefore at low cost, from the dialogue log of that task or domain and the scored feedback utterance storage device 50.

In the embodiment described above, the conversation classifier 66 is described as deriving an index indicating the unnaturalness of a conversation, but "unnaturalness" may be replaced with another property. For example, an index indicating the degree to which the conversation immediately preceding a feedback utterance is of a predetermined type may be derived, such as an index indicating the degree to which that conversation is useful to the user or the degree to which it improves the user's mood. In these cases, a label corresponding to the kind of index is assigned to the feedback utterance in place of the label indicating the first type or the label indicating the second type, and a label corresponding to the kind of index is assigned to the conversation immediately preceding the feedback utterance in place of the natural label or the unnatural label.

In the embodiment described above, the conversation classifier 66 derives the probability that a conversation falls under one of two types (for example, the second type). Instead, it may derive probabilities indicating which of three or more types the conversation belongs to. In this case, labels indicating three or more types of conversation are prepared. Consider, for example, a case in which a label indicating a third type, which represents a neutral conversation, is prepared in addition to the labels indicating the first type and the second type. In this case, the conversations immediately preceding the feedback utterances stored in the utterance learning data storage device 14 are given labels indicating the first to third types, and the information processing system 1 learns the relationship between the first to third types and the feedback utterances. The information processing system 1 also, for example, automatically extracts from the correspondence information the feedback utterances whose utterance scores fall within preset ranges indicating a natural conversation, an unnatural conversation, and a neutral conversation. The information processing system 1 then generates the conversation classifier 66 by learning the relationship between the conversations immediately preceding the extracted feedback utterances and the labels indicating the types of those conversations.

[Comparative Examples 1 and 2]
FIG. 12 is a diagram showing the functional configurations of Comparative Example 1 and Comparative Example 2. Comparative Example 1, shown in the upper part of FIG. 12, is a method based on supervised learning that uses manually created data. In Comparative Example 1, the learning unit 100 performs machine learning on the information stored in the utterance learning data storage device 14, and the conversation classifier 102 is generated from the learning result. The information stored in the utterance learning data storage device 14 consists of conversations immediately preceding feedback utterances, to which a natural label or an unnatural label has been assigned.

In Comparative Example 2, shown in the lower part of FIG. 12, a score is assigned to a conversation according to which follows it more often: feedback utterances indicating the first type or feedback utterances indicating the second type. Comparative Example 2 uses neither the utterance classifier 24 nor the conversation classifier 66.

In Comparative Example 2, the score derivation unit 110 derives a score for a conversation based on the information stored in the feedback utterance storage device 12 (feedback utterances to which no score has been assigned) and the log information stored in the conversation log storage device 40. For example, the score (Score) is derived by equation (1) below, where |NEG| is the number of feedback utterances indicating the second type that follow the conversation of interest in the log information, and |POS| is the number of feedback utterances indicating the first type that follow the conversation of interest in the log information.
Score = |NEG| - |POS| ... (1)
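Equation (1) can be illustrated with a short sketch. It assumes, for illustration only, that the log is a flat list of utterances, that a conversation is a pair of consecutive utterances, and that the feedback utterances of each type are given as sets; the function name `comparative_score` and the sample data are hypothetical.

```python
def comparative_score(log, conversation, pos_feedback, neg_feedback):
    """Compute Score = |NEG| - |POS| for one conversation of interest.

    log: list of utterance strings in chronological order.
    conversation: (utterance, reply) pair to look for in the log.
    pos_feedback: set of feedback utterances indicating the first type.
    neg_feedback: set of feedback utterances indicating the second type.
    """
    neg = 0
    pos = 0
    for i in range(len(log) - 2):
        if (log[i], log[i + 1]) != conversation:
            continue
        following = log[i + 2]
        if following in neg_feedback:
            neg += 1
        elif following in pos_feedback:
            pos += 1
    return neg - pos

# Hypothetical log: the conversation ("a", "b") is followed twice by a
# second-type feedback utterance and once by a first-type one.
log = ["a", "b", "what?", "a", "b", "what?", "a", "b", "thanks",
       "c", "d", "thanks"]
score_ab = comparative_score(log, ("a", "b"), {"thanks"}, {"what?"})
score_cd = comparative_score(log, ("c", "d"), {"thanks"}, {"what?"})
```

Because the score depends only on exact matches in the log, a conversation that never appears verbatim, or that is never followed by a feedback utterance, always receives a score of 0.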

[Comparison with Comparative Examples 1 and 2]
FIG. 13 is a diagram showing an example of the processing results of the information processing system 1, Comparative Example 1, and Comparative Example 2. The vertical axis in the figure represents precision, and the horizontal axis represents recall. Precision is an index of how many correct answers (unnatural conversations) are contained among the conversations that the information processing system judged to be unnatural. Here, a conversation was judged to be unnatural when its conversation score was equal to or greater than a threshold. The correct-answer labels (indicating an unnatural conversation) were assigned by people. Recall is an index of the proportion of the correct answers that the information processing system 1 judged to be unnatural conversations. AUC (Area Under the Curve) is the area under the curve in the graph.
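The precision and recall described above can be computed at a given threshold as in the following sketch. The function name `precision_recall`, the convention of returning 0.0 when a denominator is zero, and the sample scores and labels are assumptions made for illustration and are not data from the experiment.

```python
def precision_recall(scores, gold_unnatural, threshold):
    """Return (precision, recall) for the 'unnatural' class.

    scores: conversation scores output by a classifier.
    gold_unnatural: human-assigned labels, True for unnatural conversations.
    threshold: a conversation is judged unnatural when score >= threshold.
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, g in zip(predicted, gold_unnatural) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold_unnatural) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold_unnatural) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical scores and human labels.
scores = [0.95, 0.8, 0.6, 0.3, 0.1]
gold = [True, True, False, True, False]
precision, recall = precision_recall(scores, gold, threshold=0.5)
```

Sweeping the threshold from high to low and plotting the resulting pairs gives a precision-recall curve of the kind shown in the figure.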

As illustrated, the information processing system 1 of the present embodiment achieves performance equal to or better than that of Comparative Example 1. More specifically, the method of Comparative Example 1 is a high-cost method in which the learning data for the conversation classifier must be created manually for each task and each domain, whereas the method of the present embodiment achieves performance equivalent to that of Comparative Example 1 even though it is a low-cost method that does not depend on the task or domain. The information processing system 1 of the present embodiment also performs markedly better than Comparative Example 2, which does not take into account that feedback utterances are ambiguous and infrequent.

The following describes variations of the information processing system 1: the information processing system 1A of Modification 1, the information processing system 1B of Modification 2, and the information processing system 1C of Modification 3.

[Modification 1]
Modification 1 is an example in which the conversation classifier generation unit 64 additionally learns the conversations between the automatic response device and the user that are stored in the utterance learning data storage device 14 and to which a natural label or an unnatural label has been assigned. FIG. 14 is a diagram showing an example of the functional configuration of the information processing system 1A of Modification 1.

[Modification 2]
Modification 2 is an example in which the utterance classifier 24 is omitted. In this case, the information processing system 1B is provided with the feedback utterance storage device 12 in place of the scored feedback utterance storage device 50. FIG. 15 is a diagram showing an example of the functional configuration of the information processing system 1B of Modification 2. The conversation classifier generation unit 64 of the information processing system 1B uses equation (1) above to derive conversation candidates that have a high probability of being natural conversations and conversation candidates that have a high probability of being unnatural conversations.

For example, the information processing system 1B treats conversations whose scores fall within a predetermined range as candidates with a high probability of being natural conversations, and conversations whose scores fall within a different range as candidates with a high probability of being unnatural conversations.

[Modification 3]
FIG. 16 is a diagram showing an example of the functional configuration of the information processing system 1C of Modification 3. Modification 3 omits the learning data generation unit 63 and the conversation learning data storage device 52, and includes a score derivation unit 120 in place of the conversation classifier 66 of the information processing system 1. The extraction unit 62 extracts, from the feedback utterances stored in the scored feedback utterance storage device 50, those whose scores fall within a first range (for example, within 20 or 30 percent from the minimum value) or a second range (for example, within 20 or 30 percent from the maximum value). The score derivation unit 120 derives a score using the feedback utterances extracted by the extraction unit 62. Specifically, the score derivation unit 120 derives the score using equation (1) above.
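The extraction by score range in Modification 3 could look like the sketch below. The text does not state whether "20 or 30 percent" refers to a fraction of the score range or to a share of the utterances, so the sketch assumes the former; the function name `extract_extremes`, the `fraction` parameter, and the sample scores are hypothetical.

```python
def extract_extremes(scored_feedback, fraction=0.2):
    """Split scored feedback utterances into the two extreme ranges.

    scored_feedback: dict mapping a feedback utterance to its score.
    fraction: assumed width of each range, as a fraction of the span
        between the minimum and maximum score.
    Returns (first_range, second_range) as sorted lists of utterances.
    """
    lowest = min(scored_feedback.values())
    highest = max(scored_feedback.values())
    width = (highest - lowest) * fraction
    first_range = sorted(
        u for u, s in scored_feedback.items() if s <= lowest + width
    )
    second_range = sorted(
        u for u, s in scored_feedback.items() if s >= highest - width
    )
    return first_range, second_range

# Hypothetical scored feedback utterances.
scored_feedback = {
    "thanks": 0.05, "great": 0.15, "hmm": 0.5, "what?": 0.9, "huh?": 0.95,
}
first_range, second_range = extract_extremes(scored_feedback)
```

Under this reading, the utterances in `first_range` would be counted as |POS| and those in `second_range` as |NEG| when equation (1) is applied, while mid-range utterances such as "hmm" are ignored.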

[Comparison with the modifications]
FIG. 17 is a diagram showing an example of the processing results of the information processing system 1, Modification 1, Modification 2, and Modification 3. Explanations that are the same as for FIG. 13 are omitted.

As shown in FIG. 17, the information processing system 1, Modification 1, and Modification 2 perform better than Modification 3, which has no conversation classifier 66. The information processing system 1 and Modification 1 perform better than Modification 2, which has no utterance classifier 24. In other words, the experiment shows that the conversation classifier 66 contributes greatly to the performance of the information processing system 1 of the present embodiment. Modification 1 performs slightly better than the information processing system 1.

According to the embodiment described above, the information processing system 1 includes an extraction unit 62 that extracts, from a set of conversations and based on the utterance score assigned to a feedback utterance, the conversation immediately preceding the feedback utterance, the feedback utterance being estimated to indicate a predetermined reaction of a second utterance subject to an utterance made by a first utterance subject, and a conversation classifier generation unit 64 that generates, based on the immediately preceding conversation extracted by the extraction unit 62, a conversation classifier 66 that derives an index indicating the type of an unknown conversation. The information processing system 1 can thereby automatically acquire clues for determining whether an unknown conversation is of a predetermined type.

Although modes for carrying out the present invention have been described above using embodiments, the present invention is in no way limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

1 ... information processing system, 20 ... acquisition unit, 22 ... utterance classifier generation unit, 24 ... utterance classifier, 50 ... scored feedback utterance storage device, 62 ... extraction unit, 64 ... conversation classifier generation unit, 66 ... conversation classifier

Claims

An information processing apparatus comprising:
an acquisition unit that acquires a feedback utterance estimated to indicate a predetermined reaction of a second utterance subject to an utterance made by a first utterance subject; and
a pre-stage generation unit that generates a pre-stage classifier that derives an index estimating that a conversation of a predetermined type appears immediately before the feedback utterance acquired by the acquisition unit.

The information processing apparatus according to claim 1,
further comprising the pre-stage classifier generated by the pre-stage generation unit.

The information processing apparatus according to claim 2, wherein
the acquisition unit acquires the feedback utterance, a conversation immediately before the feedback utterance that has been assigned a first type, and a conversation immediately before the feedback utterance that has been assigned a second type, and
the pre-stage generation unit generates the pre-stage classifier based on the information acquired by the acquisition unit.

The information processing apparatus according to claim 3, wherein
the pre-stage generation unit generates the pre-stage classifier by performing machine learning that uses, as learning data, a label indicating the type given to the feedback utterance acquired by the acquisition unit and the immediately preceding conversation.

An information processing method in which a computer:
acquires a feedback utterance estimated to indicate a predetermined reaction of a second utterance subject to an utterance made by a first utterance subject; and
generates a pre-stage classifier that derives an index estimating that a conversation of a predetermined type appears immediately before the acquired feedback utterance.

A program that causes a computer to:
acquire a feedback utterance estimated to indicate a predetermined reaction of a second utterance subject to an utterance made by a first utterance subject; and
generate a pre-stage classifier that derives an index estimating that a conversation of a predetermined type appears immediately before the acquired feedback utterance.