JP6570226B2

JP6570226B2 - Response generation apparatus, response generation method, and response generation program

Info

Publication number: JP6570226B2
Application number: JP2014167794A
Authority: JP
Inventors: 香里谷尾; 北岸　郁雄; 郁雄北岸
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2014-08-20
Filing date: 2014-08-20
Publication date: 2019-09-04
Anticipated expiration: 2034-08-20
Also published as: JP2016045584A

Description

The present invention relates to a response generation device, a response generation method, and a response generation program.

Conventionally, dialog agent systems are known that, in response to receiving a message from a user terminal, output a corresponding message to that user terminal. For such dialog agent systems, techniques have been provided for outputting a message that corresponds to the message received from the user terminal, or a message that includes an advertisement suited to the user of that terminal. Techniques have also been proposed for determining the advertisement to be provided to a user based on the content and context of the dialog.

Japanese Translation of PCT International Application Publication No. 2006-500699; Japanese Translation of PCT International Application Publication No. 2001-525951

However, the conventional techniques described above cannot always hold an appropriate conversation with the user. For example, these techniques select a response or advertisement corresponding to the message received from the user terminal from among pre-registered responses and advertisements and output it as is, so the conversation may become unnatural or the advertising effect may not be optimal.

The present application has been made in view of the above, and its object is to provide a response generation device capable of holding an appropriate conversation with a user.

A response generation device according to the present application includes: a specifying unit that specifies feature information relating to a feature of a conversation between a dialog agent system and a user; a transformation unit that transforms the content of a response message according to the feature information specified by the specifying unit; and an output control unit that performs control so that the response message whose content has been transformed by the transformation unit is output.

According to one aspect of the embodiment, an appropriate conversation can be held with the user.

FIG. 1 is a diagram illustrating an example of a response generation process according to the embodiment.
FIG. 2 is a diagram illustrating a configuration example of the response generation system according to the embodiment.
FIG. 3 is a diagram illustrating a configuration example of the advertisement bidding device according to the embodiment.
FIG. 4 is a diagram illustrating an example of the advertisement information storage unit according to the embodiment.
FIG. 5 is a diagram illustrating a configuration example of the response generation device according to the embodiment.
FIG. 6 is a diagram illustrating an example of the determination information storage unit according to the embodiment.
FIG. 7 is a schematic diagram of the tree structure stored in the determination information storage unit.
FIG. 8 is a sequence diagram illustrating the response generation processing procedure performed by the response generation device according to the embodiment.
FIG. 9 is a diagram illustrating an example of a determination tree according to a modification.
FIG. 10 is a diagram illustrating an example of a determination tree according to a modification.
FIG. 11 is a hardware configuration diagram illustrating an example of a computer that implements the functions of the response generation device.

Modes for carrying out the response generation device, the response generation method, and the response generation program according to the present application (hereinafter referred to as "embodiments") are described in detail below with reference to the drawings. The response generation device, the response generation method, and the response generation program according to the present application are not limited by these embodiments. In the following embodiments, the same parts are denoted by the same reference numerals, and redundant description is omitted.

[1. Response generation process]
First, an example of the response generation process according to the embodiment is described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example of a response generation process according to the embodiment. FIG. 1 shows an example in which the response generation process is performed by the response generation device 100.

The user terminal 10 is a terminal device used by a user. The user terminal 10 is, for example, a mobile phone such as a smartphone, a tablet terminal, a PDA (Personal Digital Assistant), a desktop PC (Personal Computer), or a notebook PC. The user terminal 10 can also be applied to a robot that holds conversations, an information processing device included in a robot, or any other device built into a robot.

The speech recognition device 20 converts the input message received from the user terminal 10, that is, the speech data of an utterance, into text data. The speech recognition device 20 also analyzes the speech data received from the user terminal 10 and specifies characteristics of the user as sound feature information. Specifically, the speech recognition device 20 generates a speech waveform by analyzing the speech data, and then specifies the characteristics of the user who input the speech based on the peak shape, frequency characteristics, and the like of the generated waveform.

The user characteristics serving as sound feature information are, for example, information on user attributes such as the user's age, gender, emotion, dialect, and physical condition. For example, based on the speech waveform, the speech recognition device 20 can determine a personality trait such as "impatient" if the tempo of the sound or the interval between words is shorter than a predetermined value, and "easygoing" if it is longer. The speech recognition device 20 can also determine age, gender, and dialect from the amplitude, peak shape, and frequency of the speech waveform. In addition to the processing described above, the speech recognition device 20 may use any other method to specify user characteristics from speech.
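The threshold comparison on word intervals described above can be sketched as follows. This is only an illustrative sketch: the function name `classify_temperament`, the use of the mean gap, and the 0.3-second default are assumptions and do not appear in the source, which only states that an interval shorter than a predetermined value indicates "impatient" and a longer one "easygoing".

```python
from typing import List, Optional


def classify_temperament(word_gaps_sec: List[float],
                         threshold_sec: float = 0.3) -> Optional[str]:
    """Classify a speaker from the gaps between words (hypothetical helper).

    word_gaps_sec: measured silences between consecutive words, in seconds.
    threshold_sec: the "predetermined value"; 0.3 is an assumed placeholder.
    """
    if not word_gaps_sec:
        return None  # no gaps measured, so nothing can be determined
    mean_gap = sum(word_gaps_sec) / len(word_gaps_sec)
    if mean_gap < threshold_sec:
        return "impatient"
    if mean_gap > threshold_sec:
        return "easygoing"
    return None  # the source does not say what happens at exactly the threshold
```

How the gaps are measured from the waveform is left open here, just as the source leaves the analysis method open.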

The advertiser terminal 30 is a terminal device used by an advertiser. The advertiser terminal 30 is, for example, a mobile phone such as a smartphone, a tablet terminal, a PDA, a desktop PC, or a notebook PC. The advertiser terminal 30 transmits the advertisement information received from the advertiser to the advertisement bidding device 40.

The advertisement bidding device 40 presents a bidding screen to the advertiser terminal 30. The advertisement bidding device 40 also stores the advertisement information received from the advertiser terminal 30 in a predetermined storage unit. The advertisement information includes advertisement keywords that characterize the content of the advertisement, advertisement data to be output as a response message, and the like. The advertisement data is, for example, advertisement copy in text form.

The response generation device 100 realizes a conversation by outputting a response message to an input message, which is the user's utterance, in accordance with preset determination information. In the embodiment below, this determination information is assumed to be a tree structure (hereinafter sometimes referred to as a "determination tree") made up of nodes corresponding to input messages and response messages.

In the determination tree shown in FIG. 1, dashed-line blocks indicate detection nodes and solid-line blocks indicate operation nodes. A detection node corresponds to an input message from the user, and an operation node corresponds to a response message.

The response generation device 100 according to the embodiment specifies, as feature information relating to a feature of the conversation, a user characteristic relating either to the tendency in how the conversation proceeds or to the sound feature information, and transforms the content of the response message according to the specified user characteristic. The response generation device 100 then performs output control so that the transformed response message is output. In this way, the response generation device 100 realizes a conversation better suited to the user's characteristics. As described above, the user characteristics serving as sound feature information are specified by the speech recognition device 20, so the response generation device 100 itself specifies, using any technique, the user characteristics relating to the tendency in how the conversation proceeds. The response generation device 100 then performs the transformation based on any one of the specified user characteristics. The response generation device 100 may also perform the transformation using a plurality of user characteristics.

For example, as the tendency in how the conversation proceeds, the response generation device 100 specifies a user characteristic based on the number of times a predetermined node was used in a series of conversations with the user, the total number of nodes used, and the like. Specifically, the response generation device 100 sets a predetermined threshold for the number of times a predetermined node is used and for the total number of nodes used before the conversation ends. If the number of times each node was used, or the total number of nodes used before the advertisement information was output, is below the threshold, the response generation device 100 specifies the user characteristic as "tends not to like topic switching (topics are not wasted), i.e., dislikes idle talk"; if it is above the threshold, it specifies "tends to like topic switching (topics are wasted), i.e., likes idle talk".
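The node-count comparison described above might look like the following minimal sketch. The function name, the label strings, and the choice to count every node use in the conversation are assumptions made for illustration; the source only states that a count below the threshold indicates a user who dislikes idle talk and a count above it indicates one who likes it.

```python
from typing import Optional, Sequence

DISLIKES_IDLE_TALK = "dislikes idle talk"
LIKES_IDLE_TALK = "likes idle talk"


def conversation_tendency(used_node_ids: Sequence[str],
                          threshold: int) -> Optional[str]:
    """Derive a user characteristic from how many nodes a conversation used.

    used_node_ids: IDs of the nodes used, in order, until the advertisement
                   was output (or the conversation ended).
    threshold:     the "predetermined threshold"; its value is not given
                   in the source.
    """
    total = len(used_node_ids)
    if total < threshold:
        return DISLIKES_IDLE_TALK
    if total > threshold:
        return LIKES_IDLE_TALK
    return None  # behavior at exactly the threshold is not specified
```

The same comparison could be applied to the use count of a single predetermined node instead of the total.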

Here, the response generation device 100 is assumed to use the user characteristics specified by the speech recognition device 20. Specifically, the response generation device 100 uses "Osaka dialect" as the user characteristic of the user U01 specified by the speech recognition device 20. An example in which the response generation device 100 transforms the content of the response message based on the user characteristic "Osaka dialect" is described below with reference to FIG. 1.

First, when the user terminal 10 receives input of a message from its owner, the user U01 (step S11), it transmits the speech data to the speech recognition device 20 (step S12).

The speech recognition device 20 converts the speech data of the utterance received from the user terminal 10 into text data. It also analyzes the speech data received from the user terminal 10 and specifies a user characteristic as feature information indicating the features of the sound contained in the speech data (step S13).

The speech recognition device 20 then transmits the text data and the specified user characteristic "Osaka dialect" to the user terminal 10 (step S14). The user terminal 10 transmits the received text data and the user characteristic "Osaka dialect" to the response generation device 100 (step S15).

The response generation device 100 then acquires the data of the response message to be output (step S16). When it receives the speech text, the response generation device 100 determines the detection node that has a keyword contained in that text, and acquires the data of the response message corresponding to the operation node connected to the determined detection node. In this way, the response generation device 100 realizes a conversation with the user by using detection nodes and operation nodes.

Here, each time it uses a determined detection node, the response generation device 100 searches the advertisement bidding device 40 for advertisement information. If the search finds advertisement information, the response generation device 100 acquires the advertisement data included in that information as a response message that replaces the response message registered as the operation node. The response generation device 100 may instead search for advertisement information when the determined detection node is registered as one for which an advertisement search is to be performed.

For each piece of advertisement information, the advertiser registers in advance a node ID identifying a detection node (hereinafter sometimes referred to as a "detection node ID"), so that when the response generation device 100 uses that detection node (which corresponds to receiving speech text), the advertisement information is used as the corresponding operation node (that is, output as the response message). Accordingly, each time it uses a detection node by receiving speech text, the response generation device 100 requests the advertisement bidding device 40 to search for advertisement information corresponding to that detection node ID, and acquires the advertisement data included in the advertisement information found by the advertisement bidding device 40. When no advertisement information is registered for the detection node, the response generation device 100 acquires the normal response message registered in the determination tree. A normal response message is a response message other than advertisement information.
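The acquisition in step S16 (keyword match on a detection node, then an advertisement lookup by detection node ID with a fallback to the normal response) can be sketched as below. The class and function names, the substring-based keyword match, and the in-memory dictionary standing in for the advertisement bidding device 40 are all assumptions for illustration; the example messages are English stand-ins, and only the node ID "N9" is taken from the FIG. 4 example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DetectionNode:
    node_id: str
    keywords: List[str]   # keywords that trigger this detection node
    normal_response: str  # message of the connected operation node


def find_detection_node(text: str,
                        nodes: List[DetectionNode]) -> Optional[DetectionNode]:
    """Return the first detection node whose keyword occurs in the text."""
    for node in nodes:
        if any(keyword in text for keyword in node.keywords):
            return node
    return None


def acquire_response(text: str,
                     nodes: List[DetectionNode],
                     ads_by_node_id: Dict[str, str]) -> Optional[str]:
    """Prefer advertisement data registered for the node; else fall back."""
    node = find_detection_node(text, nodes)
    if node is None:
        return None
    # ads_by_node_id is a stand-in for querying the advertisement bidding device.
    return ads_by_node_id.get(node.node_id, node.normal_response)
```

A real system would query the advertisement bidding device over the network rather than a local dictionary; the fallback behavior is the part taken from the text.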

Whether the acquired response message is a normal response message registered in the determination tree or advertisement data registered by an advertiser, the response generation device 100 transforms its content into Osaka dialect to match the user characteristic "Osaka dialect" (step S17).

For example, suppose that in the determination tree shown in FIG. 1 the conversation follows the flow of curve K1, drawn as a bold line, and that the response messages are registered in the determination tree in standard Japanese. A conventional response server outputs the standard-Japanese response message "何が好きですか？" ("What do you like?") untransformed even when the message is input in Osaka dialect, such as "食べること好きや" ("I like eating"), so the conversation becomes unnatural. Likewise, even when the user answers that response message in Osaka dialect with "キーマカレーが好きやわ" ("I like keema curry"), the conventional response server outputs the standard-Japanese response message "Ａ店のキーマカレーすごくおいしいです" ("The keema curry at store A is really delicious") untransformed. The conversation becomes unnatural and, as a result, the advertising effect suffers.

In contrast, when performing the transformation, the response generation device 100 responds to the Osaka-dialect input message "食べること好きや" ("I like eating") by transforming the standard-Japanese response message "何が好きですか？" ("What do you like?") into the Osaka-dialect form "何が好きなん？". Then, in response to the following Osaka-dialect input message "キーマカレーが好きやわ" ("I like keema curry"), it transforms the standard-Japanese response message "Ａ店のキーマカレーすごくおいしいです" ("The keema curry at store A is really delicious") into the Osaka-dialect form "Ａ店のキーマカレーめっちゃうまいで！".

The response generation device 100 then performs output control so that the response message with the transformed text is output by the user terminal 10 (step S18).

In this way, the response generation device 100 transforms the content of the response message according to the specified user characteristic. For example, when the response generation device 100 specifies, as the user characteristic, that the user speaks Osaka dialect, it transforms the text of the corresponding response message into Osaka dialect each time it receives an input message, and performs output control of the transformed response message. As a result, a user who speaks Osaka dialect does not experience the sense of incongruity that a response in standard Japanese would cause. That is, the response generation device 100 can hold a natural conversation suited to the user's characteristics and can increase the user's satisfaction with the conversation.
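One simple way to picture the transformation in step S17 is a phrase-substitution table keyed by user characteristic, as in the sketch below. The source does not say how the transformation is implemented, so the table-driven approach, the names `DIALECT_RULES` and `transform_response`, and the two rules (derived only from the two example sentences above) are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical substitution rules: (standard-Japanese phrase, dialect phrase).
DIALECT_RULES: Dict[str, List[Tuple[str, str]]] = {
    "Osaka dialect": [
        ("すごくおいしいです", "めっちゃうまいで！"),
        ("好きですか？", "好きなん？"),
    ],
}


def transform_response(message: str, user_characteristic: str) -> str:
    """Rewrite a standard-Japanese response to match the user characteristic.

    Messages are returned unchanged when no rules exist for the
    characteristic or when no rule matches.
    """
    for standard, dialect in DIALECT_RULES.get(user_characteristic, []):
        message = message.replace(standard, dialect)
    return message


print(transform_response("何が好きですか？", "Osaka dialect"))
print(transform_response("Ａ店のキーマカレーすごくおいしいです", "Osaka dialect"))
```

Because the same function is applied to any acquired message, it treats normal responses and advertisement data alike, which matches the behavior described for step S17.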

[2. Configuration of the response generation system]
Next, the configuration of the response generation system according to the embodiment is described with reference to FIG. 2. FIG. 2 is a diagram illustrating a configuration example of the response generation system 1 according to the embodiment. As shown in FIG. 2, the response generation system 1 includes a user terminal 10, a speech recognition device 20, an advertiser terminal 30, an advertisement bidding device 40, an API server device 60, a speech synthesis device 70, and a response generation device 100. The user terminal 10, the speech recognition device 20, the advertiser terminal 30, the advertisement bidding device 40, the API server device 60, the speech synthesis device 70, and the response generation device 100 are communicably connected, by wire or wirelessly, via a network N. The response generation system 1 shown in FIG. 2 may include a plurality of user terminals 10 and a plurality of advertiser terminals 30.

An outline of the processing by which the response generation system 1 provides a voice service to the user is now described. After the application is started, when the user terminal 10 detects the user's utterance, it transmits the speech data of the utterance to the speech recognition device 20.

On receiving the speech data of the utterance from the user terminal 10, the speech recognition device 20 converts the speech data into text data and transmits the text data of the utterance to the user terminal 10. The user terminal 10, having received the text data of the utterance from the speech recognition device 20, transmits it to the response generation device 100.

The advertiser terminal 30 transmits the advertisement information received from the advertiser to the advertisement bidding device 40. The advertisement information includes advertisement keywords, advertisement data, and the like. The advertisement bidding device 40 presents a bidding screen to the advertiser terminal 30. The advertisement bidding device 40 also stores the advertisement information received from the advertiser terminal 30 in a storage unit described later, and presents to the advertiser the advertisement copy transformed by the response generation device 100.

On receiving from the user terminal 10 the text data of the utterance and the user information acquired by the speech recognition device 20, the response generation device 100 executes the search processing described above and generates a response message. When the response generation device 100 outputs an image search result, a route search result, or the like as a response based on the user's utterance, it specifies the search conditions for the data needed to generate the response and requests the data from the API server device 60 corresponding to the application started on the user terminal 10.

The API server device 60 transmits data including image search results, route search results, and the like to the response generation device 100 in accordance with the search conditions received from the response generation device 100. For example, the API server device 60 acquires XML (Extensible Markup Language) data containing the image search results or route search results and transmits the acquired XML data to the response generation device 100.

On receiving, for example, XML data from the API server device 60, the response generation device 100 extracts data from the XML data and converts the XML data into HTML data. It also extracts, from the XML data or the HTML data, the text data with which to respond by voice (hereinafter referred to as text data for response utterance display). The response generation device 100 then transmits the text data for response utterance display and the response text data acquired by the determination processing to the speech synthesis device 70.

The speech synthesis device 70 performs speech synthesis processing that synthesizes speech from the text data for response utterance display and from the response text data acquired by the determination processing, and transmits the resulting intermediate notation for the response utterance to the response generation device 100. The response generation device 100 transmits the intermediate notation for the response utterance, the text data for response utterance display, and the HTML data to the user terminal 10.

The user terminal 10 outputs the response speech using the received intermediate notation for the response utterance, and displays the response content using the text data for response utterance display and the HTML data. In this way, the response generation system 1 realizes a voice service that responds appropriately to the user's utterances.

By combining the response message transformation processing described above with the processing for providing the voice service, the response generation device 100 outputs response messages better suited to the user's characteristics.

[3-1. Configuration of the advertisement bidding device]
Next, the advertisement bidding device 40 according to the embodiment is described with reference to FIG. 3. FIG. 3 is a diagram illustrating a configuration example of the advertisement bidding device 40 according to the embodiment. As shown in FIG. 3, the advertisement bidding device 40 includes a communication unit 41, an advertisement information storage unit 42, and a control unit 43.

The communication unit 41 is realized by, for example, a NIC (Network Interface Card). The communication unit 41 is connected to the network by wire or wirelessly.

The advertisement information storage unit 42 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or by a storage device such as a hard disk or an optical disk.

The advertisement information storage unit 42 stores various types of advertisement information. Specifically, the advertisement information storage unit 42 stores the advertisement information received as bids from the advertiser terminal 30. FIG. 4 shows an example of the advertisement information storage unit 42 according to the embodiment. In the example shown in FIG. 4, the advertisement information storage unit 42 stores an advertisement keyword, an advertisement tag, and advertisement data in association with an advertisement ID.

"Advertisement ID" indicates identification information for identifying advertisement information. The "advertisement ID" also serves as identification information for identifying the advertiser and the advertiser terminal 30. "Output location" indicates the advertisement output location desired by the advertiser. Specifically, the advertisement bidding device 40 acquires a predetermined part of the determination tree from the response generation device 100 and presents it to the advertiser. Referring to the presented determination tree, the advertiser selects the node ID of any detection node as the output location where the advertisement should be output. In other words, the advertiser specifies that its own advertisement information be output as the response message for the selected detection node (input message).

"Advertisement keyword" is a keyword set by the advertiser. For example, the advertiser sets, as the advertisement keyword, words that characterize the product or information to be advertised. "Advertisement tag" indicates the advertising target, that is, the kind of person to whom the advertiser wants to advertise.

That is, in FIG. 4, the advertiser identified by the advertisement ID "C01" (for example, store A) specifies that advertisement data such as "Store A's keema curry is really delicious" be output as the response message to the node ID "N9". The entry also indicates that, when a user inputs a message containing "keema curry", the advertisement is to be output if that user is associated with the Osaka region.

Returning to FIG. 3, the description continues. The control unit 43 is realized, for example, by a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executing various programs (corresponding to an example of an advertisement bidding program) stored in a storage device inside the advertisement bidding device 40, using a RAM (Random Access Memory) as a work area. Alternatively, the control unit 43 is realized by, for example, an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

As shown in FIG. 3, the control unit 43 includes a bid reception unit 44 and a presentation unit 45, and realizes or executes the information processing functions and operations described below. The internal configuration of the control unit 43 is not limited to that shown in FIG. 3 and may be any other configuration that performs the information processing described later. Likewise, the connection relationship between the processing units of the control unit 43 is not limited to that shown in FIG. 3 and may be another connection relationship.

The bid reception unit 44 presents a predetermined bidding screen on the advertiser terminal 30 and thereby accepts from the advertiser a bid of advertisement information that includes a detection node ID indicating the desired advertisement output location, an advertisement keyword, an advertisement tag, and advertisement data. The bid reception unit 44 then issues an advertisement ID and stores the detection node ID, advertisement keyword, advertisement tag, and advertisement data contained in the accepted advertisement information in the advertisement information storage unit 42 in association with the issued advertisement ID. The bid reception unit 44 need not accept all of the node ID, the advertisement keyword, and the advertisement tag; it suffices to accept at least one of them.
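By way of a non-limiting illustration, the bid acceptance described above could be sketched as follows. The class names `AdInfo` and `AdStore`, the field names, and the "C" plus two-digit ID format are assumptions made for this sketch only; the description does not specify how advertisement IDs are generated or how the records are laid out.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AdInfo:
    """One record of the advertisement information storage (hypothetical layout)."""
    ad_id: str
    data: str                       # advertisement data (the message to output)
    node_id: Optional[str] = None   # output location: a detection node ID
    keyword: Optional[str] = None   # advertisement keyword
    tag: Optional[str] = None       # advertisement tag (advertising target)


class AdStore:
    """Accepts bids and issues advertisement IDs (ID format is an assumption)."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.records: Dict[str, AdInfo] = {}

    def accept_bid(self, data: str, node_id: Optional[str] = None,
                   keyword: Optional[str] = None,
                   tag: Optional[str] = None) -> str:
        # At least one of node ID, keyword, and tag must be supplied.
        if node_id is None and keyword is None and tag is None:
            raise ValueError("a bid needs a node ID, a keyword, or a tag")
        ad_id = f"C{next(self._counter):02d}"
        self.records[ad_id] = AdInfo(ad_id, data, node_id, keyword, tag)
        return ad_id


store = AdStore()
first_id = store.accept_bid("Store A's keema curry is really delicious",
                            node_id="N9", keyword="keema curry", tag="Osaka")
```

In this sketch a bid that supplies none of the three targeting fields is rejected, which mirrors the statement that at least one of them must be accepted.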

The presentation unit 45 searches for advertisement information in response to an advertisement acquisition request from the response generation device 100 and presents the advertisement data contained in the advertisement information found. Specifically, when the presentation unit 45 receives a detection node ID and user characteristics from the response generation device 100, it searches for advertisement information whose output location matches that detection node ID. When the search yields multiple pieces of advertisement information, the presentation unit 45 narrows them down, for example, by identifying the advertisement information that satisfies the received search keyword and user characteristics. The presentation unit 45 then presents the advertisement data contained in the retrieved advertisement information to the response generation device 100. As described later, the search keyword is a predetermined keyword contained in the input message, which the response generation device 100 sets as the keyword for searching advertisement information.
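The two-stage lookup described above (match on the output location first, then narrow by search keyword and user characteristics only when several records remain) might look like the following minimal sketch. The function name `search_ads`, the dictionary keys, the tag value "Osaka dialect", and the second record "C02" are hypothetical and serve only to exercise the narrowing step.

```python
from typing import Dict, List, Optional

Ad = Dict[str, str]

# Sample records; "C02" and the tag values are invented for illustration.
ADS: List[Ad] = [
    {"ad_id": "C01", "node_id": "N9", "keyword": "keema curry",
     "tag": "Osaka dialect", "data": "Store A's keema curry is really delicious"},
    {"ad_id": "C02", "node_id": "N9", "keyword": "butter chicken",
     "tag": "Tokyo", "data": "(hypothetical second advertisement)"},
]


def search_ads(ads: List[Ad], node_id: str,
               keyword: Optional[str] = None,
               trait: Optional[str] = None) -> List[Ad]:
    """Return ads whose output location matches node_id, narrowed if needed."""
    hits = [ad for ad in ads if ad["node_id"] == node_id]
    if len(hits) > 1:
        # Several advertisers chose the same node: keep only the ads whose
        # keyword and tag match the search keyword and the user characteristic.
        hits = [ad for ad in hits
                if ad["keyword"] == keyword and ad["tag"] == trait]
    return hits


matches = search_ads(ADS, "N9", keyword="keema curry", trait="Osaka dialect")
```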

[3-2. Configuration of the response generation device]
Next, the response generation device 100 according to the embodiment will be described with reference to FIG. 5. FIG. 5 is a diagram illustrating a configuration example of the response generation device 100 according to the embodiment. As shown in FIG. 5, the response generation device 100 includes a communication unit 110, a determination information storage unit 120, and a control unit 130.

The communication unit 110 is realized by, for example, a NIC. The communication unit 110 is connected to the network by wire or wirelessly.

The determination information storage unit 120 is realized by, for example, a semiconductor memory element such as a RAM or a flash memory, or a storage device such as a hard disk or an optical disk. The determination information storage unit 120 stores determination information for determining the response message to an input message from the user. The determination information is tree-structured data composed of detection nodes, each defining a processing procedure corresponding to an input message; operation nodes, each defining a processing procedure corresponding to a response message; and edges indicating the connection relationships between detection nodes and operation nodes.

FIG. 6 shows an example of the determination information storage unit 120 according to the embodiment. In the example shown in FIG. 6, a node ID identifying each node, a node type indicating the type of the node, and processing content indicating the processing procedure corresponding to a message are stored in association with one another. It is also assumed that information indicating which nodes each node is connected to is registered in the determination information storage unit 120. For example, the determination information storage unit 120 registers that the node with the node ID "N1" is connected to the nodes with the node IDs "N2" and "N3", and that the transition probability from the node "N1" to each of the nodes "N2" and "N3" is "0.5". As a result, the determination information storage unit 120 stores the tree-structured data shown in FIG. 7.
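As a rough sketch only, the determination information described above could be held in memory as a mapping from node ID to a node record that carries the node type, the processing content, and the outgoing edges with their transition probabilities. The class name `Node`, the field names, and the processing content shown for "N1" and "N2" are assumptions; only the node types of "N1" to "N3", the connections from "N1", and the probability 0.5 come from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    node_id: str
    kind: str      # "detection" or "operation"
    content: str   # processing content (placeholder text in this sketch)
    # Outgoing edges as (destination node ID, transition probability).
    edges: List[Tuple[str, float]] = field(default_factory=list)


TREE: Dict[str, Node] = {
    "N1": Node("N1", "detection", "input mentions curry (assumed)",
               edges=[("N2", 0.5), ("N3", 0.5)]),
    "N2": Node("N2", "operation", "(response not given in the text)"),
    "N3": Node("N3", "operation", "Which curry? Roppongi?"),
}


def destinations(tree: Dict[str, Node], node_id: str) -> List[str]:
    """Return the IDs of the nodes reachable by one edge from node_id."""
    return [dest for dest, _prob in tree[node_id].edges]
```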

FIG. 7 is a schematic diagram of the tree structure stored in the determination information storage unit 120. In FIG. 7, broken-line blocks represent detection nodes and solid-line blocks represent operation nodes. Each block is labeled with its node ID. The arrows connecting the blocks represent edges: the start point (the end without the arrowhead) indicates the connection source node, and the end point (the end with the arrowhead) indicates the connection destination node. For example, the arrow connecting the node with the node ID "N1" and the node with the node ID "N2" indicates that the connection source is the detection node "N1" and the connection destination is the operation node "N2".

The determination tree schematically shown in FIG. 7 represents only some of the detection nodes and operation nodes stored in the determination information storage unit 120. It is assumed that various other detection nodes and operation nodes, in addition to those shown in FIGS. 6 and 7, are connected to each node.

Returning to FIG. 5, the description continues. The control unit 130 is realized, for example, by a CPU, an MPU, or the like executing various programs (corresponding to an example of a response generation program) stored in a storage device inside the response generation device 100, using a RAM as a work area. Alternatively, the control unit 130 is realized by, for example, an integrated circuit such as an ASIC or an FPGA.

As shown in FIG. 5, the control unit 130 includes a reception unit 131, a specifying unit 132, a deformation unit 133, and an output control unit 134, and realizes or executes the information processing functions and operations described below. The internal configuration of the control unit 130 is not limited to that shown in FIG. 5 and may be any other configuration that performs the information processing described later. Likewise, the connection relationship between the processing units of the control unit 130 is not limited to that shown in FIG. 5 and may be another connection relationship.

As described above, the reception unit 131 receives various types of information from the user terminal 10, the API server device 60, and the speech synthesizer 70. The reception unit 131 also receives determination information created by an external device (not shown) and stores it in the determination information storage unit 120. In addition, the reception unit 131 receives voice text and user characteristics from the voice recognition device 20 and transmits the received user characteristics to the deformation unit 133.

The specifying unit 132 specifies feature information relating to the features of the conversation with the user. Specifically, as this feature information, the specifying unit 132 specifies user characteristics that correspond to the tendency in how the conversation proceeds.

As also described with reference to FIG. 1, the response generation device 100 may perform the deformation processing described below using the user characteristics that the voice recognition device 20 has specified from the feature information of the sound. The response generation device 100 may also perform the deformation processing using both the user characteristics specified by the specifying unit 132 and those specified by the voice recognition device 20.

One way for the specifying unit 132 to specify user characteristics from the tendency in how the conversation proceeds is the first specifying method described below. Specifically, the specifying unit 132 specifies that tendency on the basis of the nodes used by the response generation device 100. Here, a node is "used" by the response generation device 100 when, for example, the corresponding detection node is used upon receipt of an input message, or the operation node corresponding to the response message to that input message is used.

As the first specifying method, each time the response generation device 100 uses a detection node or an operation node during a series of conversations, the specifying unit 132 counts the number of uses for each node. The specifying unit 132 then specifies the user characteristics from the conversation tendency indicated by those use counts. Specifically, for each node used by the response generation device 100 in the series of conversations, the specifying unit 132 specifies the user characteristics from the conversation tendency based on whether the use count is below or above a predetermined threshold. For example, the specifying unit 132 sets the threshold to "specific count: 2" and, on that basis, sets determination criteria such as "count: fewer than 2 → conversation tendency: topics rarely switch → personality: dislikes small talk" and "count: 2 or more → conversation tendency: topics switch easily → personality: likes small talk". The specifying unit 132 then compares the counting result with the determination criteria to specify the user characteristics.

For example, when a given node is used many times, it can be inferred that the user is having the same conversation with the response generation device 100 over and over, for instance because the user tends to enjoy small talk. In other words, the more a user likes small talk, the stronger this tendency. Conversely, when a given node is used only a few times, it can be inferred that the user tends not to enjoy small talk and that the conversation followed the determination tree faithfully. In other words, the more a user dislikes small talk, the stronger this tendency.
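The first specifying method can be illustrated with the following minimal sketch, which uses the threshold of 2 given in the example criteria. The description does not say how the per-node counts are reduced to a single judgment, so taking the largest count is an assumption of this sketch, as are the function name and the returned labels.

```python
from collections import Counter
from typing import List, Optional


def trait_from_node_usage(used_node_ids: List[str],
                          threshold: int = 2) -> Optional[str]:
    """Classify the user from how often individual nodes were used."""
    counts = Counter(used_node_ids)
    if not counts:
        return None  # no conversation yet, nothing to judge
    # Assumption: the most frequently used node decides the outcome.
    if max(counts.values()) >= threshold:
        return "likes small talk"
    return "dislikes small talk"


# "N1" is used twice, so the count reaches the threshold of 2.
repeated = trait_from_node_usage(["N1", "N3", "N1", "N3", "N5"])
# Every node is used once, so every count stays below the threshold.
straight = trait_from_node_usage(["N1", "N3", "N5", "N8"])
```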

As the second specifying method, the specifying unit 132 specifies the user characteristics from the conversation tendency based on the total number of nodes used by the response generation device 100 in a series of conversations before the advertisement information is searched for. Specifically, the specifying unit 132 sets a threshold for the total number of nodes used by the response generation device 100 in the series of conversations, and specifies the user characteristics from the conversation tendency based on whether the total is below or above that threshold.

For example, the larger the total number of nodes used by the response generation device 100 in a series of conversations, the more likely it is that the conversation ran long because it was lively. In other words, the more a user likes small talk, the stronger this tendency. Conversely, the smaller the total number of nodes, the more likely it is that the conversation was short and to the point. In other words, the more a user dislikes small talk, the stronger this tendency.

As the third specifying method, the specifying unit 132 specifies the user characteristics from the conversation tendency based on the combination of nodes used by the response generation device 100. Specifically, even when the first specifying method yields "dislikes small talk", it may be more appropriate, depending on the combination of nodes, to specify "likes small talk". For example, when a message about "soccer" is input during a conversation about "curry", it is more appropriate to specify the user who input that message as "likes small talk". That is, it can be inferred that the user changes the topic in order to carry on a wider variety of conversation precisely because the user likes small talk.

As a concrete process, the specifying unit 132 calculates, on the basis of the categories associated with the nodes, an index indicating whether the input message follows the dialog context. The further this value falls below a predetermined value, the more the input message is regarded as departing from the dialog context, and the user who input such a message is specified as "likes small talk".

A category is a classification of topics; various categories exist, such as "talk about curry" and "talk about Roppongi". For example, if the response generation device 100 uses a detection node having the category "talk about Roppongi" during a series of conversations made up of nodes associated with the category "talk about curry", the input message corresponding to that detection node can be said to depart from the dialog context. The specifying unit 132 quantifies in this way how far a message departs from the dialog context, and when the resulting value is smaller than the predetermined value, it specifies the user who input the message as "likes small talk".
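The description does not define the index itself, so the following is only one possible sketch: the index is taken to be the fraction of previously used nodes whose category equals the category of the newly used detection node, so that a low value means the message departs from the dialog context. The function names, the formula, and the predetermined value of 0.5 are all assumptions.

```python
from typing import List


def context_index(history_categories: List[str], new_category: str) -> float:
    """Fraction of earlier nodes sharing the new node's category (assumed index)."""
    if not history_categories:
        return 1.0  # first message: nothing to depart from
    same = sum(1 for c in history_categories if c == new_category)
    return same / len(history_categories)


def likes_small_talk(history_categories: List[str], new_category: str,
                     predetermined_value: float = 0.5) -> bool:
    """True when the index falls below the predetermined value."""
    return context_index(history_categories, new_category) < predetermined_value


curry_talk = ["curry", "curry", "curry"]
off_topic = likes_small_talk(curry_talk, "Roppongi")   # index is 0.0
on_topic = likes_small_talk(curry_talk, "curry")       # index is 1.0
```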

As the fourth specifying method, the specifying unit 132 specifies the user characteristics from the conversation tendency based on the time from when output of a response message is controlled until a new input message is received from the user. That is, each time a node is used, the specifying unit 132 measures the time until the next node is used. The specifying unit 132 repeats this for each node until the conversation ends, and calculates the average time between nodes as of the point at which the advertisement is searched for. Specifically, the specifying unit 132 sets a threshold for the average time and determines the personality of the user from the conversation tendency based on whether the calculated average time is shorter or longer than that threshold.

For example, the longer the average time calculated by the specifying unit 132 over a series of conversations, the more likely it is that the conversation ran long because it was lively. In other words, the more a user likes small talk, the stronger this tendency. Conversely, the shorter the average time, the more likely it is that the conversation was short and to the point. In other words, the more a user dislikes small talk, the stronger this tendency.
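The fourth specifying method could be sketched as follows, assuming that a timestamp in seconds is recorded each time a node is used. The function names, the returned labels, and the 10-second threshold are hypothetical; the description gives no concrete threshold value.

```python
from typing import List, Optional


def average_interval(timestamps: List[float]) -> Optional[float]:
    """Mean time between consecutive node uses, or None with fewer than two uses."""
    if len(timestamps) < 2:
        return None
    gaps = [later - earlier
            for earlier, later in zip(timestamps, timestamps[1:])]
    return sum(gaps) / len(gaps)


def trait_from_pace(timestamps: List[float],
                    threshold_seconds: float = 10.0) -> Optional[str]:
    """Longer average gaps are read as a liking for small talk."""
    avg = average_interval(timestamps)
    if avg is None:
        return None
    if avg > threshold_seconds:
        return "likes small talk"
    return "dislikes small talk"


quick = trait_from_pace([0.0, 4.0, 10.0])     # gaps of 4 s and 6 s
slow = trait_from_pace([0.0, 30.0, 60.0])     # gaps of 30 s each
```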

As the fifth specifying method, the specifying unit 132 measures the time required from when the response generation device 100 uses the first node until the advertisement information is searched for. As in the specifying examples described above, the specifying unit 132 sets a threshold for the required time and specifies the user characteristics from the conversation tendency based on whether the measured time is shorter or longer than that threshold.

Finally, as the sixth specifying method, the specifying unit 132 specifies the user characteristics comprehensively by using the first to fifth specifying methods in combination.
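How the individual results are combined is not specified in the description. Purely as an illustrative assumption, the sketch below takes a simple majority vote over the labels returned by the individual methods, ignoring methods that produced no result and returning no result on a tie. The function name and the labels are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def combine_traits(results: List[Optional[str]]) -> Optional[str]:
    """Majority vote over per-method results (assumed combination rule)."""
    votes = [r for r in results if r is not None]
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    # A tie between the two leading labels is treated as undecided.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_label


overall = combine_traits(["likes small talk", "dislikes small talk",
                          "likes small talk", None, "likes small talk"])
```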

The specifying unit 132 specifies the user characteristics using the methods described above, and transmits the specified user characteristics to the deformation unit 133.

Meanwhile, the voice recognition device 20 can specify user characteristics as feature information of the sound by, for example, the following method. Specifically, the voice recognition device 20 analyzes the voice data received from the user terminal 10 and thereby specifies the user characteristics as feature information of the sound. That is, the voice recognition device 20 generates a voice waveform by analyzing the voice data, and then specifies the user characteristics by comparing the peak shape, frequency, and the like of the generated voice waveform with predetermined information. User characteristics obtainable from the feature information of the sound include, for example, age, gender, emotion, dialect, and physical condition. The voice recognition device 20 then transmits the specified user characteristics, together with the voice text, to the response generation device 100. On receiving the user characteristics from the voice recognition device 20, the reception unit 131 of the response generation device 100 transmits them to the deformation unit 133.

The deformation unit 133 deforms the content of the response message on the basis of the user characteristics specified by the voice recognition device 20 or the specifying unit 132. First, on receiving the user characteristics, the deformation unit 133 acquires a response message.

Here, when the response generation device 100 receives the voice text and the user information from the reception unit 131, it selects a response message using the voice text and the information (determination tree) stored in the determination information storage unit 120. For example, on receiving voice text, the response generation device 100 determines the detection node that has a keyword contained in the voice text, and controls output of the response message data so that the response message corresponding to the operation node connected to that detection node is output. Subsequently, on receiving the voice text of an input message made in reply to this response message, the response generation device 100 determines, from among the detection nodes connected to that operation node, the detection node that has a keyword contained in the voice text. It then controls output of the response message data corresponding to the operation node connected to the determined detection node. In this way, the response generation device 100 realizes a conversation with the user using detection nodes and operation nodes.
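The alternation between detection nodes and operation nodes described above can be sketched as a small lookup loop. The node IDs "N1", "N3", "N5", and "N8" and the two response messages follow the example used in this description; the keywords attached to the detection nodes, the dictionary layout, and the rule of always taking the first connected operation node (instead of applying transition probabilities) are assumptions of this sketch.

```python
from typing import Dict, List, Optional, Tuple

TREE: Dict[str, dict] = {
    "N1": {"kind": "detection", "keyword": "curry", "next": ["N3"]},
    "N3": {"kind": "operation", "message": "Which curry? Roppongi?",
           "next": ["N5"]},
    "N5": {"kind": "detection", "keyword": "Roppongi", "next": ["N8"]},
    "N8": {"kind": "operation",
           "message": "What kind of curry do you like?", "next": []},
}


def respond(tree: Dict[str, dict], candidates: List[str],
            voice_text: str) -> Optional[Tuple[str, str, str]]:
    """Find a candidate detection node whose keyword occurs in the text.

    Returns (detection node ID, operation node ID, response message),
    or None when no candidate matches.
    """
    for detection_id in candidates:
        node = tree[detection_id]
        if node["keyword"] in voice_text:
            operation_id = node["next"][0]  # assumption: take the first edge
            return detection_id, operation_id, tree[operation_id]["message"]
    return None


first = respond(TREE, ["N1"], "I really love curry")
# The next input is matched only against detection nodes below "N3".
second = respond(TREE, TREE["N3"]["next"], "Yes, the one in Roppongi")
```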

Each time the response generation device 100 uses a detection node, the deformation unit 133 acquires from the advertisement bidding device 40 the advertisement information that is to serve as the operation node (response message) for that detection node. When no advertisement information is registered, the deformation unit 133 acquires the data of the normal response message registered in the determination tree as the operation node.

For example, to acquire advertisement information, the deformation unit 133 notifies the advertisement bidding device 40 of the detection node ID used by the response generation device 100. The presentation unit 45 of the advertisement bidding device 40 then searches the advertisement information storage unit 42 for the advertisement information corresponding to the notified node ID and presents the advertisement data contained in that advertisement information to the response generation device 100. The deformation unit 133 acquires the presented advertisement data.

The search need not be based on the node ID of the detection node. For example, the deformation unit 133 may use a predetermined keyword contained in the input message as the search keyword. Suppose the response generation device 100 receives the text of an input message such as 「キーマカレー好きやわ」 ("I like keema curry"); the deformation unit 133 then sets "keema curry" as the search keyword and transmits the search keyword and the user characteristics to the advertisement bidding device 40. The presentation unit 45 of the advertisement bidding device 40 searches the advertisement information storage unit 42 for advertisement information whose advertisement keyword matches the search keyword and whose advertisement tag matches the user characteristics, and presents the advertisement data contained in that advertisement information to the response generation device 100. In addition, when a search based on the notified detection node ID yields multiple pieces of advertisement information, the presentation unit 45 may narrow them down using the search keyword and the user characteristics.

The following describes an example in which the deformation unit 133 deforms the content of the response message so that the features of the voice at output time resemble the features of the voice corresponding to the specified user characteristics. Specifically, an example in which the content of a normal response message (hereinafter sometimes referred to as the "response text") and the content of advertisement data are deformed according to the user characteristics specified by the voice recognition device 20 will be described with reference to FIG. 7.

First, suppose that the user U01 inputs a message such as 「カレーめっちゃ好きやわ」 ("I really love curry", spoken in the Osaka dialect) and that the voice recognition device 20 thereby specifies the user characteristic "Osaka dialect". Suppose further that the response generation device 100, having received the voice text and the user characteristic "Osaka dialect" from the voice recognition device 20, uses the detection node ID "N1". The reception unit 131 of the response generation device 100 transmits the user characteristic "Osaka dialect" to the deformation unit 133.

Here, because the response generation device 100 has used the detection node ID "N1", the deformation unit 133 of the response generation device 100 acquires the advertisement data contained in the advertisement information for the detection node ID "N1" if such advertisement information is registered in the advertisement bidding device 40. In this example, no advertisement information is registered, so the deformation unit 133 acquires the normal response message corresponding to the operation node ID "N3". In this case, the deformation unit 133 rewrites the acquired response message in the Osaka dialect in accordance with the user characteristic "Osaka dialect". Specifically, the deformation unit 133 changes 「どこのカレー？六本木？」 ("Which curry? Roppongi?") into its Osaka-dialect form 「どこのカレーなん？六本木？」. The deformation unit 133 then transmits the data of the deformed response message to the output control unit 134. Further, suppose that the user U01 inputs a message in reply to this response message and the response generation device 100 consequently uses the detection node ID "N5". Because no advertisement information is registered in the advertisement bidding device 40, the deformation unit 133 acquires the normal response message corresponding to the operation node ID "N8". In this case, the deformation unit 133 changes the acquired response message 「どんなカレーが好き？」 ("What kind of curry do you like?") into its Osaka-dialect form 「どんなカレーが好きなん？」. The deformation unit 133 then transmits the data of the deformed response message to the output control unit 134.

さらに、この応答メッセージに対して、ユーザＵ０１によってメッセージが入力されたことにより、応答生成装置１００が検出ノードＩＤ「Ｎ９」を使用した場合に、変形部１３３は、検出ノードＩＤ「Ｎ９」に対する広告情報が広告入札装置４０に登録されていることにより、その広告情報に含まれる広告データを広告入札装置４０から取得する。具体的には、変形部１３３は、検出ノードＩＤ「Ｎ９」を広告入札装置４０へ通知することにより、対応する広告データを取得する。例えば、図４では、検出ノードＩＤ「Ｎ９」を登録している広告主が２つ存在する。ここで、広告入札装置４０の提示部４５は、変形部１３３によって設定された検索キーワード「キーマカレー」とユーザ特性「大阪弁」とを受け付け、これらに対応する広告情報を検索し、「Ｃ０１」を得る。そして、提示部４５は、広告データ「Ａ店のキーマカレーすごくおいしいです」を提示する。 Next, when the user U01 inputs a message in reply to this response message and the response generation device 100 uses the detection node ID “N9”, advertisement information for the detection node ID “N9” is registered in the advertisement bidding device 40, so the deforming unit 133 acquires the advertisement data included in that advertisement information from the advertisement bidding device 40. Specifically, the deforming unit 133 notifies the advertisement bidding device 40 of the detection node ID “N9” and thereby acquires the corresponding advertisement data. For example, in FIG. 4, two advertisers have registered the detection node ID “N9”. Here, the presentation unit 45 of the advertisement bidding device 40 receives the search keyword “keema curry” and the user characteristic “Osaka dialect” set by the deforming unit 133, searches for the corresponding advertisement information, and obtains “C01”. The presentation unit 45 then presents the advertisement data “Ａ店のキーマカレーすごくおいしいです” (“Store A's keema curry is very delicious”).

変形部１３３は、提示部４５によって提示された広告データ「Ａ店のキーマカレーすごくおいしいです」を取得し、ユーザ特性「大阪弁」に合わせて大阪弁の文章に変形する。具体的には、変形部１３３は、「Ａ店のキーマカレーすごくおいしいです」を「Ａ店のキーマカレーめっちゃうまいで！」と変形する。そして、変形部１３３は、変形した広告データを出力制御部１３４へ送信する。 The deforming unit 133 acquires the advertisement data “Ａ店のキーマカレーすごくおいしいです” (“Store A's keema curry is very delicious”) presented by the presentation unit 45 and transforms it into Osaka dialect in accordance with the user characteristic “Osaka dialect”. Specifically, the deforming unit 133 transforms it into “Ａ店のキーマカレーめっちゃうまいで！” (roughly, “Store A's keema curry is really good!”). The deforming unit 133 then transmits the transformed advertisement data to the output control unit 134.

なお、変形部１３３は、変形した応答メッセージに応じてイントネーションも変形してもよい。例えば、変形部１３３は、ユーザ特性「大阪弁」に基づいて変形した場合には、変形した応答メッセージを「大阪弁」のイントネーションに変形する。 Note that the deforming unit 133 may also transform the intonation in accordance with the transformed response message. For example, when the deforming unit 133 has performed a transformation based on the user characteristic “Osaka dialect”, it converts the transformed response message to “Osaka dialect” intonation.

出力制御部１３４は、変形部１３３によって変形された応答メッセージの応答データを受け付けた場合に、受け付けた応答データを音声合成装置７０に送信して中間表現（例えば、再生波形のデータ）を受信する。そして、出力制御部１３４は、受信した中間表現や応答データのテキストを応答生成装置１００へ送信する。また、応答生成装置１００は、受信した中間表現や応答データのテキストをユーザ端末１０へ送信する。 When the output control unit 134 receives the response data of the response message transformed by the deforming unit 133, it transmits the received response data to the speech synthesizer 70 and receives an intermediate representation (for example, playback waveform data). The output control unit 134 then transmits the received intermediate representation and the text of the response data to the response generation device 100, and the response generation device 100 transmits the received intermediate representation and the text of the response data to the user terminal 10.
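
The dialect transformation in this walkthrough can be pictured as a lookup keyed by user characteristic. The following is a minimal sketch under stated assumptions, not the implementation described in this document: `DIALECT_RULES` and `transform_message` are hypothetical names, and the rewrite rules are only the example sentences quoted above.

```python
# Hypothetical rewrite rules keyed by user characteristic. The entries are
# only the example sentences from the walkthrough; a real system would need
# a far more general rewriting mechanism.
DIALECT_RULES = {
    "大阪弁": {
        "どこのカレー？六本木？": "どこのカレーなん？六本木？",
        "どんなカレーが好き？": "どんなカレーが好きなん？",
        "Ａ店のキーマカレーすごくおいしいです": "Ａ店のキーマカレーめっちゃうまいで！",
    },
}


def transform_message(message: str, user_characteristic: str) -> str:
    """Rewrite a response message or ad text for the given user characteristic.

    Falls back to the unmodified message when no rule is known.
    """
    rules = DIALECT_RULES.get(user_characteristic, {})
    return rules.get(message, message)
```

The same function serves both normal response messages and advertisement data, which mirrors how the deforming unit 133 treats both kinds of text in this example.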

〔４．応答生成処理フロー〕 [4. Response generation process flow]
次に、図８を用いて、実施形態にかかる応答生成装置１００による応答生成処理について説明する。図８は、実施形態にかかる応答生成装置１００による応答生成処理手順を示すシーケンス図である。 Next, the response generation process performed by the response generation device 100 according to the embodiment will be described with reference to FIG. 8. FIG. 8 is a sequence diagram illustrating the response generation processing procedure performed by the response generation device 100 according to the embodiment.

図８に示すように、まず、ユーザ端末１０が、ユーザの発話に関する音声を受信する（ステップＳ２０１）。そして、ユーザ端末１０は、受信した音声データを音声認識装置２０へ送信する（ステップＳ２０２）。音声認識装置２０は、音声データを受信した場合に、かかる音声データをテキストデータに変換すると共に、音の特徴情報としてユーザ特性を特定する（ステップＳ２０３）。そして、音声認識装置２０は、音声テキストと特定したユーザ特性をユーザ端末１０へ送信する（ステップＳ２０４）。ユーザ端末１０は、受け付けた音声テキストとユーザ特性を応答生成装置１００へ送信する（ステップＳ２０５）。 As shown in FIG. 8, the user terminal 10 first receives the voice of the user's utterance (step S201). The user terminal 10 then transmits the received voice data to the voice recognition device 20 (step S202). On receiving the voice data, the voice recognition device 20 converts it into text data and identifies a user characteristic as sound feature information (step S203). The voice recognition device 20 then transmits the voice text and the identified user characteristic to the user terminal 10 (step S204). The user terminal 10 transmits the received voice text and user characteristic to the response generation device 100 (step S205).

ここで、応答生成装置１００は、会話の進め方の傾向としてユーザ特性を特定する処理を行う（ステップＳ２０６）。例えば、応答生成装置１００の特定部１３２は、上述した第１〜第６の特定方法によってユーザ特性を特定する。なお、音声認識装置２０によってユーザ特性が特定される場合には、応答生成装置１００の特定部１３２は、ユーザ特性の特定処理を行わなくてもよい。 Here, the response generation device 100 performs a process of specifying the user characteristics as the tendency of the conversation to proceed (step S206). For example, the specifying unit 132 of the response generation device 100 specifies the user characteristics by the first to sixth specifying methods described above. When the user characteristic is specified by the speech recognition apparatus 20, the specifying unit 132 of the response generation apparatus 100 may not perform the user characteristic specifying process.

続いて、応答生成装置１００の変形部１３３は、ユーザ特性を受け付けると、出力する応答メッセージとなる広告情報を取得する（ステップＳ２０７）。なお、広告情報が登録されていない場合には、通常の応答メッセージを取得する。そして、変形部１３３は、取得した応答メッセージの内容をユーザ特性に基づいて変形し（ステップＳ２０８）、変形した応答メッセージがユーザ端末１０によって出力されるよう出力制御する（ステップＳ２０９）。 Subsequently, when receiving the user characteristic, the deforming unit 133 of the response generation apparatus 100 acquires advertisement information that is a response message to be output (step S207). When advertisement information is not registered, a normal response message is acquired. Then, the deformation unit 133 deforms the content of the acquired response message based on the user characteristics (step S208), and performs output control so that the deformed response message is output by the user terminal 10 (step S209).
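
Steps S206 to S208 can be sketched as a single function. This is an illustrative outline only: `generate_response` and its callback parameters (`identify`, `fetch_ad`, `normal_reply`, `transform`) are hypothetical stand-ins for the specifying unit 132, the advertisement bidding device 40, the determination tree, and the deforming unit 133.

```python
from typing import Callable, Optional


def generate_response(
    speech_text: str,
    user_characteristic: Optional[str],
    identify: Callable[[str], str],
    fetch_ad: Callable[[str], Optional[str]],
    normal_reply: Callable[[str], str],
    transform: Callable[[str, str], str],
) -> str:
    """Return the transformed response message for one user utterance."""
    # S206: identify the characteristic only if the recognizer did not supply one.
    if user_characteristic is None:
        user_characteristic = identify(speech_text)
    # S207: prefer registered advertisement data, else the normal response message.
    message = fetch_ad(speech_text)
    if message is None:
        message = normal_reply(speech_text)
    # S208: transform the content. S209 (output control) is left to the caller.
    return transform(message, user_characteristic)
```

Passing the collaborators in as callbacks keeps the sketch independent of how the individual devices communicate over the network.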

〔５．変形例〕 [5. Modifications]
上述した実施例にかかる応答生成装置１００は、上記実施形態以外にも種々の異なる形態にて実施されてよい。そこで、以下では、応答生成装置１００の他の実施例について説明する。 The response generation device 100 according to the example described above may be implemented in various forms other than the above embodiment. Other examples of the response generation device 100 are therefore described below.

〔５−１．変形処理（１）〕 [5-1. Transformation process (1)]
上述してきた応答生成装置１００の変形部１３３は、応答メッセージが通常の応答文であっても、また、広告文であってもユーザ特性に基づき、その内容を変形する例を示した。しかし、変形部１３３は、特定されたユーザ特性に対応する音声の特徴とは異なる音声の特徴となるよう、応答メッセージとしての広告情報の内容を変形してもよい。例えば、音声認識装置２０によってユーザ特性「標準語」が特定された場合に、変形部１３３は、広告文のみを「標準語」とは異なる方言の文章に変形する。この点について、図９を用いて説明する。 In the examples so far, the deforming unit 133 of the response generation device 100 transforms the content of the response message based on the user characteristic, whether the response message is a normal response text or an advertisement text. However, the deforming unit 133 may instead transform the content of the advertisement information serving as the response message so that its voice features differ from those corresponding to the identified user characteristic. For example, when the voice recognition device 20 identifies the user characteristic “standard language”, the deforming unit 133 transforms only the advertisement text into a dialect other than “standard language”. This point will be described with reference to FIG. 9.

図９は変形例にかかる判定ツリーの一例を示す図である。例えば、所定のユーザとして、ユーザＵ０２がメッセージを入力したことにより、音声認識装置２０は、ユーザ特性「標準語」を特定したものとする。そして、応答生成装置１００は、音声認識装置２０から音声テキストと、ユーザ特性「標準語」を受け付けたことにより、検出ノードＩＤ「Ｎ１」を使用したとする。また、応答生成装置１００の受信部１３１は、ユーザ特性「標準語」を変形部１３３へ送信する。 FIG. 9 is a diagram illustrating an example of a determination tree according to a modification. For example, assume that the user U02, as a given user, inputs a message and that the voice recognition device 20 consequently identifies the user characteristic “standard language”. Assume further that the response generation device 100, having received the voice text and the user characteristic “standard language” from the voice recognition device 20, uses the detection node ID “N1”. The reception unit 131 of the response generation device 100 then transmits the user characteristic “standard language” to the deforming unit 133.

ここで、変形部１３３は、応答生成装置１００によって検出ノードＩＤ「Ｎ１」が使用されたことにより、検出ノードＩＤ「Ｎ１」に対する広告情報が広告入札装置４０に登録されている場合には、その広告情報に含まれる広告データを取得する。ここでは、広告情報の登録がなく、動作ノードＩＤ「Ｎ３」に対応する通常の応答メッセージを取得したとする。変形部１３３は、取得した応答メッセージの変形は行わず、そのまま出力制御部１３４へ送信する。さらに、この応答メッセージに対して、ユーザＵ０２によってメッセージが入力されたことにより、応答生成装置１００が検出ノードＩＤ「Ｎ５」を使用した場合に、変形部１３３は、広告入札装置４０に広告情報の登録がないため、動作ノードＩＤ「Ｎ８」に対応する通常の応答メッセージを取得したとする。変形部１３３は、取得した応答メッセージの変形は行わず、そのまま出力制御部１３４へ送信する。 Here, because the response generation device 100 has used the detection node ID “N1”, the deforming unit 133 acquires the advertisement data included in the advertisement information for the detection node ID “N1” if such advertisement information is registered in the advertisement bidding device 40. In this example, no advertisement information is registered, so the normal response message corresponding to the operation node ID “N3” is acquired. The deforming unit 133 does not transform the acquired response message and transmits it to the output control unit 134 as is. Next, when the user U02 inputs a message in reply to this response message and the response generation device 100 uses the detection node ID “N5”, no advertisement information is registered in the advertisement bidding device 40, so the deforming unit 133 acquires the normal response message corresponding to the operation node ID “N8”. Again, the deforming unit 133 does not transform the acquired response message and transmits it to the output control unit 134 as is.

さらに、この応答メッセージに対して、ユーザＵ０２によってメッセージが入力されたことにより、応答生成装置１００が検出ノードＩＤ「Ｎ９」を使用した場合に、変形部１３３は、検出ノードＩＤ「Ｎ９」に対する広告情報が広告入札装置４０に登録されていることにより、その広告情報に含まれる広告データを取得する。具体的には、変形部１３３は、検出ノードＩＤ「Ｎ９」を広告入札装置４０へ通知することにより、対応する広告データを取得する。例えば、変形部１３３は、広告ＩＤ「Ｃ０１」に対応する広告データ「Ａ店のキーマカレーすごくおいしいです」を取得したとする。ここで、変形部１３３は、例えば、ユーザ特性「標準語」とは異なる「大阪弁」に広告データを変形する。具体的には、変形部１３３は、広告データ「Ａ店のキーマカレーすごくおいしいです」を「Ａ店のキーマカレーめっちゃうまいで！」と変形する。そして、変形部１３３は、変形した広告データを出力制御部１３４へ送信する。 Next, when the user U02 inputs a message in reply to this response message and the response generation device 100 uses the detection node ID “N9”, advertisement information for the detection node ID “N9” is registered in the advertisement bidding device 40, so the deforming unit 133 acquires the advertisement data included in that advertisement information. Specifically, the deforming unit 133 notifies the advertisement bidding device 40 of the detection node ID “N9” and thereby acquires the corresponding advertisement data. For example, assume that the deforming unit 133 acquires the advertisement data “Ａ店のキーマカレーすごくおいしいです” (“Store A's keema curry is very delicious”) corresponding to the advertisement ID “C01”. Here, the deforming unit 133 transforms the advertisement data into, for example, “Osaka dialect”, which differs from the user characteristic “standard language”. Specifically, the deforming unit 133 transforms the advertisement data into “Ａ店のキーマカレーめっちゃうまいで！” (roughly, “Store A's keema curry is really good!”). The deforming unit 133 then transmits the transformed advertisement data to the output control unit 134.

なお、ここでは、ユーザＵ０２の使用する方言とは異なる方言として「大阪弁」を例示したが、これに限るものではない。例えば、変形部１３３は、ユーザ特性として「無駄話嫌い」を受け付けた場合には、広告文をあえて長い文章に変形してもよい。この場合、変形部１３３は、例えば、広告文が所定の製品を広告する内容であるなら、詳細情報や店舗情報等を追加することにより広告文を長く変形する。また、変形部１３３は、ユーザ特性として「落ち込んでいる」を受け付けた場合には、広告文をあえて友好的な明るい文章に変形したり、明るいイントネーションに変形する。 Here, “Osaka dialect” is exemplified as a dialect different from the dialect used by the user U02, but the dialect is not limited to this. For example, when the deforming unit 133 accepts “I don't like useless talk” as the user characteristic, it may intentionally transform the advertisement text into a long text. In this case, for example, if the advertisement text is content that advertises a predetermined product, the deforming unit 133 deforms the advertisement text longer by adding detailed information, store information, and the like. Also, when accepting “depressed” as the user characteristic, the deforming unit 133 intentionally transforms the advertisement text into a friendly bright text or transforms into a bright intonation.

このように、応答生成装置１００は、特定されたユーザ特性とは異なるユーザ特性に基づく変形を広告文に対してのみ行う。これにより、応答生成装置１００は、ユーザに対して広告を印象付けることができ、広告効果を高めることができる。例えば、応答生成装置１００とユーザとの標準語の会話の中で、所定の応答メッセージだけが大阪弁で出力されたとすると、その応答メッセージはユーザに対して印象に残る可能性が高いといえる。また、応答生成装置１００は、このような意外性を利用することにより、ユーザとの会話を盛り上げることもできる。 As described above, the response generation apparatus 100 performs only the transformation on the advertisement text based on the user characteristic different from the specified user characteristic. Thereby, the response generation apparatus 100 can impress an advertisement with respect to a user, and can improve an advertisement effect. For example, in a standard language conversation between the response generation device 100 and the user, if only a predetermined response message is output by the Osaka dialect, it can be said that the response message is likely to leave an impression on the user. Moreover, the response generation apparatus 100 can excite the conversation with the user by using such unexpectedness.
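
The contrast policy of this modification, in which only advertisement text is transformed and then toward a characteristic other than the user's own, can be sketched as follows. The mapping `CONTRAST` and the function `target_characteristic` are hypothetical; the single entry is the “standard language” to “Osaka dialect” example from the text.

```python
from typing import Optional

# Hypothetical mapping from the user's characteristic to a deliberately
# different characteristic used for advertisement text only.
CONTRAST = {"標準語": "大阪弁"}


def target_characteristic(user_characteristic: str, is_advertisement: bool) -> Optional[str]:
    """Return the characteristic to transform toward, or None for no change."""
    if not is_advertisement:
        # Normal response messages pass through unmodified in this modification.
        return None
    # Unknown characteristics have no contrast defined, so nothing is changed.
    return CONTRAST.get(user_characteristic)
```

Returning `None` for normal responses keeps the surrounding conversation in the user's own register, so that only the advertisement stands out.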

〔５−２．変形処理（２）〕 [5-2. Transformation process (2)]
また、上記実施形態において応答生成装置１００の特定部１３２は、出力予定の広告情報を予測し、予測した広告情報の内容から、かかる広告情報の特徴を特定し、変形部１３３は、特定部１３２によって特定された広告情報の特徴に基づいて、応答メッセージを変形してもよい。この点について、図１０を用いて説明する。 In the above embodiment, the specifying unit 132 of the response generation device 100 may also predict the advertisement information scheduled to be output and identify the features of that advertisement information from its content, and the deforming unit 133 may transform the response message based on the features of the advertisement information identified by the specifying unit 132. This point will be described with reference to FIG. 10.

まず、特定部１３２は、応答生成装置１００とユーザとの会話の経過に基づいて出力予定の広告情報を予測する。具体的には、特定部１３２は、判定ツリーを用いて、応答生成装置１００によって使用されたノードの経過から、将来、応答生成装置１００によって使用されるノードを予測する。例えば、特定部１３２は、応答生成装置１００によってノードが使用されるたびに、使用されたノードを記憶する。そして、特定部１３２は、応答生成装置１００とユーザとの間でメッセージのやり取りが進むことにより、所定数のノードを記憶した時点で、応答生成装置１００によって将来使用される検出ノードを予測する。 First, the specifying unit 132 predicts the advertisement information scheduled to be output based on the progress of the conversation between the response generation device 100 and the user. Specifically, the identifying unit 132 predicts a node to be used by the response generation device 100 in the future from the progress of the nodes used by the response generation device 100 using the determination tree. For example, each time the node is used by the response generation device 100, the specifying unit 132 stores the used node. The specifying unit 132 predicts a detection node to be used in the future by the response generation device 100 when a predetermined number of nodes are stored as a result of message exchange between the response generation device 100 and the user.

そして、特定部１３２は、予測した検出ノードに対する広告情報が広告入札装置４０に登録されている場合には、その広告情報を出力予定の広告情報とし、その広告情報に含まれる広告データを取得する。そして、特定部１３２は、取得した広告データの内容から、かかる広告データの特徴を特定し、特定した特徴と予測したノードのノードＩＤとを変形部１３３に通知する。 Then, when the advertisement information for the predicted detection node is registered in the advertisement bidding apparatus 40, the specifying unit 132 sets the advertisement information as the advertisement information scheduled to be output, and acquires advertisement data included in the advertisement information. . Then, the specifying unit 132 specifies the feature of the advertisement data from the content of the acquired advertisement data, and notifies the deforming unit 133 of the specified feature and the node ID of the predicted node.

ここで、図１０を用いて説明する。図１０は変形例にかかる判定ツリーの一例を示す図である。例えば、特定部１３２は、ノードＩＤ「Ｎ１」が使用された時点で、ノードＩＤ「０１」よりも先に存在するノード（すなわち、ノードＩＤ「０１」を根ノードとした際の葉ノード）を抽出し、抽出したノードに、広告検索を行わせるノードが含まれるか判定する。図１０に示す例では、特定部１３２は、ノードＩＤ「Ｎ９」が示すノードを特定する。そして、特定部１３２は、特定したノードのノードＩＤ「Ｎ９」を広告入札装置４０へ通知し、広告入札装置４０による検索によって、ノードＩＤ「Ｎ９」に対応する広告情報が存在する場合には、その広告情報を出力予定の広告情報とする。ここでは、広告入札装置４０によって、出力予定の広告情報として広告データ「Ｂ店のキーマカレーめっちゃうまいで！」が検索されたとし、特定部１３２は、かかる広告データを取得する。 This is described with reference to FIG. 10. FIG. 10 is a diagram illustrating an example of a determination tree according to a modification. For example, at the point when the node ID “N1” is used, the specifying unit 132 extracts the nodes that lie beyond the node ID “01” (that is, the leaf nodes when the node ID “01” is taken as the root node) and determines whether the extracted nodes include a node that triggers an advertisement search. In the example shown in FIG. 10, the specifying unit 132 identifies the node indicated by the node ID “N9”. The specifying unit 132 then notifies the advertisement bidding device 40 of the node ID “N9” of the identified node, and if the search by the advertisement bidding device 40 finds advertisement information corresponding to the node ID “N9”, that advertisement information is treated as the advertisement information scheduled to be output. Here, assume that the advertisement bidding device 40 retrieves the advertisement data “Ｂ店のキーマカレーめっちゃうまいで！” (roughly, “Store B's keema curry is really good!”) as the advertisement information scheduled to be output, and that the specifying unit 132 acquires this advertisement data.
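
The look-ahead for advertisement-search nodes can be sketched as a breadth-first traversal of the determination tree. This is only an illustration: `find_ad_nodes` is a hypothetical name, and the adjacency-list representation of the tree is an assumption, since the actual structure of FIG. 10 is not reproduced in the text.

```python
from collections import deque
from typing import Dict, List, Set


def find_ad_nodes(tree: Dict[str, List[str]], start: str, ad_search_nodes: Set[str]) -> List[str]:
    """Return the nodes below `start` that trigger an advertisement search.

    `tree` maps a node ID to the IDs of its child nodes. Nodes are visited
    breadth-first, so nearer advertisement nodes are listed first.
    """
    found: List[str] = []
    seen: Set[str] = set()
    queue = deque(tree.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        if node in ad_search_nodes:
            found.append(node)
        queue.extend(tree.get(node, []))
    return found
```

With a hypothetical chain `{"N1": ["N3"], "N3": ["N5"], "N5": ["N8"], "N8": ["N9"]}` and `{"N9"}` as the set of advertisement-search nodes, calling the function with `"N1"` as the start node yields `["N9"]`.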

ここで、特定部１３２は、取得した広告データ「Ｂ店のキーマカレーめっちゃうまいで！」の特徴として「大阪弁」であることを特定したとする。特定部１３２は、特定した広告情報の特徴「大阪弁」と、ノードＩＤ「Ｎ９」とを変形部１３３に通知する。なお、特定部１３２は、必ずしも方言等の地域に関する特徴を特定する必要はない。かかる特徴は、例えば、性別、性格、メッセージ長さ、抑揚等に関する特徴であってもよい。 Here, assume that the specifying unit 132 identifies “Osaka dialect” as the feature of the acquired advertisement data “Ｂ店のキーマカレーめっちゃうまいで！” (roughly, “Store B's keema curry is really good!”). The specifying unit 132 notifies the deforming unit 133 of the identified feature “Osaka dialect” of the advertisement information and the node ID “N9”. Note that the specifying unit 132 need not identify a regional feature such as a dialect. The feature may instead relate to, for example, gender, personality, message length, or intonation.

変形部１３３は、特定部１３２から広告情報の特徴とノードＩＤを受け付けた場合に、出力予定の応答メッセージのうち、特定部１３２によって予測された出力予定の広告情報が出力されるまでの応答回数が所定の閾値を超える応答メッセージの内容を広告情報の特徴に基づいて変形する。 On receiving the feature of the advertisement information and the node ID from the specifying unit 132, the deforming unit 133 transforms, based on the feature of the advertisement information, the content of those response messages scheduled to be output for which the number of responses counted toward the output of the advertisement information predicted by the specifying unit 132 exceeds a predetermined threshold.

例えば、変形部１３３は、出力予定の広告情報が出力されるまでの応答回数に閾値「１回」を設定しているものとする。上述したように、特定部１３２は、ノードＩＤ「Ｎ１」の時点で、将来は、応答生成装置１００によってノードＩＤ「Ｎ９」が使用されると予測したとすると、変形部１３３は、判定ツリーを用いることにより、かかる閾値「１回」を超える応答メッセージとして、ノードＩＤ「Ｎ１」からカウントすることによりノードＩＤ「Ｎ８」を特定する。なお、変形部１３３は、ノードＩＤ「Ｎ１」からカウントして１回目の応答メッセージであるノードＩＤ「Ｎ３」は閾値に含まれるため除外する。 For example, assume that the deforming unit 133 sets a threshold of “one” for the number of responses until the advertisement information scheduled to be output is output. As described above, if the specifying unit 132 predicts, at the point of the node ID “N1”, that the response generation device 100 will use the node ID “N9” in the future, the deforming unit 133 uses the determination tree and counts from the node ID “N1” to identify the node ID “N8” as a response message exceeding the threshold of “one”. The deforming unit 133 excludes the node ID “N3”, which is the first response message counted from the node ID “N1”, because it falls within the threshold.

ここで、変形部１３３は、特定したノードＩＤ「Ｎ８」に対応する応答メッセージ「どんなカレーが好き？」を取得し、広告情報の特徴「大阪弁」に合わせて「どんなカレーが好きなん？」と変形する。また、変形部１３３は、文章の変形に伴い、イントネーションも変形してもよい。 Here, the deforming unit 133 acquires the response message “どんなカレーが好き？” (“What kind of curry do you like?”) corresponding to the identified node ID “N8” and transforms it into the Osaka-dialect form “どんなカレーが好きなん？” to match the feature “Osaka dialect” of the advertisement information. The deforming unit 133 may also transform the intonation along with the text.

このように、応答生成装置１００は、出力予定の広告情報を予測し、予測した広告情報の特徴を特定し、特定した特徴に基づいて、予測した広告情報までの出力予定の応答メッセージの内容を変形する。これにより、応答生成装置１００は、自然な流れでユーザとの会話の中に、広告情報を応答メッセージとして出力することができる。 In this way, the response generation apparatus 100 predicts the advertisement information scheduled to be output, identifies the characteristics of the predicted advertisement information, and, based on the identified characteristics, determines the contents of the response message to be output up to the predicted advertisement information. Deform. Thereby, the response generation device 100 can output the advertisement information as a response message in the conversation with the user in a natural flow.
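
The threshold rule from the example above, where the response messages on the way to the predicted advertisement are counted from the current node and only those beyond the threshold are transformed, can be sketched as below. `responses_to_transform` is a hypothetical helper, and the list of operation nodes on the path is assumed to have been read off the determination tree beforehand.

```python
from typing import List


def responses_to_transform(path_operation_nodes: List[str], threshold: int) -> List[str]:
    """Return the operation nodes whose 1-based position on the path to the
    predicted advertisement exceeds `threshold`."""
    return [
        node
        for count, node in enumerate(path_operation_nodes, start=1)
        if count > threshold
    ]
```

For the path `["N3", "N8"]` and a threshold of 1, only `"N8"` is selected, which matches the example in which “N3” is excluded.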

なお、かかる変形例では、広告情報の特徴に合わせて応答メッセージを変形する例を示したが、広告情報の特徴とは異なる特徴に変形してもよい。例えば、広告情報の特徴「明るい抑揚」に対して、変形部１３３は、その広告情報までの出力予定の応答メッセージの内容を「暗い抑揚」に変形する。これにより、応答生成装置１００は、ユーザに対して広告情報を印象付けることができ、広告効果を高めることができる。 In this modification, the example in which the response message is modified in accordance with the feature of the advertisement information has been shown, but the response message may be modified to a feature different from the feature of the advertisement information. For example, for the feature “bright inflection” of the advertisement information, the transformation unit 133 transforms the content of the response message scheduled to be output up to the advertisement information into “dark inflection”. Thereby, the response generation device 100 can impress advertisement information on the user, and can enhance the advertisement effect.

〔５−３．検索処理〕 [5-3. Search process]
上述してきた応答生成装置１００の変形部１３３は、応答メッセージを変形するにあたって、検出ノードに対する広告情報が広告入札装置４０に登録されている場合には、かかる広告情報を変形対象として取得し、登録されていない場合には、判定ツリーに登録されている通常の応答メッセージを変形対象として取得する例を示した。しかし、変形部１３３は、遷移確率に基づいて、変形する応答メッセージを取得してもよい。例えば、応答生成装置１００によって、所定の検出ノードに対して、通常の応答メッセージに対応する動作ノードと、広告用の動作ノードが遷移確率と共に紐付られているとする。ここで、変形部１３３は、遷移確率に基づき、通常の応答メッセージに対応する動作ノードを選択した場合には、その動作ノードに対応する応答メッセージを変形し、広告用の動作ノードを選択した場合には、その広告用の動作ノードに含めるための広告情報を広告入札装置４０から取得する。取得方法は、上述してきたように、検出ノードＩＤを用いたものであってもよいし、検索キーワード及びユーザ特性を用いたものであってもよい。 In the examples so far, when transforming a response message, the deforming unit 133 of the response generation device 100 acquires the advertisement information as the transformation target if advertisement information for the detection node is registered in the advertisement bidding device 40, and otherwise acquires the normal response message registered in the determination tree as the transformation target. However, the deforming unit 133 may instead acquire the response message to be transformed based on transition probabilities. For example, assume that, for a given detection node, the response generation device 100 associates an operation node corresponding to a normal response message and an operation node for advertisements, each with a transition probability. When the deforming unit 133 selects, based on the transition probabilities, the operation node corresponding to the normal response message, it transforms the response message corresponding to that operation node; when it selects the operation node for advertisements, it acquires from the advertisement bidding device 40 the advertisement information to be included in that operation node. As described above, the acquisition may use the detection node ID, or it may use a search keyword and the user characteristic.
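
Choosing between the normal operation node and the advertisement operation node can be sketched as a weighted random draw, assuming the weights attached to the nodes are transition probabilities. The function `pick_operation_node` and the node ID `"AD"` used in the usage note are hypothetical; `random.choices` is the standard-library routine for weighted sampling.

```python
import random
from typing import List, Tuple


def pick_operation_node(candidates: List[Tuple[str, float]], rng=random) -> str:
    """Pick one operation node ID according to its transition probability.

    `candidates` is a list of (node_id, probability) pairs attached to a
    detection node. `rng` may be the `random` module or a `random.Random`
    instance, which makes the draw reproducible in tests.
    """
    node_ids = [node_id for node_id, _ in candidates]
    weights = [probability for _, probability in candidates]
    return rng.choices(node_ids, weights=weights, k=1)[0]
```

For instance, `pick_operation_node([("N3", 0.7), ("AD", 0.3)])` would return the normal response node about 70% of the time and the advertisement node otherwise.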

〔５−４．広告主による変形処理〕 [5-4. Transformation by the advertiser]
上記実施形態では、変形部１３３によって応答メッセージの内容が変形される例を示した。しかし、広告主がユーザ特性に応じた広告情報を複数入札しておくことで、変形部１３３は、受け付けたユーザ特性と一致するユーザ特性に対応付けられた広告データを取得し、変形することなく出力制御させてもよい。 In the above embodiment, the content of the response message is transformed by the deforming unit 133. However, if the advertiser bids in advance with multiple pieces of advertisement information tailored to different user characteristics, the deforming unit 133 may acquire the advertisement data associated with the user characteristic that matches the received user characteristic and have it output without transformation.

例えば、広告主は、広告情報として大阪弁用の広告データと、無駄話好きな人用の広告データを入札しておく。ここで、変形部１３３は、ユーザ特性「無駄話好き」を受け付けたとする。そして、広告入札装置４０は、変形部１３３から送信されたユーザ特性「無駄話好き」に基づいて、かかる広告主の広告情報に含まれる広告データのうち、無駄話好きな人用の広告データを取得し、変形部１３３に提示する。そして、変形部１３３は、その広告データを取得する。このような広告データは、特定部１３２によって特定されたユーザ特性に対応しているので、変形部１３３が変形を行う必要はない。したがって、変形部１３３は、取得した広告データを変形することなく、出力制御部１３４へ送信する。 For example, the advertiser bids with advertisement data for Osaka dialect and advertisement data for people who like small talk as its advertisement information. Assume that the deforming unit 133 receives the user characteristic “likes small talk”. Based on the user characteristic “likes small talk” transmitted from the deforming unit 133, the advertisement bidding device 40 retrieves, from the advertisement data included in that advertiser's advertisement information, the advertisement data for people who like small talk and presents it to the deforming unit 133. The deforming unit 133 then acquires that advertisement data. Because this advertisement data already corresponds to the user characteristic identified by the specifying unit 132, the deforming unit 133 does not need to transform it. The deforming unit 133 therefore transmits the acquired advertisement data to the output control unit 134 without transformation.
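
Selecting an advertiser-supplied variant can be sketched as a dictionary lookup that also reports whether a transformation is still needed. `select_ad_variant` is a hypothetical name, and the fallback to a `"default"` entry is an assumption added for illustration; the text only describes the case where a matching variant exists.

```python
from typing import Dict, Optional, Tuple


def select_ad_variant(
    variants: Dict[str, str],
    user_characteristic: str,
    default_key: str = "default",
) -> Tuple[Optional[str], bool]:
    """Return (advertisement text, needs_transformation).

    A variant registered for the user's characteristic is used as is.
    Otherwise the assumed default variant is returned and flagged for
    transformation; (None, False) means no advertisement is available.
    """
    if user_characteristic in variants:
        return variants[user_characteristic], False
    fallback = variants.get(default_key)
    return fallback, fallback is not None
```

The boolean flag lets the caller skip the transformation step only when the advertiser has already supplied matching text.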

〔５−５．装置構成（１）〕 [5-5. Device configuration (1)]
上記実施形態では、応答生成装置１００は、音声認識装置２０によって特定された音の特徴情報としてのユーザ特性に基づいて、変形処理を行う例を示した。しかし、応答生成装置１００が音の特徴情報としてのユーザ特性を特定してもよい。この場合、応答生成装置１００は、音声認識装置２０の音声データ解析機能を有することになる。 In the above embodiment, the response generation device 100 performs the transformation process based on the user characteristic identified by the voice recognition device 20 as sound feature information. However, the response generation device 100 itself may identify the user characteristic as sound feature information. In this case, the response generation device 100 has the voice data analysis function of the voice recognition device 20.

〔５−６．装置構成（２）〕 [5-6. Device configuration (2)]
また、上記実施形態では、音声認識装置２０または応答生成装置１００によって、ユーザ特性が特定される例を示した。しかしながら、実施形態は、これに限定されるものではない。例えば、ユーザ端末１０は、上述した特定処理を行い、特定したユーザ特性を応答生成装置１００に送信する。そして、応答生成装置１００は、ユーザ端末１０から受け付けたユーザ特性に基づいて、上述した変形処理を行ってもよい。 In the above embodiment, the user characteristic is identified by the voice recognition device 20 or the response generation device 100. However, the embodiment is not limited to this. For example, the user terminal 10 may perform the identification process described above and transmit the identified user characteristic to the response generation device 100. The response generation device 100 may then perform the transformation process described above based on the user characteristic received from the user terminal 10.

〔５−７．ユーザ端末以外の例〕 [5-7. Examples other than a user terminal]
上記実施形態では、ユーザはユーザ端末１０を用いて、応答生成装置１００と会話を行う例を示した。しかし、ユーザ端末１０の有する応答生成装置１００との対話機能が、会話を行うロボットに搭載されていてもよい。これにより、かかるロボットがユーザに代わって応答生成装置１００と会話を行うことが実現できる。 In the above embodiment, the user converses with the response generation device 100 using the user terminal 10. However, the function of the user terminal 10 for interacting with the response generation device 100 may be installed in a conversational robot. This allows the robot to converse with the response generation device 100 on behalf of the user.

〔５−８．プログラム〕 [5-8. Program]
また、上述してきた各実施形態にかかる応答生成装置１００は、例えば図１１に示すような構成のコンピュータ１０００によって実現される。以下、応答生成装置１００を例に挙げて説明する。図１１は、応答生成装置１００の機能を実現するコンピュータ１０００の一例を示すハードウェア構成図である。コンピュータ１０００は、ＣＰＵ１１００、ＲＡＭ１２００、ＲＯＭ１３００、ＨＤＤ１４００、通信インターフェイス（Ｉ／Ｆ）１５００、入出力インターフェイス（Ｉ／Ｆ）１６００、及びメディアインターフェイス（Ｉ／Ｆ）１７００を有する。 The response generation device 100 according to each embodiment described above is realized by, for example, a computer 1000 configured as shown in FIG. 11. The following description takes the response generation device 100 as an example. FIG. 11 is a hardware configuration diagram illustrating an example of the computer 1000 that implements the functions of the response generation device 100. The computer 1000 includes a CPU 1100, a RAM 1200, a ROM 1300, an HDD 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

ＣＰＵ１１００は、ＲＯＭ１３００またはＨＤＤ１４００に格納されたプログラムに基づいて動作し、各部の制御を行う。ＲＯＭ１３００は、コンピュータ１０００の起動時にＣＰＵ１１００によって実行されるブートプログラムや、コンピュータ１０００のハードウェアに依存するプログラム等を格納する。 The CPU 1100 operates based on a program stored in the ROM 1300 or the HDD 1400 and controls each unit. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 is started up, a program depending on the hardware of the computer 1000, and the like.

ＨＤＤ１４００は、ＣＰＵ１１００によって実行されるプログラム、及び、かかるプログラムによって使用されるデータ等を格納する。通信インターフェイス１５００は、通信網５０を介して他の機器からデータを受信してＣＰＵ１１００へ送り、ＣＰＵ１１００が生成したデータを、通信網５０を介して他の機器へ送信する。 The HDD 1400 stores programs executed by the CPU 1100, data used by the programs, and the like. The communication interface 1500 receives data from other devices via the communication network 50 and sends the data to the CPU 1100, and transmits the data generated by the CPU 1100 to other devices via the communication network 50.

ＣＰＵ１１００は、入出力インターフェイス１６００を介して、ディスプレイやプリンタ等の出力装置、及び、キーボードやマウス等の入力装置を制御する。ＣＰＵ１１００は、入出力インターフェイス１６００を介して、入力装置からデータを取得する。また、ＣＰＵ１１００は、生成したデータを、入出力インターフェイス１６００を介して出力装置へ出力する。 The CPU 1100 controls an output device such as a display and a printer and an input device such as a keyboard and a mouse via the input / output interface 1600. The CPU 1100 acquires data from the input device via the input / output interface 1600. Further, the CPU 1100 outputs the generated data to the output device via the input / output interface 1600.

メディアインターフェイス１７００は、記録媒体１８００に格納されたプログラムまたはデータを読み取り、ＲＡＭ１２００を介してＣＰＵ１１００に提供する。ＣＰＵ１１００は、かかるプログラムを、メディアインターフェイス１７００を介して記録媒体１８００からＲＡＭ１２００上にロードし、ロードしたプログラムを実行する。記録媒体１８００は、例えばＤＶＤ（Digital Versatile Disc）、ＰＤ（Phase change rewritable Disk）等の光学記録媒体、ＭＯ（Magneto-Optical disk）等の光磁気記録媒体、テープ媒体、磁気記録媒体、または半導体メモリ等である。 The media interface 1700 reads a program or data stored in the recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

例えば、コンピュータ１０００が実施形態にかかる応答生成装置１００として機能する場合、コンピュータ１０００のＣＰＵ１１００は、ＲＡＭ１２００上にロードされたプログラムを実行することにより、制御部１３０の機能を実現する。また、ＨＤＤ１４００には、判定情報記憶部内のデータが格納される。コンピュータ１０００のＣＰＵ１１００は、これらのプログラムを、記録媒体１８００から読み取って実行するが、他の例として、他の装置から、通信網５０を介してこれらのプログラムを取得してもよい。 For example, when the computer 1000 functions as the response generation apparatus 100 according to the embodiment, the CPU 1100 of the computer 1000 implements the function of the control unit 130 by executing a program loaded on the RAM 1200. The HDD 1400 stores data in the determination information storage unit. The CPU 1100 of the computer 1000 reads these programs from the recording medium 1800 and executes them, but as another example, these programs may be acquired from other devices via the communication network 50.

[5-9. Others]
Of the processes described in the above embodiment, all or part of the processes described as being performed automatically can also be performed manually, and all or part of the processes described as being performed manually can also be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and in the drawings can be changed arbitrarily unless otherwise specified.

Each component of each illustrated device is a functional concept and does not necessarily need to be physically configured as illustrated. In other words, the specific form of distribution and integration of each device is not limited to that shown in the drawings, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

The embodiments described above can be combined as appropriate to the extent that their processing contents do not contradict one another.

[6. Effects]
As described above, the response generation apparatus 100 according to the embodiment includes the specifying unit 132, the transformation unit 133, and the output control unit 134. The specifying unit 132 specifies feature information on the characteristics of the conversation between the dialog agent system and the user. The transformation unit 133 transforms the content of a response message in accordance with the feature information specified by the specifying unit 132. The output control unit 134 performs control so that the response message whose content has been transformed by the transformation unit 133 is output.

As a result, the response generation apparatus 100 according to the embodiment can hold an appropriate conversation with the user. Specifically, by transforming the response message in accordance with the characteristics of the user, the response generation apparatus 100 can hold a natural conversation with the user and can increase the user's satisfaction with the conversation.
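The flow through the specifying unit 132, the unit 133 that transforms the response message, and the output control unit 134 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `FeatureInfo`, `specify`, `transform`, `output`, and `generate_response` are hypothetical, and the toy rule (an exclamation mark signals a casual style) merely stands in for whatever feature information the embodiment actually specifies.

```python
from dataclasses import dataclass


@dataclass
class FeatureInfo:
    """Hypothetical container for conversation feature information."""
    user_is_casual: bool


def specify(utterances: list[str]) -> FeatureInfo:
    # Stand-in for the specifying unit 132: a toy rule that treats an
    # exclamation mark in any user utterance as a sign of a casual style.
    return FeatureInfo(user_is_casual=any("!" in u for u in utterances))


def transform(response: str, info: FeatureInfo) -> str:
    # Stand-in for the unit 133: adapt the registered response to the
    # specified feature instead of outputting it unchanged.
    if info.user_is_casual:
        return response.rstrip(".") + "!"
    return response


def output(response: str) -> str:
    # Stand-in for the output control unit 134: here it simply returns
    # the text that would be sent to the user terminal.
    return response


def generate_response(utterances: list[str], registered_response: str) -> str:
    # Specify features, transform the registered response, then output it.
    info = specify(utterances)
    return output(transform(registered_response, info))
```

In this sketch the registered response is never emitted directly; it always passes through `transform`, which is the point of contrast with the conventional approach of outputting a pre-registered response as it is.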

The specifying unit 132 according to the embodiment specifies user information about the user as the feature information, and the transformation unit 133 transforms the content of the response message based on the user information specified by the specifying unit 132.

As a result, the response generation apparatus 100 according to the embodiment can hold an appropriate conversation with the user.

The specifying unit 132 according to the embodiment specifies a feature of the user's voice as the feature information, and the transformation unit 133 transforms the content of advertisement information serving as a response message so that the voice feature at the time of output differs from the voice feature corresponding to the user information specified by the specifying unit 132.

As a result, the response generation apparatus 100 according to the embodiment can make the advertisement information stand out to the user, and can therefore enhance the advertising effect.

The specifying unit 132 according to the embodiment specifies a feature of the user's voice as the feature information, and the transformation unit 133 transforms the content of advertisement information serving as a response message so that the voice feature at the time of output is similar to the voice feature corresponding to the user information specified by the specifying unit 132.

As a result, the response generation apparatus 100 according to the embodiment can output advertisement information as a response message within the natural flow of the conversation with the user.
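The two voice-related variants above (an output voice that differs from the user's, and one that is similar to it) can be contrasted in a small sketch. This is an illustration under stated assumptions, not the embodiment's method: the `VoiceFeature` type, the `output_voice_for_ad` function, the `mode` strings, and the 1.5x and 0.8x factors are all hypothetical, and the text does not say which voice attributes are actually compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceFeature:
    """Hypothetical voice feature: pitch in Hz and speaking rate (1.0 = normal)."""
    pitch_hz: float
    rate: float


def output_voice_for_ad(user_voice: VoiceFeature, mode: str) -> VoiceFeature:
    """Choose the voice feature used when reading out advertisement information.

    mode="match":    reuse the user's voice feature so the ad blends into
                     the conversation.
    mode="contrast": shift pitch and rate away from the user's voice so the
                     ad stands out. The factors are arbitrary example values.
    """
    if mode == "match":
        return user_voice
    if mode == "contrast":
        return VoiceFeature(pitch_hz=user_voice.pitch_hz * 1.5,
                            rate=user_voice.rate * 0.8)
    raise ValueError(f"unknown mode: {mode}")
```

Which variant to use is a trade-off that the description itself points to: a contrasting voice aims at making the advertisement memorable, while a matching voice aims at keeping the conversation natural.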

The specifying unit 132 according to the embodiment also predicts advertisement information scheduled to be output, that is, predetermined advertisement information registered as a response message, and specifies a feature of that advertisement information from its content. The transformation unit 133 transforms the content of the response message based on the feature of the advertisement information specified by the specifying unit 132.

The transformation unit 133 according to the embodiment transforms the content of a response message scheduled to be output when the number of responses remaining until the advertisement information predicted by the specifying unit 132 is output is equal to or less than a predetermined threshold.
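The threshold condition can be made concrete with a short sketch. It assumes, purely for illustration, that the scheduled responses are held in an ordered list and that the position of the predicted advertisement in that list is known; the function name `plan_transformations` and its parameters are hypothetical. Whether the advertisement itself (zero responses remaining) counts as within the threshold is also an assumption of this sketch.

```python
def plan_transformations(planned: list[str], ad_index: int, threshold: int) -> list[bool]:
    """Return, for each scheduled response, whether it should be transformed.

    A response is flagged when the number of responses remaining until the
    predicted advertisement is output is between 0 and `threshold` inclusive.
    Responses scheduled after the advertisement are not flagged.
    """
    flags = []
    for i in range(len(planned)):
        remaining = ad_index - i  # responses until the ad is output
        flags.append(0 <= remaining <= threshold)
    return flags


# Example: the ad is the fourth scheduled response and the threshold is 2,
# so the two responses leading up to the ad, and the ad itself, are flagged.
schedule = ["r0", "r1", "r2", "ad", "r4"]
flags = plan_transformations(schedule, ad_index=3, threshold=2)
```

Limiting the transformation to the responses just before the advertisement lets the conversation lead into the ad gradually without altering every response in the dialog.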

Some of the embodiments of the present application have been described in detail above with reference to the drawings. These are examples, and the present invention can be implemented in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention.

The "unit (section, module, unit)" described above can be read as "means" or "circuit". For example, the specifying unit can be read as specifying means or a specifying circuit.

Description of Symbols
10 User terminal
20 Voice recognition device
30 Advertiser terminal
40 Advertisement bidding device
42 Advertisement information storage unit
44 Bid reception unit
45 Presentation unit
100 Response generation apparatus
120 Determination information storage unit
130 Control unit
131 Reception unit
132 Specifying unit
133 Transformation unit
134 Output control unit

Claims

A response generation apparatus comprising:
a specifying unit that specifies a tendency in how a user proceeds with a conversation, based on the number of times nodes corresponding to the conversation between a dialog agent system and the user have been used;
a transformation unit that transforms the content of a response message in accordance with the tendency in how the user proceeds with the conversation specified by the specifying unit; and
an output control unit that performs control so that the response message whose content has been transformed by the transformation unit is output.

The response generation apparatus according to claim 1, wherein
the specifying unit specifies a user characteristic in accordance with the tendency in how the user proceeds with the conversation, and
the transformation unit transforms the content of the response message based on the user characteristic specified by the specifying unit.

The response generation apparatus according to claim 2, wherein
the transformation unit transforms the content of advertisement information serving as a response message so that the voice feature at the time of output differs from the voice feature corresponding to the user characteristic specified from the feature of the user's voice.

The response generation apparatus according to claim 2, wherein
the transformation unit transforms the content of advertisement information serving as a response message so that the voice feature at the time of output is similar to the voice feature corresponding to the user characteristic specified from the feature of the user's voice.

The response generation apparatus according to claim 1, wherein
the specifying unit predicts advertisement information scheduled to be output, the advertisement information being predetermined advertisement information registered as a response message, and specifies a feature of the advertisement information from the content of the predicted advertisement information, and
the transformation unit transforms the content of the response message based on the feature of the advertisement information specified by the specifying unit.

The response generation apparatus according to claim 5, wherein
the transformation unit transforms the content of a response message scheduled to be output when the number of responses remaining until the advertisement information predicted by the specifying unit is output is equal to or less than a predetermined threshold.

A response generation method executed by a computer, the method comprising:
a specifying step of specifying a tendency in how a user proceeds with a conversation, based on the number of times nodes corresponding to the conversation between a dialog agent system and the user have been used;
a transformation step of transforming the content of a response message in accordance with the tendency in how the user proceeds with the conversation specified in the specifying step; and
an output control step of performing control so that the response message whose content has been transformed in the transformation step is output.

A response generation program for causing a computer to execute:
a specifying procedure of specifying a tendency in how a user proceeds with a conversation, based on the number of times nodes corresponding to the conversation between a dialog agent system and the user have been used;
a transformation procedure of transforming the content of a response message in accordance with the tendency in how the user proceeds with the conversation specified in the specifying procedure; and
an output control procedure of performing control so that the response message whose content has been transformed in the transformation procedure is output.