JP2018045413A

JP2018045413A - Information processing device, information processing method, and program

Info

Publication number: JP2018045413A
Application number: JP2016179229A
Authority: JP
Inventors: Kohei Sugawara; Hayato Kobayashi; Tatsuhiro Niwa; Toru Shimizu; Nobuhiro Kaji; Nobuyuki Shimizu
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2016-09-14
Filing date: 2016-09-14
Publication date: 2018-03-22
Anticipated expiration: 2036-09-14
Also published as: JP6482512B2

Abstract

PROBLEM TO BE SOLVED: To provide an information processing device, an information processing method, and a program that improve a user's satisfaction with messages transmitted to the user's terminal device.
SOLUTION: An information processing device includes a reception unit, a generation unit, and a transmission unit. The reception unit receives a user message from a user's terminal device. The generation unit generates a candidate message containing candidates narrowed down with a candidate selection model trained by reinforcement learning. The transmission unit transmits the candidate message generated by the generation unit to the terminal device.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

Conventionally, an information processing apparatus is known that, when sending a question message to a user's terminal device in response to a message received from that terminal device, first selects content from a plurality of content sets and sends that content to the terminal device before sending the question message (see Patent Document 1).

Patent Document 1: Japanese Patent Laying-Open No. 2015-69608

However, in this apparatus the content chosen from the content sets is selected in an averaged, non-personalized way, so content the user actually wants may not be sent to the user's terminal device, which can lower the user's satisfaction.

The present application has been made in view of the above, and its object is to provide an information processing apparatus, an information processing method, and a program that improve a user's satisfaction with messages transmitted to the user's terminal device.

An information processing apparatus according to the present application includes a reception unit, a generation unit, and a transmission unit. The reception unit receives a user message from the user's terminal device. In response to the user message, the generation unit generates a candidate message containing candidates narrowed down with a candidate selection model trained by reinforcement learning. The transmission unit transmits the candidate message generated by the generation unit to the terminal device.

According to one aspect of the embodiment, it is possible to provide an information processing apparatus, an information processing method, and a program that improve a user's satisfaction with messages transmitted to the user's terminal device.

FIG. 1 is an explanatory diagram of information processing according to the embodiment.
FIG. 2 is a diagram illustrating a configuration example of the information processing system.
FIG. 3 is a diagram illustrating a configuration example of the information processing apparatus.
FIG. 4 is a diagram illustrating an example in which menus are narrowed down by the information processing apparatus.
FIG. 5 is a diagram illustrating an example of a display screen of the terminal device.
FIG. 6 is a flowchart illustrating an example of menu selection processing according to the embodiment.
FIG. 7 is a diagram illustrating an example of a hardware configuration of a computer that executes the program.

Hereinafter, modes for carrying out the information processing apparatus, information processing method, and program according to the present application (hereinafter, "embodiments") are described in detail with reference to the drawings. The information processing apparatus, information processing method, and program according to the present application are not limited by these embodiments.

[1. Information processing]
An example of information processing according to the embodiment is described below. FIG. 1 is an explanatory diagram of information processing according to the embodiment. Here, the case in which the information processing apparatus 1 processes a menu search received from the user's terminal device 2 is described, but the embodiment is not limited to this case.

The information processing apparatus 1 receives menu search data, which is a user message based on the user's utterance (step S1). For example, when the user says "I want to use cabbage today," the information processing apparatus 1 receives menu search data corresponding to "I want to use cabbage today."

Based on ingredient data extracted from the menu search data, the information processing apparatus 1 selects a menu that uses the ingredient with a candidate selection model trained by reinforcement learning (hereinafter, the "reinforcement learning model"), and generates text data for a response sentence containing the selected menu (hereinafter sometimes simply called the "response sentence") (step S2). The response sentence is also a candidate message containing candidates narrowed down in response to the user message.

The reinforcement learning model is described in detail later; in brief, the reinforcement learning model according to the embodiment selects, for given ingredient data, a menu with a high cumulative reward (hereinafter, the "score").

For example, among the menus that use "cabbage" as an ingredient, the information processing apparatus 1 selects "cabbage rolls," which has a high score. The information processing apparatus 1 then generates the response sentence "How about cabbage rolls?"

The information processing apparatus 1 transmits data on the generated response sentence to the user's terminal device 2 (step S3). For example, the information processing apparatus 1 transmits the response sentence "How about cabbage rolls?"

The information processing apparatus 1 receives, from the user's terminal device 2, a menu selection result based on the user's operation of the terminal device 2 (step S4). For example, the information processing apparatus 1 receives information indicating that "cabbage rolls" was selected as the menu through an operation on the terminal device 2.

The information processing apparatus 1 updates the reinforcement learning model based on the menu selection result (step S5). For example, the information processing apparatus 1 updates the reinforcement learning model so that the score of the "cabbage rolls" menu increases.

As described above, the information processing apparatus 1 receives menu search data containing ingredient information from the user's terminal device 2 and, based on the received menu search data, selects a menu using the reinforcement learning model. The information processing apparatus 1 then generates a response sentence containing the selected menu and transmits it to the terminal device 2. By selecting menus with the reinforcement learning model, the information processing apparatus 1 can quickly offer menus the user likes and can improve the user's satisfaction with menu searches.
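The five steps above can be traced with a small sketch. It assumes a hypothetical in-memory recipe table and score table with made-up values; the patent does not specify these data structures, and the additive update in step S5 is an assumption that treats the score as a running sum of rewards.

```python
# Sketch of steps S1-S5 with hypothetical data; not the patent's implementation.
RECIPES = {  # hypothetical dish -> ingredient set
    "cabbage rolls": {"cabbage", "ground meat", "onion"},
    "stir-fried vegetables": {"cabbage", "bean sprouts", "carrot"},
    "vegetable soup": {"cabbage", "carrot", "egg"},
}
SCORES = {"cabbage rolls": 3.0, "stir-fried vegetables": 2.0, "vegetable soup": 1.0}


def select_menu(ingredient: str) -> str:
    """S2: choose the highest-scoring dish that uses the requested ingredient."""
    candidates = [menu for menu, items in RECIPES.items() if ingredient in items]
    return max(candidates, key=lambda menu: SCORES[menu])


def make_response(menu: str) -> str:
    """S2/S3: wrap the chosen dish in a response sentence for the terminal."""
    return f"How about {menu}?"


def update_on_selection(selected: str, reward: float = 1.0) -> None:
    """S4/S5: add the reward to the score of the dish the user picked."""
    SCORES[selected] += reward


if __name__ == "__main__":
    menu = select_menu("cabbage")   # S1 would supply "cabbage" from the utterance
    print(make_response(menu))      # How about cabbage rolls?
    update_on_selection(menu)       # the user picked it, so its score rises
    print(SCORES[menu])             # 4.0
```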

[2. Configuration of information processing system 5]
FIG. 2 is a diagram illustrating a configuration example of the information processing system 5. As illustrated in FIG. 2, the information processing system 5 according to the embodiment includes an information processing apparatus 1, a terminal device 2, a speech recognition server 3, and a speech synthesis server 4.

The terminal device 2, the speech recognition server 3, the speech synthesis server 4, and the information processing apparatus 1 are communicably connected to one another, by wire or wirelessly, via a network N. The network N is, for example, a LAN (Local Area Network) or a WAN (Wide Area Network) such as the Internet.

The terminal device 2 is implemented as, for example, a smartphone, a tablet terminal, a desktop PC (Personal Computer), a notebook PC, or a PDA (Personal Digital Assistant).

The speech recognition server 3 is a device that performs natural language processing on speech information and converts speech data into text data. When the speech recognition server 3 receives speech data of an utterance from the terminal device 2, it converts the speech data into text data and transmits the resulting text data to the information processing apparatus 1.

The speech synthesis server 4 converts the text data of a response sentence generated by the information processing apparatus 1 into speech data and transmits the resulting speech data to the terminal device 2.

The information processing apparatus 1 generates text data for a response sentence based on text data transmitted from the terminal device 2, or on text data converted from speech data by the speech recognition server 3. The information processing apparatus 1 transmits the generated text data of the response sentence to the speech synthesis server 4 and to the terminal device 2.

The speech recognition server 3 and the speech synthesis server 4 may be integrated with the information processing apparatus 1. If the terminal device 2 itself has a speech recognition function or a speech synthesis function, it may use these functions to convert between speech data and text data.

[3. Configuration of information processing apparatus 1]
Next, the information processing apparatus 1 according to the embodiment is described with reference to FIG. 3. FIG. 3 is a diagram illustrating a configuration example of the information processing apparatus 1.

Here, an example is described in which the terminal device 2 transmits speech data of the user's utterance; however, text data may be transmitted instead.

The following exchange between the terminal device 2 (user) and the information processing apparatus 1 is used as a running example.
(User utterance 1): "I want to use cabbage today."
(Response 1): "If you have ground meat, how about cabbage rolls?"
(User utterance 2): "I don't have any meat, so something else would be better."
(Response 2): "If you have bean sprouts, how about stir-fried vegetables?"
(User utterance 3): "I have eggs and carrots."
(Response 3): "Here are menus you can make with cabbage, eggs, and carrots."

When the user performs a menu search, the information processing apparatus 1 acts as a response generation device that generates a response sentence to the menu search data transmitted from the terminal device 2 (see FIG. 2) via the speech recognition server 3 (see FIG. 2). The information processing apparatus 1 includes a reception unit 10, a transmission unit 20, a storage unit 30, and a processing unit 40.

The reception unit 10 receives user messages from the terminal device 2 via the network N. A user message contains text data, such as menu search data based on an utterance, or data on operation information of the terminal device 2. The operation information of the terminal device 2 is information that contains no text data, for example a selection (click or touch) made on the terminal device 2.

The storage unit 30 includes a dialogue model storage unit 31, a reinforcement learning model storage unit 32, a menu storage unit 33, and an ingredient storage unit 34. The storage unit 30 is implemented by, for example, a semiconductor memory element such as a RAM or flash memory, or a storage device such as a hard disk or an optical disk.

The dialogue model storage unit 31 stores a dialogue model that generates a preset response sentence for the content of the user's utterance.

The dialogue model is a model for generating text data that pairs with the text data corresponding to the content of the user's utterance. For example, when the text data of the user's utterance is "I want to use *** today," the dialogue model generates a response sentence such as "If you have XXX, how about YYY?" or "How about YYY?"
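A pattern-to-template dialogue model of this kind might be sketched as below. The regular expression, the English templates, and the function name `pick_template` are hypothetical; they paraphrase the example in the text and are not the patent's model.

```python
import re
from typing import Optional

# Hypothetical dialogue model: (utterance pattern, (template with hint, plain template)).
DIALOGUE_MODEL = [
    (
        re.compile(r"\bI want to use .+ today\b", re.IGNORECASE),
        ("If you have {hint}, how about {menu}?", "How about {menu}?"),
    ),
]


def pick_template(utterance: str, has_hint: bool) -> Optional[str]:
    """Return the response template paired with the first matching pattern, if any."""
    for pattern, (with_hint, plain) in DIALOGUE_MODEL:
        if pattern.search(utterance):
            return with_hint if has_hint else plain
    return None


template = pick_template("I want to use cabbage today", has_hint=True)
print(template.format(hint="ground meat", menu="cabbage rolls"))
# If you have ground meat, how about cabbage rolls?
```

Keeping the slots ({hint}, {menu}) separate from the pattern lets the menu selection and hint logic fill them in later, which mirrors the split between the dialogue model and the menu selection described in the text.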

The reinforcement learning model storage unit 32 stores a reinforcement learning model for selecting a menu based on ingredient data. The reinforcement learning model is trained by reinforcement learning, for example Q-learning.

Reinforcement learning is a learning method that, given a state, preferentially selects the action that maximizes the reward obtained by taking that action. Various actions are tried in a given state, and the reinforcement learning model is updated, that is, trained, by receiving rewards corresponding to the actions tried.
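Because the text names Q-learning as one option, the standard tabular Q-learning update is sketched below for reference. The learning rate `alpha`, the discount `gamma`, and the dictionary-based table are assumptions; the patent does not state them. When a dialogue ends (no next state), the bootstrap term is dropped.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Optional


def q_update(q: defaultdict, state: Hashable, action: Hashable, reward: float,
             next_state: Optional[Hashable] = None,
             next_actions: Iterable[Hashable] = (),
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    If next_state is None (terminal), the max term is treated as 0."""
    future = 0.0
    actions = list(next_actions)
    if next_state is not None and actions:
        future = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * future - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
print(q_update(q, ("cabbage",), "vegetable soup", reward=1.0))  # 0.1
```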

In the reinforcement learning according to this embodiment, the ingredient data obtained from the menu search data is the "state," the menu selected based on the ingredient data is the "action," and a score based on the response of the user to whom the menu was offered is the "reward." In the reinforcement learning model, each menu has a score, and when the model is used to select a menu, the menu with the highest score is selected.
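Under the state/action/reward framing above, choosing a menu amounts to a greedy lookup over the candidate menus for the current state. In this sketch the state is a hypothetical tuple of known ingredient facts, the scores are made up, and ties are broken by list order; the patent only says that the highest-scoring menu is selected.

```python
from typing import Dict, Hashable, Sequence, Tuple

Scores = Dict[Tuple[Hashable, str], float]  # (state, menu) -> score


def greedy_menu(scores: Scores, state: Hashable, candidates: Sequence[str]) -> str:
    """Return the candidate menu with the highest score for this state.
    Unseen (state, menu) pairs default to 0.0; ties keep the earlier candidate."""
    return max(candidates, key=lambda menu: scores.get((state, menu), 0.0))


state = (("cabbage", 1),)  # hypothetical encoding: the user has cabbage
scores = {(state, "cabbage rolls"): 0.8, (state, "salad"): 0.3}
print(greedy_menu(scores, state, ["salad", "cabbage rolls", "okonomiyaki"]))  # cabbage rolls
```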

For example, the reward for a menu selected by the user is +1. Also, for example, the reward for a menu that was offered to the user but not selected is -0.1.

If the user writes a review of a menu, the reward for the reviewed menu may be set to +2. A written review indicates that the user actually cooked the dish, so the reward is larger.

The initial value of each score in the reinforcement learning model is set based on, for example, a general menu popularity ranking. Initial scores are set according to popularity; for example, more popular menus receive larger initial scores.
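Seeding scores from a popularity ranking could look like the following. The linear mapping from rank to score and the `top_score` parameter are assumptions made for illustration; the text only says that more popular menus start with higher scores.

```python
from typing import Dict, Sequence


def initial_scores(ranking: Sequence[str], top_score: float = 1.0) -> Dict[str, float]:
    """Map a popularity ranking (most popular first) to linearly decreasing scores.
    The first menu gets top_score and the last gets top_score / len(ranking)."""
    n = len(ranking)
    return {menu: top_score * (n - i) / n for i, menu in enumerate(ranking)}


print(initial_scores(["cabbage rolls", "salad", "okonomiyaki", "vegetable soup"]))
# {'cabbage rolls': 1.0, 'salad': 0.75, 'okonomiyaki': 0.5, 'vegetable soup': 0.25}
```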

The menu storage unit 33 stores each menu in association with a recipe that includes the required ingredients. For example, the "cabbage rolls" menu is stored in association with the ingredients "cabbage," "ground meat," and "onion."

The ingredient storage unit 34 temporarily stores ingredient data extracted by the analysis unit 41, described later. The stored ingredient data is erased when a predetermined condition is met, for example when the user selects a menu or when a preset time has elapsed. The preset time is, for example, a time after which the menu selection can be judged to have ended.
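The temporary storage and its two clearing conditions can be sketched as a small class. The ten-minute default, the injectable `clock`, and the method names are assumptions; the text leaves the timeout value and the interface unspecified.

```python
import time
from typing import Callable, Dict


class IngredientStore:
    """Holds per-session ingredient facts; cleared on menu selection or after a timeout."""

    def __init__(self, ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds  # assumed timeout; the patent gives no value
        self.clock = clock
        self.items: Dict[str, int] = {}
        self.last_update = clock()

    def _expire_if_stale(self) -> None:
        # Treat a long silence as the end of the menu search.
        if self.clock() - self.last_update > self.ttl:
            self.items.clear()

    def record(self, ingredient: str, value: int) -> None:
        self._expire_if_stale()
        self.items[ingredient] = value
        self.last_update = self.clock()

    def snapshot(self) -> Dict[str, int]:
        self._expire_if_stale()
        return dict(self.items)

    def on_menu_selected(self) -> None:
        # The user picked a menu, so this search is over.
        self.items.clear()


# Usage with a fake clock so the timeout can be exercised instantly.
now = [0.0]
store = IngredientStore(ttl_seconds=10.0, clock=lambda: now[0])
store.record("cabbage", 1)
now[0] = 20.0
print(store.snapshot())  # {} because more than 10 seconds passed
```

Injecting the clock keeps the expiry logic testable without real waiting; a production service would more likely rely on a session store with built-in expiry.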

The ingredient storage unit 34 represents the state of the user's ingredients as a vector: for example, an ingredient for which it is unknown whether the user has it is stored as 0, an ingredient the user is known to have as 1, and an ingredient the user is known not to have as -1.
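The three-valued vector can be sketched with a fixed ingredient vocabulary. The vocabulary and the `Have` enum are hypothetical names introduced here; only the 0 / 1 / -1 coding comes from the text. Returning a tuple keeps the state hashable, so it can serve directly as a key in a score table.

```python
from enum import IntEnum
from typing import Dict, Tuple


class Have(IntEnum):
    NO = -1       # known not to have
    UNKNOWN = 0   # not yet mentioned
    YES = 1       # known to have


# Hypothetical fixed vocabulary; each position in the vector is one ingredient.
VOCAB = ("cabbage", "meat", "egg", "carrot", "bean sprouts", "onion")


def encode(known: Dict[str, Have]) -> Tuple[int, ...]:
    """Turn sparse ingredient facts into a dense state vector over VOCAB."""
    return tuple(int(known.get(name, Have.UNKNOWN)) for name in VOCAB)


print(encode({"cabbage": Have.YES, "meat": Have.NO}))  # (1, -1, 0, 0, 0, 0)
```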

The processing unit 40 includes an analysis unit 41, a menu selection unit 42, a response generation unit 43, and a learning unit 44.

The analysis unit 41 determines whether a user message received by the reception unit 10 is text data or a message concerning operation information of the terminal device 2. If the user message concerns operation information of the terminal device 2, the analysis unit 41 further determines, for example, whether it is operation information indicating a menu selection result.

If the message contains text data, the analysis unit 41 analyzes the user message using morphological analysis or a similar technique and extracts the words contained in the text data. Specifically, the analysis unit 41 extracts ingredient data and menu data from the text data. The analysis unit 41 also determines whether the content is positive or negative with respect to the ingredient data or menu data.

That is, the analysis unit 41 determines whether the text data concerns a menu search and, if so, whether its content is positive or negative with respect to the ingredient data. The ingredient data extracted by the analysis unit 41 is temporarily stored in the ingredient storage unit 34.

Likewise, when the text data concerns a menu selection result, the analysis unit 41 determines whether its content is positive or negative with respect to the menu.

For example, when the reception unit 10 receives (User utterance 1): "I want to use cabbage today" as menu search data, the analysis unit 41 extracts "cabbage" as ingredient data and identifies the content as positive. As a result, the ingredient storage unit 34 records that the user has "cabbage."

When the reception unit 10 receives (User utterance 2): "I don't have any meat, so something else would be better" as menu search data, the analysis unit 41 extracts "meat" as ingredient data and identifies the content as negative. As a result, the ingredient storage unit 34 additionally records that the user does not have "meat."

When the reception unit 10 receives (User utterance 3): "I have eggs and carrots" as menu search data, the analysis unit 41 extracts "eggs" and "carrots" as ingredient data and identifies the content as positive. As a result, the ingredient storage unit 34 additionally records that the user has "eggs" and "carrots."
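The text relies on morphological analysis of Japanese; as a stand-in, the sketch below uses plain English keyword matching with a crude utterance-level negation check. The keyword tables, negation cues, and the `extract` function are hypothetical and far simpler than a real analyzer, but they show the output shape used above: a mapping from each ingredient to +1 (has) or -1 (lacks).

```python
import re
from typing import Dict

# Hypothetical surface forms -> canonical ingredient names.
INGREDIENT_WORDS = {
    "cabbage": "cabbage", "meat": "meat",
    "egg": "egg", "eggs": "egg",
    "carrot": "carrot", "carrots": "carrot",
}
# Crude negation cues; a real system would attach polarity per phrase.
NEGATION_CUES = ("no ", "don't have", "do not have", "without", "out of")


def extract(utterance: str) -> Dict[str, int]:
    """Return {ingredient: +1 or -1} for every ingredient word in the utterance."""
    text = utterance.lower()
    polarity = -1 if any(cue in text for cue in NEGATION_CUES) else 1
    found: Dict[str, int] = {}
    for word in re.findall(r"[a-z]+", text):
        if word in INGREDIENT_WORDS:
            found[INGREDIENT_WORDS[word]] = polarity
    return found


print(extract("I want to use cabbage today"))  # {'cabbage': 1}
print(extract("I don't have any meat, so something else would be better"))  # {'meat': -1}
print(extract("I have eggs and carrots"))  # {'egg': 1, 'carrot': 1}
```

Merging each turn's result into the stored facts with `dict.update` reproduces the accumulation in the three examples: cabbage is recorded as 1, then meat as -1, then egg and carrot as 1.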

The menu selection unit 42 uses the reinforcement learning model to select a high-scoring menu based on the ingredient data stored in the ingredient storage unit 34.

Specifically, among the menus that use the ingredients in the ingredient data stored in the ingredient storage unit 34, the menu selection unit 42 ranks menus in descending order of score.

For example, if, based on (User utterance 1): "I want to use cabbage today," the only ingredient data stored in the ingredient storage unit 34 is "cabbage," the menu selection unit 42 ranks, in order of score, the menus that use "cabbage" as an ingredient.

If, for example, this yields the ranking "cabbage rolls," "twice-cooked pork," "stir-fried vegetables," "vegetable soup," "okonomiyaki," "salad," "cabbage fried rice," as shown in FIG. 4, the menu selection unit 42 selects "cabbage rolls," which has the highest score in the ranking. FIG. 4 is a diagram illustrating an example in which menus are narrowed down by the information processing apparatus 1.

Next, for example, when, based on (User utterance 2): "I don't have any meat, so something else would be better," the ingredient data stored in the ingredient storage unit 34 is "cabbage" and "meat," and it is known that the user does not have "meat," the menu selection unit 42 ranks, in order of score, the menus that use "cabbage" but not "meat."

As a result, as shown in FIG. 4, the menus are ranked "stir-fried vegetables," "vegetable soup," "okonomiyaki," "salad," "cabbage fried rice." Compared with the ranking when the only ingredient data was "cabbage," this ranking excludes "cabbage rolls" and "twice-cooked pork," which use "meat," so the menus are narrowed down. In this case, the menu selection unit 42 selects "stir-fried vegetables," which has the highest score in the ranking.

When, based on (User utterance 3): "I have eggs and carrots," the ingredient data stored in the ingredient storage unit 34 is "cabbage," "meat," "eggs," and "carrots," and it is known that the user does not have "meat," the menu selection unit 42 ranks, in order of score, the menus that use "cabbage," "eggs," and "carrots" but not "meat."

As a result, as shown in FIG. 4, the menus are ranked "vegetable soup," "cabbage fried rice." Compared with the previous ranking (ingredient data "cabbage" and "meat," with no "meat" available), this ranking excludes "stir-fried vegetables," "okonomiyaki," and "salad," each of which lacks "eggs" or "carrots," so the menus are narrowed down further.
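The narrowing in FIG. 4 can be reproduced with a filter-then-sort over a recipe table. The recipes and scores below are hypothetical values chosen so that the three rankings match the example; in particular, "meat" stands in for any meat, including ground meat.

```python
from typing import Dict, List, Set

# Hypothetical recipe table and scores (higher is better).
RECIPES: Dict[str, Set[str]] = {
    "cabbage rolls": {"cabbage", "meat", "onion", "bread crumbs"},
    "twice-cooked pork": {"cabbage", "meat", "green pepper"},
    "stir-fried vegetables": {"cabbage", "bean sprouts", "carrot"},
    "vegetable soup": {"cabbage", "carrot", "egg", "onion"},
    "okonomiyaki": {"cabbage", "flour", "egg"},
    "salad": {"cabbage", "cucumber", "carrot"},
    "cabbage fried rice": {"cabbage", "rice", "egg", "carrot"},
}
SCORES = {menu: float(7 - i) for i, menu in enumerate(RECIPES)}  # 7.0 down to 1.0


def rank_menus(state: Dict[str, int]) -> List[str]:
    """Keep menus that use every ingredient the user has (+1) and none they lack (-1),
    then sort by score, highest first."""
    have = {name for name, v in state.items() if v == 1}
    lacking = {name for name, v in state.items() if v == -1}
    eligible = [m for m, needed in RECIPES.items()
                if have <= needed and not (lacking & needed)]
    return sorted(eligible, key=lambda m: SCORES[m], reverse=True)


state = {"cabbage": 1}
print(rank_menus(state))  # all seven menus, cabbage rolls first
state["meat"] = -1
print(rank_menus(state))  # the two meat dishes are removed
state.update({"egg": 1, "carrot": 1})
print(rank_menus(state))  # ['vegetable soup', 'cabbage fried rice']
```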

When the number of menus whose score is at or above a predetermined score falls to a predetermined number or fewer, for example three, the menu selection unit 42 selects all of those menus. The predetermined score is a preset value. For example, if "vegetable soup" and "cabbage fried rice" remain in the ranking as menus whose scores are at or above the predetermined score, the menu selection unit 42 selects "cabbage fried rice" in addition to "vegetable soup," the highest-scoring menu.

The menu selection unit 42 may also select a predetermined number of top-ranked menus once a predetermined number of responses have been exchanged since the menu search began. The predetermined number of responses is a preset value, for example three.
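The two stopping rules (few enough menus above a score threshold, or too many turns) might combine as in this sketch. The threshold, the limits, and the function name `choose_menus` are assumed values and names; the text gives three only as an example for each limit.

```python
from typing import Dict, List


def choose_menus(ranked: List[str], scores: Dict[str, float], turn: int,
                 min_score: float = 0.5, max_results: int = 3,
                 max_turns: int = 3, top_k: int = 3) -> List[str]:
    """Decide which menus to present, given menus already ranked by score.
    - If 1..max_results menus reach min_score, present all of them.
    - Otherwise, once max_turns responses have been made, present the top_k.
    - Otherwise, keep the dialogue going with the single best menu."""
    above = [m for m in ranked if scores[m] >= min_score]
    if 0 < len(above) <= max_results:
        return above
    if turn >= max_turns:
        return ranked[:top_k]
    return ranked[:1]


print(choose_menus(["vegetable soup", "cabbage fried rice"],
                   {"vegetable soup": 4.0, "cabbage fried rice": 1.0}, turn=3))
# ['vegetable soup', 'cabbage fried rice']
```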

The response generation unit 43 generates a response sentence containing the selected menu, based on the menu selected by the menu selection unit 42 and the dialogue model.

The response generation unit 43 generates a response sentence asking whether the user has an important ingredient among those that are required for the menu selected by the menu selection unit 42 but are not yet stored as ingredient data in the ingredient storage unit 34. In other words, the response generation unit 43 generates a response sentence containing a hint that draws further ingredient data out of the user.

For example, based on the priority order of ingredients for the menu, the response generation unit 43 generates a response sentence asking whether the user has the highest-priority ingredient among those for which no information is available. The priority order of ingredients is stored in the menu storage unit 33 together with the menu.

For example, if the ingredients for "cabbage rolls" are prioritized in the order "cabbage," "meat," "onion," "bread crumbs," and "cabbage" is stored as ingredient data in the ingredient storage unit 34, the response generation unit 43 generates a response sentence asking whether the user has "meat."
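Picking the ingredient to ask about reduces to scanning the menu's priority list for the first ingredient whose status is still unknown. The priority list for "stir-fried vegetables" and the function name `hint_ingredient` are hypothetical; only the "cabbage rolls" order comes from the text.

```python
from typing import Dict, Optional, Sequence

# Ingredient priority per menu; only the "cabbage rolls" order comes from the text.
PRIORITY = {
    "cabbage rolls": ("cabbage", "meat", "onion", "bread crumbs"),
    "stir-fried vegetables": ("cabbage", "bean sprouts", "carrot"),  # hypothetical
}


def hint_ingredient(priority: Sequence[str], state: Dict[str, int]) -> Optional[str]:
    """Return the highest-priority ingredient whose status is unknown (absent or 0)."""
    for name in priority:
        if state.get(name, 0) == 0:
            return name
    return None


print(hint_ingredient(PRIORITY["cabbage rolls"], {"cabbage": 1}))  # meat
print(hint_ingredient(PRIORITY["stir-fried vegetables"], {"cabbage": 1, "meat": -1}))  # bean sprouts
```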

For example, in reply to (User utterance 1): "I want to use cabbage today," the response generation unit 43 generates (Response 1): "If you have ground meat, how about cabbage rolls?"

In reply to (User utterance 2): "I don't have any meat, so something else would be better," if the menu selection unit 42 selects "stir-fried vegetables" and "bean sprouts" has a high priority among its ingredients, the response generation unit 43 generates (Response 2): "If you have bean sprouts, how about stir-fried vegetables?"

In reply to (User utterance 3): "I have eggs and carrots," the menu selection unit 42 selects "vegetable soup" and "cabbage fried rice," and because the menus have been narrowed down, the response generation unit 43 generates (Response 3): "Here are menus you can make with cabbage, eggs, and carrots."

The response generation unit 43 may instead generate a response sentence that contains only the menu selected by the menu selection unit 42.
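The choice between a hint question, a plain suggestion, and a final listing can be sketched as follows. The sentence templates are English paraphrases of Responses 1 to 3, and `build_response` and `join_en` are hypothetical helper names.

```python
from typing import List, Optional, Sequence


def join_en(items: Sequence[str]) -> str:
    """Join words as an English list: 'a', 'a and b', 'a, b, and c'."""
    items = list(items)
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def build_response(chosen: List[str], have: Sequence[str], hint: Optional[str] = None) -> str:
    """Several menus left: announce the list. One menu: suggest it, optionally with a hint."""
    if len(chosen) > 1:
        return f"Here are menus you can make with {join_en(have)}."
    menu = chosen[0]
    if hint:
        return f"If you have {hint}, how about {menu}?"
    return f"How about {menu}?"


print(build_response(["stir-fried vegetables"], ["cabbage"], hint="bean sprouts"))
# If you have bean sprouts, how about stir-fried vegetables?
print(build_response(["vegetable soup", "cabbage fried rice"], ["cabbage", "eggs", "carrots"]))
# Here are menus you can make with cabbage, eggs, and carrots.
```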

The transmission unit 20 transmits the response sentence generated by the response generation unit 43 to the terminal device 2 and the speech synthesis server 4 (see FIG. 2) via the network N. When the menus have been narrowed down, for example when the menu selection unit 42 has selected "vegetable soup" and "cabbage fried rice," the transmission unit 20 also transmits the recipe data for "vegetable soup" and "cabbage fried rice" to the terminal device 2 along with the response sentence.

The learning unit 44 updates the reinforcement learning model based on the user's menu selection result. Specifically, it assigns rewards to the menus selected by the menu selection unit 42 according to that result.

For example, suppose that (response sentence 3): "Here are menus you can make with cabbage, eggs, and carrots." is sent to the terminal device 2, and the recipes for "vegetable soup" and "cabbage fried rice" are displayed on the terminal device 2 as shown in FIG. 5. FIG. 5 is a diagram illustrating an example of the display screen of the terminal device 2.

When the user selects "vegetable soup", the learning unit 44 gives a reward of "+1" to the "vegetable soup" menu and updates the reinforcement learning model.

The learning unit 44 also gives a reward of "−0.1" to "cabbage rolls" (included in response sentence 1), "stir-fried vegetables" (included in response sentence 2), and "cabbage fried rice", which was ultimately not selected, and updates the reinforcement learning model.

As a result, from the next selection onward, the menu selection unit 42 is more likely to select "vegetable soup" than it was this time, and less likely to select "cabbage rolls", "stir-fried vegetables", and "cabbage fried rice". In other words, menu selection comes to reflect the user's preferences, and the information processing apparatus 1 can quickly offer menus the user likes.
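The reward rule above can be pictured as a per-menu score table that is nudged toward each reward it receives. This is a minimal sketch, assuming a simple incremental (bandit-style) value update with a hypothetical learning rate `alpha`; the document does not specify the update rule of its reinforcement learning model, and `update_scores` is a hypothetical name.

```python
from typing import Dict, Iterable


def update_scores(scores: Dict[str, float], chosen: str, proposed: Iterable[str],
                  alpha: float = 0.1, win: float = 1.0, loss: float = -0.1) -> None:
    """Move each proposed menu's score toward the reward it received (in place).

    The chosen menu receives `win` (+1); every other proposed menu receives `loss` (-0.1).
    """
    for menu in set(proposed) | {chosen}:
        reward = win if menu == chosen else loss
        old = scores.get(menu, 0.0)
        scores[menu] = old + alpha * (reward - old)
```

With every score starting at 0.5, choosing "vegetable soup" after the three proposals in the example raises its score to 0.55 and lowers the other three to 0.44, so the next ranking favors vegetable soup.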

Similarly, if in response to (response sentence 1): "If you have minced meat, how about cabbage rolls?" the user says something like "Cabbage rolls sound good" and that menu is selected and confirmed, the learning unit 44 gives a reward of "+1" to the "cabbage rolls" menu included in response sentence 1 and updates the reinforcement learning model.

In addition, if the user says nothing for a predetermined time after the text data of a response sentence has been transmitted, for example if no further utterance arrives within that time after (response sentence 1): "If you have minced meat, how about cabbage rolls?", the learning unit 44 gives a reward of "−0.1" to the "cabbage rolls" menu included in response sentence 1 and updates the reinforcement learning model.
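The reaction cases described so far (menu chosen, menu proposed but never chosen, no reply within the predetermined time) can be mapped to rewards as below. This is a minimal sketch: the function name `reaction_reward` and the 60-second `timeout_s` are hypothetical, since the document only speaks of "a predetermined time".

```python
from typing import Optional


def reaction_reward(accepted: Optional[bool], elapsed_s: float,
                    timeout_s: float = 60.0) -> Optional[float]:
    """Reward for the menu proposed in the last response sentence.

    accepted: True if the user chose/confirmed the menu, False if it was not chosen,
              None if the user has not replied yet.
    Returns None while still waiting, i.e. no reward is assigned yet.
    """
    if accepted is True:
        return 1.0            # selected or confirmed ("Cabbage rolls sound good")
    if accepted is False:
        return -0.1           # proposed but not chosen
    # No reply yet: penalize only once the predetermined time has passed.
    return -0.1 if elapsed_s >= timeout_s else None
```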

[4. Menu selection process]
Next, the menu selection process according to the embodiment will be described with reference to FIG. 6. FIG. 6 is a flowchart illustrating an example of the menu selection process according to the embodiment.

When the receiving unit 10 receives a user message (step S10), the processing unit 40 determines whether the user message contains text data (step S11).

If the user message contains text data (step S11: Yes), the processing unit 40 analyzes the text data, extracts the ingredient data and menu data it contains, and determines whether the text expresses a positive or a negative stance toward each of these items (step S12).

If the text data contains ingredient data and relates to a menu search (step S13: Yes), the processing unit 40 stores the extracted ingredient data in the ingredient storage unit 34 (step S14).

Based on the ingredient data stored in the ingredient storage unit 34, the processing unit 40 uses the reinforcement learning model to select the menu with the highest score (step S15).

Based on the selected menu and the dialogue model, the processing unit 40 generates a response sentence containing the selected menu (step S16).

The processing unit 40 transmits the generated response sentence from the transmission unit 20 to the user's terminal device 2 (step S17).

If the user message does not contain text data, that is, if it is a message about an operation performed on the terminal device 2 (step S11: No), or if the text data does not relate to a menu search (step S13: No), the processing unit 40 determines whether the user message relates to a menu selection result (step S18).

If the user message relates to a menu selection result (step S18: Yes), the processing unit 40 updates the reinforcement learning model based on that result (step S19).

If the message does not relate to a menu selection result (step S18: No), the processing unit 40 ends the current process.
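The branching of steps S10 to S19 can be traced in a small dispatcher. This is a minimal sketch under heavy simplifying assumptions: `UserMessage`, `MenuDialogue`, the keyword-matching "analysis" (which ignores the positive/negative judgment of step S12), the fixed response template, and the `+1` score bump standing in for the model update are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class UserMessage:
    text: Optional[str] = None           # None => terminal-operation message (S11: No)
    selected_menu: Optional[str] = None  # set when the message reports a menu choice


@dataclass
class MenuDialogue:
    recipes: Dict[str, Set[str]]                   # hypothetical: menu -> ingredients it uses
    scores: Dict[str, float]                       # stand-in for the reinforcement learning model
    pantry: Set[str] = field(default_factory=set)  # stand-in for ingredient storage unit 34
    sent: List[str] = field(default_factory=list)  # response sentences "transmitted" (S17)

    def handle(self, msg: UserMessage) -> None:
        if msg.text is not None:                                 # S11
            known = set().union(*self.recipes.values())
            found = {w for w in known if w in msg.text.lower()}  # S12 (toy keyword match)
            if found:                                            # S13: menu search
                self.pantry |= found                             # S14
                menu = self.select_menu()                        # S15
                self.sent.append(f"How about {menu}?")           # S16-S17
                return
        if msg.selected_menu in self.scores:                     # S18
            self.scores[msg.selected_menu] += 1.0                # S19 (toy update)
        # S18: No -> end of this round

    def select_menu(self) -> str:
        # Highest-scoring menu among those using a stored ingredient (all menus if none do).
        usable = [m for m, needs in self.recipes.items() if needs & self.pantry]
        return max(usable or self.recipes, key=self.scores.__getitem__)
```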

[5. Modifications]
In the embodiment described above, the menu selection unit 42 selects the menu with the highest score in the ranking; however, it may instead select a menu from the ranking with a predetermined probability. For example, the menu with the highest score may be selected with a probability of 50%, the menu with the next highest score with a probability of 20%, and so on.

This prevents the menu selection unit 42 from repeatedly selecting the same menu and diversifies the menus offered to the user.
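Rank-based random selection can be sketched with the standard library's `random.choices`. The 50% and 20% figures come from the text; the remaining rank probabilities (15%, 10%, 5%) and the name `select_by_rank` are hypothetical.

```python
import random
from typing import Dict, Optional, Sequence


def select_by_rank(scores: Dict[str, float],
                   rank_probs: Sequence[float] = (0.5, 0.2, 0.15, 0.1, 0.05),
                   rng: Optional[random.Random] = None) -> str:
    """Pick a menu at random, weighting each menu by its position in the score ranking."""
    rng = rng or random.Random()
    ranked = sorted(scores, key=scores.__getitem__, reverse=True)[:len(rank_probs)]
    # random.choices treats weights as relative, so a truncated list is renormalized implicitly.
    return rng.choices(ranked, weights=rank_probs[:len(ranked)], k=1)[0]
```

Passing a seeded `random.Random` makes the draws reproducible, which is convenient for testing the diversification behavior.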

The menu selection unit 42 may also select menus based on ingredient data represented as real-valued vectors that encode the features of each ingredient. For example, a real-valued vector representation of each ingredient is learned from a large corpus of recipe text using a model such as Skip-gram, and the menu selection unit 42 selects menus based on these vector representations.

Representing ingredient data as real-valued vectors allows the menu selection unit 42 to use the similarity between ingredient vectors to pick a high-scoring menu from among menus that use predetermined substitutable ingredients.
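The similarity step can be sketched with cosine similarity over ingredient vectors. This is a minimal sketch, assuming the vectors have already been trained (for example with a Skip-gram model); the toy 3-dimensional vectors, the 0.9 `threshold`, and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def substitutes(ingredient: str, vectors: Dict[str, Sequence[float]],
                threshold: float = 0.9) -> List[str]:
    """Ingredients whose vectors are close enough to be treated as substitutes."""
    target = vectors[ingredient]
    return [name for name, vec in vectors.items()
            if name != ingredient and cosine(target, vec) >= threshold]


def best_menu(ingredient: str, menus: Dict[str, Set[str]], scores: Dict[str, float],
              vectors: Dict[str, Sequence[float]], threshold: float = 0.9) -> Optional[str]:
    """Highest-scoring menu using the ingredient or one of its substitutes."""
    usable = {ingredient, *substitutes(ingredient, vectors, threshold)}
    candidates = [m for m, needs in menus.items() if needs & usable]
    return max(candidates, key=scores.__getitem__, default=None)
```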

Furthermore, when ingredient data is represented as real-valued vectors, the reinforcement learning model and the dialogue model may be trained jointly as a single neural network model that generates the response sentences.

The menu selection unit 42 may also, at predetermined times, select a menu the user has never made or a menu with a low ranking score. The predetermined timing is set in advance, for example once every few months. In addition, the learning unit 44 may weight the scores based on recommendations produced by collaborative filtering over history on a website.

This prevents the menu selection unit 42 from repeatedly selecting the same menu and makes it possible to offer menus the user would not anticipate.
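Scheduled exploration of this kind can be sketched as follows. This is a minimal sketch: the 90-day `interval` (standing in for "once every few months"), the rule "lowest-scoring untried menu, else lowest-scoring menu overall", and the name `pick_menu` are hypothetical; collaborative-filtering weighting is not shown.

```python
from datetime import date, timedelta
from typing import Dict, Set


def pick_menu(scores: Dict[str, float], cooked: Set[str], today: date, last_explored: date,
              interval: timedelta = timedelta(days=90)) -> str:
    """Normally return the top-scoring menu; on the scheduled date, explore instead."""
    if today - last_explored >= interval:
        untried = [m for m in scores if m not in cooked]
        pool = untried or list(scores)            # fall back to all menus if everything was cooked
        return min(pool, key=scores.__getitem__)  # lowest-scoring candidate
    return max(scores, key=scores.__getitem__)
```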

The predetermined time for which ingredient data is stored may also be set to the ingredient's consumption time, and the ingredient storage unit 34 deletes the ingredient data once that time has elapsed. The consumption time is, for example, the average time in which the ingredient is consumed, or that average time reduced according to the probability that the ingredient still remains (for example, a probability associated with the amount used by the selected menu). In other words, it is the length of time during which the ingredient can be presumed not to have been used up.

The menu selection unit 42 then selects a menu based not only on the ingredient data obtained in the current menu search but also on ingredient data whose consumption time has not yet elapsed and that remains in the ingredient storage unit 34, for example ingredient data obtained during a previous menu search and presumed not to have been used up.

The response generation unit 43 may also generate a response sentence asking the user how much of an ingredient is left, and the ingredient storage unit 34 may store the remaining amount.

This makes it possible to propose menus the user is likely to choose early in the dialogue, improving user satisfaction.
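The consumption-time rule can be sketched as an expiry time per stored ingredient. This is a minimal sketch, assuming the "reduced" consumption time is the average consumption time multiplied by the probability that some of the ingredient remains; `StoredIngredient`, `usable_ingredients`, and that multiplicative rule are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class StoredIngredient:
    name: str
    stored_at: datetime
    avg_consumption: timedelta  # hypothetical per-ingredient average consumption time
    remain_prob: float = 1.0    # hypothetical: chance some is left after the chosen menu

    def expires_at(self) -> datetime:
        return self.stored_at + self.avg_consumption * self.remain_prob


def usable_ingredients(store: List[StoredIngredient], now: datetime,
                       current: List[str]) -> List[str]:
    """Purge expired entries from `store` in place, then merge with this search's ingredients."""
    store[:] = [s for s in store if now < s.expires_at()]
    return sorted(set(current) | {s.name for s in store})
```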

The learning unit 44 may also temporarily lower, for a predetermined period, the scores of menus similar to the menu selected by the user. A similar menu is one that is identical or similar to the selected menu, where "similar" means a menu that can be made by using a substitute for one of the ingredients. The predetermined period is set in advance, for example one month. For instance, the learning unit 44 weights the scores so that the scores of similar menus become smaller.

This keeps the menu selection unit 42 from selecting similar menus within a short period and diversifies the menus offered to the user.
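The temporary reduction can be sketched as a penalty applied at ranking time rather than a permanent change to the learned score. This is a minimal sketch: the 30-day `period` matches the "one month" example, while the subtractive `penalty` of 0.5, the `similar` lookup table, and the name `effective_score` are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Set


def effective_score(menu: str, scores: Dict[str, float], last_chosen: Dict[str, date],
                    similar: Dict[str, Set[str]], today: date,
                    period: timedelta = timedelta(days=30), penalty: float = 0.5) -> float:
    """Score used for ranking: lowered while a recently chosen menu is the same or similar."""
    base = scores[menu]
    for chosen, when in last_chosen.items():
        recent = timedelta(0) <= today - when < period
        if recent and (menu == chosen or menu in similar.get(chosen, set())):
            return base - penalty
    return base
```

Because the learned score itself is untouched, the penalty disappears automatically once the period has passed.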

The menu selection unit 42 may also select menus based on the user's health information, such as weight, blood pressure, amount of exercise, and daily meals. The health information is, for example, extracted by the analysis unit 41 from the text of the dialogue with the user; it may also be obtained through an external service. For example, the learning unit 44 treats the user's health state as the "state" in reinforcement learning and weights the scores based on the health information so that healthier menus receive higher scores.

This makes it possible to offer healthy menus to the user.
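Treating health as the reinforcement learning "state" can be sketched with a value table keyed by (state, menu) plus a healthiness bonus. This is a minimal sketch: the state labels, the 140 mmHg and target-weight thresholds, the per-menu `healthiness` values, and the bonus weight `beta` are all illustrative assumptions, not values from the document.

```python
from typing import Dict, Tuple


def health_state(systolic_bp: float, weight_kg: float, target_kg: float) -> str:
    """Collapse health information into a coarse state label (illustrative thresholds)."""
    if systolic_bp >= 140:
        return "high_bp"
    if weight_kg > target_kg:
        return "over_target"
    return "ok"


def weighted_scores(q: Dict[Tuple[str, str], float], state: str,
                    healthiness: Dict[str, float], beta: float = 0.5) -> Dict[str, float]:
    """Scores for every menu known in `state`, boosted by healthiness when the state flags a concern."""
    bonus = 0.0 if state == "ok" else beta
    return {menu: value + bonus * healthiness.get(menu, 0.0)
            for (s, menu), value in q.items() if s == state}
```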

The menu selection unit 42 may also select menus based on information such as the user's preferences and utterance history obtained through dialogue with the user. For example, the learning unit 44 weights the scores according to the user's preferences. The response generation unit 43 may likewise generate response sentences based on the user's preferences, such as "If you don't like cabbage rolls, how about stir-fried vegetables?"

This makes the dialogue more natural and allows menus the user prefers to be offered early.

The menu selection unit 42 may also select menus based on the user's ingredient preferences. For example, when the user says "I don't like green peppers", the ingredient storage unit 34 records the absence of "green peppers" as long-term information that is not deleted after the predetermined time. The learning unit 44 may also lower the scores of menus that use "green peppers" as an ingredient.

This makes menus using "green peppers" less likely to be selected, so that menus matching the user's preferences can be offered.

The menu selection unit 42 can also be configured not to present certain menus. For example, for a user with allergies, menus containing allergenic ingredients may be prevented from being selected: the learning unit 44 may weight the scores so that menus containing the allergens receive low scores, or the menu selection unit 42 may exclude such menus from the ranking altogether.

This prevents menus the user does not like from being offered.
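The two mechanisms above differ in strength: a disliked ingredient only lowers a menu's score, while an allergen removes the menu from the ranking. A minimal sketch of both, where the `ingredients_of` table, the subtractive `dislike_penalty` of 0.5, and the name `rank_menus` are hypothetical:

```python
from typing import Dict, List, Set


def rank_menus(scores: Dict[str, float], ingredients_of: Dict[str, Set[str]],
               disliked: Set[str], allergens: Set[str],
               dislike_penalty: float = 0.5) -> List[str]:
    """Rank menus by score, excluding allergen menus and penalizing disliked ingredients."""
    adjusted: Dict[str, float] = {}
    for menu, score in scores.items():
        uses = ingredients_of.get(menu, set())
        if uses & allergens:
            continue                   # hard exclusion from the ranking
        if uses & disliked:
            score -= dislike_penalty   # soft penalty: still selectable, just less likely
        adjusted[menu] = score
    return sorted(adjusted, key=adjusted.__getitem__, reverse=True)
```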

In the embodiment described above, the response generation unit 43 generates response sentences that include the menu selected by the menu selection unit 42, but it may also generate response sentences that include no menu, such as "What ingredients do you have?" This draws the user's ingredient data out of the user, so the information processing apparatus 1 can offer menus the user prefers early, based on more ingredient data.

The response generation unit 43 may also generate a response sentence containing multiple menus. For example, in place of (response sentence 2) above, it may generate "If you have bean sprouts, you can make stir-fried vegetables; if not, how about a salad?"

This allows menus the user prefers to be offered early.

In the embodiment described above, the learning unit 44 assigns rewards based on the user's menu selection result and updates the reinforcement learning model; it may additionally take into account the number of responses exchanged with the user (the number of dialogue turns). For example, if the user selects a menu after several responses, the selected menu may be given a reward of "+1/(number of responses)".
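The "+1/(number of responses)" rule is a one-line computation. A minimal sketch, assuming each response sentence counts as one response: in the cabbage dialogue, "vegetable soup" is chosen after three response sentences, so it would receive 1/3 instead of 1. The name `turn_discounted_reward` is hypothetical.

```python
def turn_discounted_reward(num_responses: int, base: float = 1.0) -> float:
    """Reward for the finally selected menu: base / number of responses before the selection."""
    if num_responses < 1:
        raise ValueError("at least one response must precede a selection")
    return base / num_responses
```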

The learning unit 44 may also vary the reward given when the user selects a menu according to the amount of ingredient data available during the menu selection process.

For example, if the user performs a menu search and selects a first dish, and then performs a menu search for a second dish and selects it, the learning unit 44 makes the reward for the second dish smaller than the reward for the first dish.

The reason is that the search for the second dish is performed with the ingredient information already obtained from the user while the first dish was being chosen. Therefore, even if the user selects the first and second dishes after the same number of responses, the information processing apparatus 1 selected the first dish based on less ingredient data than the second.

Accordingly, the learning unit 44 makes the reward for the second dish smaller than that for the first dish, so that menus the user selects while little ingredient data is available receive larger rewards, and trains the reinforcement learning model accordingly.

For example, the reward for the first dish is set to "+1", and the reward for the second dish is weighted down to "+0.1".

Thus, when a menu the user likes is offered on the basis of a small amount of ingredient data, the reward is larger, and menus the user prefers can be offered early.
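The dish-order weighting can be sketched as a small lookup table. The values +1 for the first dish and +0.1 for the second come from the text; reusing the last value for a third or later dish is an assumption, and `dish_reward` is a hypothetical name.

```python
from typing import Sequence


def dish_reward(dish_index: int, rewards: Sequence[float] = (1.0, 0.1)) -> float:
    """Reward for a selected menu by its position in one session (0 = first dish).

    Later dishes are chosen with more ingredient data already known, so they earn less.
    Dishes beyond the table reuse its last entry (an assumption; only two values are given).
    """
    if dish_index < 0:
        raise ValueError("dish_index must be non-negative")
    return rewards[min(dish_index, len(rewards) - 1)]
```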

[6. Effects]
The information processing apparatus 1 according to the embodiment includes the receiving unit 10, the response generation unit 43, and the transmission unit 20. The receiving unit 10 receives a user message from the user's terminal device 2. In response to the user message, the response generation unit 43 generates a response sentence containing menus narrowed down using the reinforcement learning model (candidate selection model). The transmission unit 20 transmits the generated response sentence to the terminal device 2.

As a result, a response sentence containing a menu selected with the reinforcement learning model can be returned for the user's menu search, so menus the user prefers can be offered early and user satisfaction can be improved.

In response to a new user message replying to a response sentence, the response generation unit 43 generates a response sentence that narrows the menus down further.

Because response sentences with further narrowed-down menus can be provided, menus the user prefers can be offered early and user satisfaction can be improved.

The information processing apparatus 1 includes the learning unit 44, which trains the reinforcement learning model based on rewards obtained from user messages replying to response sentences. The response generation unit 43 generates response sentences using the trained reinforcement learning model.

As a result, the reinforcement learning model, trained on rewards obtained in the past, can be used to offer menus matching the user's preferences early.

The learning unit 44 makes the reward larger the fewer responses it takes before the user selects a menu.

Thus, when a menu the user prefers is offered quickly, the reward is larger, so menus the user prefers can be offered early.

The learning unit 44 sets the initial value of each score according to popularity: the higher the popularity, the larger the initial value.

This prevents unpopular menus from being offered to the user in the early stages of reinforcement learning and thus prevents a drop in user satisfaction.
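Popularity-based initialization only requires a monotonically increasing mapping from popularity to initial score. This is a minimal sketch, assuming popularity is a count (for example, hypothetical view or bookmark counts) and using a log scale normalized so the most popular menu starts at 1.0; the scale and the name `initial_scores` are hypothetical.

```python
import math
from typing import Dict


def initial_scores(popularity: Dict[str, int], scale: float = 1.0) -> Dict[str, float]:
    """Initial score per menu: larger for more popular menus (log-scaled, top menu = scale)."""
    top = max(popularity.values(), default=0)
    if top <= 0:
        return {m: 0.0 for m in popularity}
    return {m: scale * math.log1p(p) / math.log1p(top) for m, p in popularity.items()}
```

The log scale keeps a handful of very popular menus from dwarfing everything else before any rewards have been observed.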

The learning unit 44 temporarily lowers, for a predetermined period, the scores of menus that are identical or similar to the menu selected by the user.

The response generation unit 43 generates response sentences using the reinforcement learning model and the user's health state.

[7. Hardware configuration]
The information processing apparatus 1 according to the embodiment described above is implemented by, for example, a computer 1000 configured as shown in FIG. 7. FIG. 7 is a hardware configuration diagram illustrating an example of a computer that implements the functions of the information processing apparatus 1. The computer 1000 includes a CPU 1100, a RAM 1200, a ROM 1300, an HDD 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

The CPU 1100 operates based on programs stored in the ROM 1300 or the HDD 1400 and controls each unit. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 starts up, programs that depend on the hardware of the computer 1000, and the like.

The HDD 1400 stores programs executed by the CPU 1100, data used by those programs, and the like. The communication interface 1500 receives data from other devices via the network N and passes it to the CPU 1100, and transmits data generated by the CPU 1100 to other devices via the network N.

The CPU 1100 controls output devices such as a display and a printer, and input devices such as a keyboard and a mouse, via the input/output interface 1600. The CPU 1100 acquires data from the input devices via the input/output interface 1600 and outputs the data it generates to the output devices via the input/output interface 1600.

The media interface 1700 reads a program or data stored on a recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads such a program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

For example, when the computer 1000 functions as the information processing apparatus 1 according to the embodiment, the CPU 1100 of the computer 1000 implements the functions of the processing unit 40 by executing a program loaded onto the RAM 1200. The CPU 1100 reads these programs from the recording medium 1800 and executes them; as another example, it may acquire these programs from another device via the network N.

Some of the embodiments and modifications of the present application have been described above in detail with reference to the drawings. These are merely examples, however, and the present invention can be carried out in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention.

[8. Other matters]
The reinforcement learning model described above can also be used for purposes other than menu search, for example in a dialogue system with a user. In such a dialogue system, the user's utterance is treated as the "state", the proposal (response) made to the user as the "action", and the user's reaction to that proposal as the "reward". Using reinforcement learning in this way makes it possible to make proposals that highly satisfy the user.

For example, in response to a user's inquiry (a product search), a reinforcement learning model that has learned the user's preferences from past proposals, based on the user's utterances and the like, can be used to make proposals that highly satisfy the user (proposing products the user is satisfied with).
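The state/action/reward mapping above can be sketched as a tabular, one-step (contextual-bandit) learner. This is a minimal sketch, assuming states are coarse labels for the user's utterance, actions are candidate proposals, and exploration is epsilon-greedy; the document does not name an algorithm, so `ProposalPolicy`, `epsilon`, and `alpha` are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class ProposalPolicy:
    """Value estimates Q[(state, action)] updated from one reward per proposal.

    state  -> a coarse label for the user's utterance (hypothetical intent string)
    action -> the proposal made to the user
    reward -> derived from the user's reaction to the proposal
    """

    def __init__(self, actions: List[str], epsilon: float = 0.1, alpha: float = 0.1,
                 rng: Optional[random.Random] = None):
        self.actions = actions
        self.epsilon = epsilon
        self.alpha = alpha
        self.q: Dict[Tuple[str, str], float] = defaultdict(float)
        self.rng = rng or random.Random()

    def propose(self, state: str) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)                     # explore
        return max(self.actions, key=lambda a: self.q[(state, a)])  # exploit

    def update(self, state: str, action: str, reward: float) -> None:
        key = (state, action)
        self.q[key] += self.alpha * (reward - self.q[key])
```

A full multi-step formulation (where a proposal changes the next utterance) would add a discounted next-state term; the one-step form is used here only to keep the sketch small.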

Among the processes described in the above embodiments and modifications, all or part of a process described as being performed automatically can also be performed manually, and all or part of a process described as being performed manually can also be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified. For example, the various types of information shown in each drawing are not limited to the illustrated information.

Furthermore, the components of each illustrated device are functional concepts and need not be physically configured as illustrated. That is, the specific form in which each device is distributed or integrated is not limited to that shown in the drawings; all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

The embodiments and modifications described above can be combined as appropriate as long as their processing contents do not contradict each other.

The term "unit" (section, module) used above can also be read as "means" or "circuit". For example, the receiving unit 10 can be read as receiving means or a receiving circuit.

Description of reference signs
1 Information processing apparatus
2 Terminal device
10 Receiving unit
20 Transmission unit
30 Storage unit
40 Processing unit
41 Analysis unit
42 Menu selection unit
43 Response generation unit (generation unit)
44 Learning unit

Claims

An information processing apparatus comprising:
a receiving unit that receives a user message from a terminal device of a user;
a generation unit that generates, for the user message, a candidate message including candidates narrowed down using a candidate selection model trained by reinforcement learning; and
a transmission unit that transmits the candidate message generated by the generation unit to the terminal device.

The information processing apparatus according to claim 1, wherein
the generation unit generates, for a new user message responding to the candidate message, a new candidate message in which the candidates are further narrowed down.

The information processing apparatus according to claim 2, further comprising:
a learning unit that trains the candidate selection model based on a reward obtained according to the new user message responding to the candidate message, wherein
the generation unit generates the candidate message using the candidate selection model previously trained by the learning unit.

The information processing apparatus according to claim 3, wherein
the learning unit increases the reward for the candidate selected by the user as the number of responses before the user selects the candidate decreases.

The information processing apparatus according to claim 3, wherein
the learning unit sets an initial value of a score in the candidate selection model according to popularity, and increases the initial value as the popularity increases.

The information processing apparatus, wherein
the learning unit temporarily reduces, for a predetermined period, the score in the candidate selection model for a similar candidate that is identical or similar to the candidate selected by the user.

The information processing apparatus according to any one of claims 1 to 6, further comprising:
an acquisition unit that acquires health information of the user, wherein
the user message is a message about cooking recipes, and
the generation unit generates the candidate message for the user message using the candidate selection model and the health information.

An information processing method executed by an information processing apparatus, the method comprising:
a receiving step of receiving a user message from a terminal device of a user;
a generation step of generating, for the user message, a candidate message including candidates narrowed down using a candidate selection model trained by reinforcement learning; and
a transmission step of transmitting the candidate message generated in the generation step to the terminal device.

A program for causing a computer to execute:
a receiving procedure of receiving a user message from a terminal device of a user;
a generation procedure of generating, for the user message, a candidate message including candidates narrowed down using a candidate selection model trained by reinforcement learning; and
a transmission procedure of transmitting the candidate message generated in the generation procedure to the terminal device.