JP2018097474A

JP2018097474A - Information service system

Info

Publication number: JP2018097474A
Application number: JP2016239582A
Authority: JP
Inventors: Ko Koga (古賀 光); Takuji Yamada (山田 卓司); Keiko Suzuki (鈴木 恵子)
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2016-12-09
Filing date: 2016-12-09
Publication date: 2018-06-21
Anticipated expiration: 2036-12-09
Also published as: JP6642401B2

Abstract

PROBLEM TO BE SOLVED: To provide an information service system capable of making a proposal at an appropriate timing by learning proposal results. SOLUTION: A center 12 of an information service system 10 has a proposal acquisition section 38 that acquires a proposal for a user, and a timing learning section 35 that identifies a state containing, as feature quantities, the user's location and the user's status at that location and that, when a proposal is made, grants a reward to the state in which the proposal was made according to the acceptance result of the proposal and manages the state as learning information. An agent ECU 11 connected to the center 12 has a timing determination section 20 that refers to a state identified by the center 12 and to the reinforcement learning information obtained from the center 12 for the same state, determines whether the identified state corresponds to a timing suitable for a proposal, and makes the proposal to the user when it determines that it does. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information providing system that learns whether a user accepts proposals and makes proposals to the user based on the learning result.

Conventionally, the navigation device described in Patent Document 1, for example, is known as an information providing system of this type. This navigation device proposes to the user, as a stop-off point, a facility located within a predetermined proposal distance from the planned travel route of the vehicle in which the device is mounted. The proposal distance is set for each genre of facility to be proposed and for each environment, and is corrected based on the degree to which the user has accepted past proposals. For example, when the ratio of acceptances to proposals is 80% or more, the proposal distance is lengthened. When the ratio is less than 60%, the proposal distance is shortened. The corrected proposal distance is then used from the next proposal onward.

Patent Document 1: Japanese Patent Laid-Open No. 2016-121879

However, the above navigation device presupposes that the user already plans to travel by vehicle. Some users, by contrast, have no travel plan but do have a latent desire to go for a drive. If drive content could be proposed to such users, the proposals would be highly useful.

However, if proposals are made frequently while the user is not in the vehicle, they may also be made when the user has no desire to drive at all, for example when the user does not want to go out. A drive proposal made at a timing that runs counter to the user's wishes is, on the contrary, of little use.

The present invention has been made in view of these circumstances, and its object is to provide an information providing system that can make proposals at an appropriate timing by learning proposal results.

The means for solving the above problem and their effects are described below.
An information providing system that solves the above problem includes: a proposal acquisition unit that acquires a proposal for a user; a state identification unit that identifies a state including, as feature quantities, the user's location and the user state at that location; a proposal result learning unit that, when a proposal has been made, assigns a reward to the state in which the proposal was made according to the acceptance result of the proposal and stores it as learning information; and a proposal timing determination unit that refers to a state newly identified by the state identification unit and to the learning information, learned by the proposal result learning unit, for the same state as the identified state, determines whether the identified state corresponds to a timing suitable for a proposal, and makes the proposal to the user when it determines that the identified state corresponds to such a timing.

With this configuration, the acceptance results of past proposals are learned together with states that include the user's location and user state as feature quantities. Based on the learning result, the system determines whether the current timing is suitable for a proposal and makes the proposal when it is. A highly useful proposal can therefore be made to the user at a timing at which it is likely to be accepted.

FIG. 1 is a block diagram showing the schematic configuration of one embodiment of the information providing system.
FIG. 2 is a conceptual diagram of the learning table in the embodiment.
FIG. 3 is a flowchart showing the state identification procedure in the embodiment.
FIG. 4 is a flowchart showing the reinforcement learning procedure in the embodiment.
FIG. 5 is a flowchart showing the procedure of the proposal timing determination process in the embodiment.

An embodiment of the information providing system is described below.
The information providing system of this embodiment has an agent ECU (electronic control unit), a device that can be brought into a vehicle and that makes drive-related proposals to the user. The agent ECU performs reinforcement learning based on the acceptance results of the proposals it makes. Reinforcement learning is a learning method in which, when the agent ECU selects an action based on the environment, the agent is given some reward in accordance with the change in the environment caused by the selected action, so that the agent ECU adapts to the environment through trial and error. In this embodiment, the agent ECU forms a state space, which is a set of multiple states, from “states” that include the user's location, the user's state at that location (user state), and the like. Whether the user accepts a proposal in each state corresponds to the reward in reinforcement learning. At a predetermined timing, the agent ECU identifies a state in the state space and compares it with the reinforcement learning result. When it determines that the identified state is one in which the proposal is likely to be accepted and is therefore suitable as a proposal timing, it makes the proposal through dialogue with the user.

The configuration of the information providing system 10 is described with reference to FIG. 1. The information providing system 10 has an agent ECU 11 and a timing learning unit 35 of a center 12 connected to the agent ECU 11. In this embodiment, the agent ECU 11 is installed in a portable information terminal that can be brought into the vehicle. The agent ECU 11 is connected to the center 12 via a communication unit 15. The center 12 acquires drive information matched to the attributes and preferences of the user of the portable information terminal and, on acquiring it, transmits it to the agent ECU 11. The drive information includes a destination, waypoints, and the like, and is provided both when the user is in the vehicle and when the user is at home. For example, to a user in a vehicle heading to a destination or home, the agent ECU 11 provides as drive information a waypoint that is presumed to interest the user and can be reached by vehicle. To a user at home, the agent ECU 11 provides as drive information a destination that is presumed to interest the user and can be reached by vehicle.

The configuration of the agent ECU 11 is described next. The agent ECU 11 includes an image recognition unit 16 that acquires images from a camera 50 and performs image recognition. The image recognition unit 16 acquires image information from the camera 50 provided in the portable information terminal. The images acquired by the camera 50 show the user or the user's surroundings. Alternatively, the image recognition unit 16 may acquire image information from an in-vehicle camera that images the vehicle interior or from a camera that images the inside of the home. The image recognition unit 16 performs image processing on the acquired images to recognize objects and the like, and outputs the image recognition result to a user state estimation unit 17 and a user behavior estimation unit 18.

In addition to acquiring the image recognition result from the image recognition unit 16, the user state estimation unit 17 acquires vehicle information and operation information from an in-vehicle device 54 via wireless communication or the like, and acquires operation information of a home appliance 55 from the home appliance 55 via wireless communication or the like. The in-vehicle device 54 is, for example, a navigation system or an operating device for an air conditioning system. The home appliance 55 is a television, an audio system, or the like. On acquiring the image recognition result, the vehicle information, the operation information of the in-vehicle device 54, and the operation information of the home appliance 55, the user state estimation unit 17 integrates them and transmits them to the center 12 as user state estimation information. The user state estimation unit 17 also receives from the center 12 the learning result of the user state for the user state estimation information and determines the user state.

Specifically, based on the learning result of the user state acquired from the center 12, the user state estimation unit 17 determines whether the user's location is “in the vehicle”, “at home”, or elsewhere. When it determines that the user is in the vehicle, it determines the travel area of the vehicle and the situation around the vehicle based on the learning result of the user state. The travel area may be a road type such as “expressway” or “general road”, or may be information indicating whether the vehicle is near the home (within the user's daily living area) or in a distant area (outside the daily living area). The situation around the vehicle is traffic information such as “congested”, “road restrictions in effect”, or “other”. When it determines that the user is in the vehicle, the user state estimation unit 17 also determines the vehicle state, for example “moving forward”, “reversing”, “stopped”, or “boarding or alighting”, based on the learning result of the user state or the vehicle information. These vehicle states can be acquired from a control device connected to the in-vehicle network, such as a navigation system. When it determines that the user is at home, the user state estimation unit 17 determines the user's state at home, for example “standing”, “sitting”, “walking”, or “lying down”, based on the learning result of the user state.

Like the user state estimation unit 17, the user behavior estimation unit 18 acquires the image recognition result from the image recognition unit 16, acquires vehicle information and operation information from the in-vehicle device 54 via wireless communication or the like, and acquires operation information of the home appliance 55 from the home appliance 55 via wireless communication or the like. On acquiring the image recognition result, the operation information of the in-vehicle device 54, and the operation information of the home appliance 55, the user behavior estimation unit 18 integrates them and transmits them to the center 12 as user behavior estimation information. It also receives from the center 12 the learning result of the user behavior for the user behavior estimation information and determines the user behavior.

Specifically, when the user's location is determined to be in the vehicle, the user behavior estimation unit 18 determines the user's behavior in the vehicle, for example “operating a device”, “in conversation”, or “sleeping”, based on the learning result of the user behavior acquired from the center 12. When the user's location is determined to be at home, it estimates the user's behavior at home, for example “eating or drinking”, “listening to audio”, or “on the phone”, based on the learning result of the user behavior acquired from the center 12.

The agent ECU 11 also includes a timing determination unit 20. At a predetermined timing, such as when the agent ECU 11 receives drive information, the timing determination unit 20 acquires the user state from the user state estimation unit 17 and the user behavior from the user behavior estimation unit 18. The agent ECU 11 transmits the acquired user state and user behavior to the center 12. The timing determination unit 20 also receives from the center 12 the reinforcement learning result for the timing determination and, based on this result, determines whether the state S(i) identified by the user state and user behavior is suitable as a timing for proposing a drive. When the timing determination unit 20 determines that the state S(i) is suitable, it requests a dialogue control unit 25 to output the drive information. The dialogue control unit 25 outputs the drive information received from the center 12 to a speech synthesis unit 26. The speech synthesis unit 26 converts the content of the drive information into speech and outputs it through a speaker 52 provided in the portable information terminal.

The dialogue control unit 25 determines, through dialogue with the user, whether the user accepts the proposal. The user's speech is converted into a signal by a microphone 51 of the portable information terminal and input to a speech recognition unit 27. The speech recognition unit 27 analyzes the input speech signal, performs speech recognition, and determines whether the proposal was accepted. Proposal result information indicating whether the proposal was accepted is then transmitted to the center 12.

Next, the configuration of the center 12 is described. The center 12 includes a communication unit 30 that communicates with the agent ECU 11, a user state learning unit 31, and a user behavior learning unit 32. The user state learning unit 31 learns from the user state estimation information transmitted by the agent ECU 11, information obtained through dialogue with the user, and the like, and records the learning result in a learning result storage unit 33. For example, when a user state has been identified, the user state learning unit 31 learns, through dialogue with the user, whether the identified user state matched the actual state. In addition, on receiving user state estimation information from the agent ECU 11, the user state learning unit 31 transmits to the agent ECU 11 the user state that has a high degree of similarity to that information, based on the learning result stored in the learning result storage unit 33.

The user behavior learning unit 32 learns from the user behavior estimation information transmitted by the agent ECU 11, information obtained through dialogue with the user, and the like, and records the learning result in the learning result storage unit 33. In addition, on receiving user behavior estimation information from the agent ECU 11, the user behavior learning unit 32 transmits to the agent ECU 11 the user state that has a high degree of similarity to that information, based on the learning result stored in the learning result storage unit 33.

The center 12 also includes a timing learning unit 35. The timing learning unit 35 stores the proposal result information transmitted from the agent ECU 11 in a proposal history storage unit 36 as proposal history information. The timing learning unit 35 also performs reinforcement learning based on the proposal history information for a given state and records the reinforcement learning result in the learning result storage unit 33 in association with that state.

The center 12 further includes a proposal acquisition unit 38. The proposal acquisition unit 38 acquires drive information that can be proposed to the user based on the user's attributes (age, gender, address, place of residence, ...), the situation (weather, time of day, ...), the user's preferences, and the like.

Next, reinforcement learning of the proposal timing is described. The feature quantities X(j) of the state S(i) used to determine the proposal timing are defined, for example, as follows.
- X1: User's location: in the vehicle, at home
- X2: Detailed location in the vehicle: driver's seat, front passenger seat, rear seat
- X3: Detailed location at home: living room, dining room, kitchen, bedroom, bathroom, toilet
- X4: Travel area of the vehicle: expressway, general road (within the daily living area), general road (outside the daily living area)
- X5: Situation around the vehicle: congested, road restrictions in effect, other
- X6: User state in the vehicle: moving forward, reversing, stopped, boarding or alighting
- X7: User state at home: standing, sitting, walking, lying down
- X8: User behavior in the vehicle: operating a device, in conversation, sleeping
- X9: User behavior at home: eating or drinking, listening to audio, working, on the phone, in conversation, sleeping, operating a smartphone
As shown in FIG. 2, reinforcement learning information 100, which serves as the learning information, is recorded in the learning result storage unit 33. When the timing learning unit 35 acquires proposal result information from the agent ECU 11, it records that information in the proposal history storage unit 36 as proposal history information. It also identifies the state S(i) corresponding to that proposal result information, which is defined by the user state (X1 to X7) identified by the user state learning unit 31 and the user behavior (X8, X9) identified by the user behavior learning unit 32. The state S(i) may already have been learned or may be newly obtained.
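
As an illustration only, a state S(i) built from the feature quantities X1 to X9 could be held as an immutable record that serves as the key of a learning table. The Python sketch below is not part of the disclosure: the class name `State`, the field names, and the use of `None` for features that do not apply to the current location are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class State:
    """Hypothetical record for one state S(i), keyed by features X1 to X9."""
    location: str                            # X1: "vehicle" or "home"
    seat: Optional[str] = None               # X2: detailed location in the vehicle
    room: Optional[str] = None               # X3: detailed location at home
    travel_area: Optional[str] = None        # X4: travel area of the vehicle
    surroundings: Optional[str] = None       # X5: situation around the vehicle
    vehicle_state: Optional[str] = None      # X6: user state in the vehicle
    home_state: Optional[str] = None         # X7: user state at home
    vehicle_behavior: Optional[str] = None   # X8: user behavior in the vehicle
    home_behavior: Optional[str] = None      # X9: user behavior at home


at_home = State(location="home", room="living room",
                home_state="sitting", home_behavior="listening to audio")
same_again = State(location="home", room="living room",
                   home_state="sitting", home_behavior="listening to audio")

# A frozen dataclass is hashable, so equal feature values map to the same key.
learning_table = {at_home: 1.0}
print(learning_table[same_again])
```

Using an immutable, hashable record is one way to decide whether a newly identified state is "the same state" as one already learned; the original text does not say how that comparison is implemented.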

For the identified state S(i), the timing learning unit 35 calculates a reward R(Si) based on the newly acquired proposal result information. For example, when a proposal based on drive information is made in state S(i), the reward R(Si) is set high (for example, “1”) if the proposal is accepted and low (for example, “0”) if it is not.

The timing learning unit 35 then calculates a state value function V(Si) based on the calculated reward R(Si). It takes the average of the previously calculated reward R(Si)′ and the newly calculated reward R(Si) as the state value function V(Si) of the state S(i). For example, when the identified state S(i) has not been learned before, the calculated reward R(Si) itself becomes the state value function V(Si). When a reward has already been assigned to the state S(i) and a new reward is calculated, the arithmetic mean of the two rewards becomes the state value function V(Si). Because it is sufficient for past rewards to be reflected in the state value function V(Si), a measure other than the arithmetic mean, such as the median or the geometric mean of the rewards, may be used as the “average”.
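
As a hedged worked form of this averaging, assume the arithmetic mean is taken over all n rewards recorded so far for state S(i). The symbols n and R_k (the k-th reward recorded for that state) are introduced here for illustration and do not appear in the original text.

```latex
\[
V(S_i) = \frac{1}{n}\sum_{k=1}^{n} R_k(S_i),
\qquad
V_n(S_i) = V_{n-1}(S_i) + \frac{R_n(S_i) - V_{n-1}(S_i)}{n}
\]
```

Under this assumption, rewards of 0, 1, and 1 in the same state give V(S_i) = 2/3, and with n = 1 the value equals the single reward, which matches the case of a state that has not been learned before. The second expression is the same mean written as an incremental update, so only the current value and the count need to be stored.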

The states S(1) to S(5) shown in FIG. 2 are states in which a proposal has been made only once in the past. For example, the proposal was not accepted in state S(1) (“failure”), so its state value function V(Si) is a low value such as “0”. The proposal was accepted in state S(2) (“success”), so its state value function V(Si) is a high value such as “1”.

Suppose that, from the point at which proposable drive information became available, the state S(i) transitions through S(3), S(4), and S(5) as shown in FIG. 2, that a proposal is made in state S(5), and that the proposal is accepted. In that case, a high reward is assigned to all of S(3), S(4), and S(5). As a result, the state value function V(Si) of each of the states S(3) to S(5) becomes high.
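
The crediting of every state visited before the proposal can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `credit_trajectory`, the string state labels, and the 1/0 reward values are assumptions, and the text above only describes the accepted case, so applying a low reward to the whole path on rejection is an additional assumption.

```python
from collections import defaultdict
from statistics import mean
from typing import DefaultDict, Dict, List, Sequence


def credit_trajectory(rewards: DefaultDict[str, List[float]],
                      visited: Sequence[str],
                      accepted: bool) -> Dict[str, float]:
    """Give the same reward to every state visited up to the proposal."""
    reward = 1.0 if accepted else 0.0   # assumed reward values
    for state in visited:
        rewards[state].append(reward)
    # State value of each visited state: mean of the rewards recorded for it.
    return {state: mean(rewards[state]) for state in visited}


rewards: DefaultDict[str, List[float]] = defaultdict(list)
# States passed through before the proposal made in S(5) was accepted.
values = credit_trajectory(rewards, ["S3", "S4", "S5"], accepted=True)
print(values)
```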

State S(6) shown in FIG. 2 is one whose state value function V(Si) is the averaged reward R(Si). When many proposals have been made in the same state S(i), averaging the rewards keeps the value of V(Si) from becoming excessively large or small even if the reward R(Si) varies between “0” and “1” depending on the proposal timing.

A state S(i) with a high state value function V(Si) is one in which the user is presumed likely to accept a proposal, and is therefore suitable as a proposal timing. A state S(i) with a low V(Si) is one in which the user is presumed unlikely to accept a proposal, and is not suitable. Based on this reinforcement learning result, the agent ECU 11 determines whether the identified state S(i) is suitable as a timing for making a proposal based on drive information.
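
One possible way to turn "high" and "low" values into a decision is a simple threshold, sketched below. The original text does not state a threshold or say what happens for a state that has never been learned, so the function name `is_suitable_timing`, the default `threshold=0.5`, the `propose_if_unseen` flag, and the value shown for S6 are all assumptions made for illustration.

```python
from typing import Dict, Hashable


def is_suitable_timing(values: Dict[Hashable, float],
                       state: Hashable,
                       threshold: float = 0.5,
                       propose_if_unseen: bool = True) -> bool:
    """Return True when the learned value of `state` is high enough to propose."""
    if state not in values:
        # Assumption: an unlearned state falls back to a configurable default.
        return propose_if_unseen
    return values[state] >= threshold


# S1 and S2 use the example values 0 and 1; the value for S6 is hypothetical.
values = {"S1": 0.0, "S2": 1.0, "S6": 0.6}
for name in ("S1", "S2", "S6", "S7"):
    print(name, is_suitable_timing(values, name))
```

Allowing proposals in unseen states by default is a design choice that lets learning start at all; a stricter deployment could set `propose_if_unseen=False`.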

Next, the processing procedures of the center 12 and the agent ECU 11 are described with reference to FIGS. 3 to 5.
First, the process by which the center 12 identifies the state S(i) is described with reference to FIG. 3. The timing learning unit 35 estimates the user's location based on the user state estimation information and user behavior estimation information acquired from the agent ECU 11 and on the learning information recorded in the learning result storage unit 33 (step S10). Based on this estimate, it then estimates whether the user is in the vehicle (step S11). If it estimates that the user is in the vehicle (step S11: YES), it identifies the travel area based on the vehicle information acquired from the agent ECU 11 (step S12) and identifies the situation in the travel area (step S13). It also estimates the vehicle state (step S14) and the user's behavior in the vehicle (step S15).

If, on the other hand, the timing learning unit 35 estimates that the user is not in the vehicle (step S11: NO), it estimates whether the user is at home (step S16). If it estimates that the user is not at home (step S16: NO), it notifies the agent ECU 11 that the state S(i) cannot be identified and ends the identification process. If it estimates that the user is at home (step S16: YES), it estimates the user's state at home (step S17) and the user's behavior at home (step S18).
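
The branching of steps S10 to S18 can be sketched as a single function. This is only an outline under stated assumptions: the function name `identify_state`, the dictionary keys, the `"unknown"` placeholder, and the idea that the estimates arrive as a ready-made dictionary are hypothetical, and returning `None` stands in for the "cannot be identified" notification.

```python
from typing import Dict, Optional


def identify_state(est: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Follow steps S10 to S18; return None when the state cannot be identified."""
    location = est.get("location")              # step S10: estimate location
    if location == "vehicle":                   # step S11: YES
        keys = ("travel_area",                  # step S12
                "area_situation",               # step S13
                "vehicle_state",                # step S14
                "vehicle_behavior")             # step S15
    elif location == "home":                    # step S11: NO, step S16: YES
        keys = ("home_state",                   # step S17
                "home_behavior")                # step S18
    else:                                       # step S16: NO
        return None
    state = {"location": location}
    # Features that were not estimated are filled with a placeholder.
    state.update({key: est.get(key, "unknown") for key in keys})
    return state


print(identify_state({"location": "home",
                      "home_state": "sitting",
                      "home_behavior": "on the phone"}))
print(identify_state({"location": "office"}))
```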

Next, the procedure by which the center 12 performs reinforcement learning of the proposal timing is described with reference to FIG. 4. This process is performed when the center 12 receives a proposal acceptance result from the agent ECU 11. The timing learning unit 35 assigns a reward R(Si) to the state S(i) based on the proposal result information transmitted from the agent ECU 11 (step S20). For example, the timing learning unit 35 sets the reward R(Si) to “1” if it judges that the proposal was accepted and to “0” if it judges that it was not.

Next, the timing learning unit 35 updates the state value function V(Si) (step S21). The timing learning unit 35 obtains the average “mean(R(Si))” of the reward calculated for the state S(i) and the rewards already given to the state S(i), and uses this average as the new state value function V(Si). Having calculated the state value function V(Si), the timing learning unit 35 records it in the learning result storage unit 33 as the reinforcement learning information 100.
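Steps S20 and S21 amount to keeping, for each state, the mean of all rewards given so far. A minimal sketch under that reading is shown below; the class name `TimingLearner` and its methods are hypothetical, and an in-memory dictionary stands in for the learning result storage unit 33.

```python
from collections import defaultdict
from typing import Hashable, Optional


class TimingLearner:
    """Keeps V(Si) = mean(R(Si)) for each state (illustrative only)."""

    def __init__(self) -> None:
        self._reward_sum = defaultdict(float)  # sum of rewards per state
        self._count = defaultdict(int)         # number of rewards per state

    def update(self, state: Hashable, accepted: bool) -> float:
        """S20: give reward 1 or 0; S21: return the updated mean V(Si)."""
        reward = 1.0 if accepted else 0.0
        self._reward_sum[state] += reward
        self._count[state] += 1
        return self._reward_sum[state] / self._count[state]

    def value(self, state: Hashable) -> Optional[float]:
        """Return V(Si), or None if the state has never received a reward."""
        n = self._count.get(state, 0)
        if n == 0:
            return None
        return self._reward_sum[state] / n
```

With rewards of 1 and 0, V(Si) is simply the fraction of proposals accepted in that state, so it can be compared directly against a threshold between 0 and 1.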

Next, the proposal timing determination process performed by the agent ECU 11 will be described with reference to FIG. 5. The timing determination unit 20 of the agent ECU 11 determines whether there is proposal content for a drive (step S1). To do so, the agent ECU 11 determines whether it has received drive information from the center 12. The drive information includes, for example, a destination or waypoint, a route from the departure point to the destination, and the required time.

When the timing determination unit 20 of the agent ECU 11 determines that there is no proposal content (step S1: NO), it ends the proposal timing determination process, returns to step S1, and waits for drive information from the center 12. When it determines that there is proposal content for a drive (step S1: YES), it estimates the state S(i) (step S2). Specifically, the timing determination unit 20 requests the user state estimation unit 17 and the user behavior estimation unit 18 to output estimation information. The user state estimation unit 17 and the user behavior estimation unit 18 acquire the recognition result from the image recognition unit 16 and acquire various information from the in-vehicle device 54 and the home appliance 55. The timing determination unit 20 acquires the user state estimation information from the user state estimation unit 17 and the user behavior estimation information from the user behavior estimation unit 18, and transmits both to the center 12.

When the communication unit 30 of the center 12 acquires the user state estimation information and the user behavior estimation information from the agent ECU 11, the user state learning unit 31 and the user behavior learning unit 32 specify the state S(i). The specified state S(i) is output to the timing learning unit 35, which searches the reinforcement learning information 100 for it. When the search finds the same state S(i), the timing learning unit 35 transmits the state value function V(Si) of that state to the agent ECU 11 as the reinforcement learning result. When the search does not find the same state S(i), the timing learning unit 35 transmits to the agent ECU 11, as the reinforcement learning result, a search result indicating that no identical state S(i) exists.
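The search on the center side returns one of two kinds of reinforcement learning result: the value V(Si) of a matching state, or an indication that no identical state exists. The sketch below illustrates this with a plain mapping standing in for the reinforcement learning information 100; the `LearningResult` type, its field names, and `search_learning_info` are hypothetical and are not defined in the embodiment.

```python
from typing import Hashable, Mapping, Optional, TypedDict


class LearningResult(TypedDict):
    """Reinforcement learning result sent to the agent ECU (illustrative shape)."""
    found: bool             # True when the same state S(i) was found
    value: Optional[float]  # V(Si) when found, otherwise None


def search_learning_info(info: Mapping[Hashable, float],
                         state: Hashable) -> LearningResult:
    """Look up the specified state in the learned values by exact match."""
    if state in info:
        return {"found": True, "value": info[state]}
    return {"found": False, "value": None}
```

Keeping "not found" distinct from a low value matters, because the agent ECU treats an unseen state as suitable for a proposal rather than as unsuitable.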

The timing determination unit 20 of the agent ECU 11 determines whether the reinforcement learning result has been acquired from the center 12 (step S3). When the reinforcement learning result cannot be acquired from the center 12 (step S3: NO), for example because the user's whereabouts could not be identified, the timing determination unit 20 makes a proposal based on the proposal content of the drive information (step S5). Specifically, the timing determination unit 20 outputs a proposal request to the dialogue control unit 25. The dialogue control unit 25 outputs a voice based on the proposal content from the speaker 52 via the voice synthesis unit 26.

On the other hand, when the timing determination unit 20 acquires the reinforcement learning result (step S3: YES), it determines, based on that result, whether the specified state S(i) is a timing suitable for the proposal (step S4). To do so, the timing determination unit 20 determines whether the state value function V(Si) given as the reinforcement learning result is equal to or greater than a predetermined value. When the state value function V(Si) is equal to or greater than the predetermined value, the timing determination unit 20 regards the state S(i) as a timing suitable for the proposal and makes a proposal based on the proposal content of the drive information (step S5). When the state value function V(Si) is less than the predetermined value, the timing determination unit 20 does not propose the drive information, ends the process for the time being, and returns to step S1. Furthermore, when it receives a search result indicating that no identical state S(i) exists, the timing determination unit 20 regards the timing as suitable for the proposal and makes a proposal based on the proposal content of the drive information (step S5).
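The decision made in steps S3 to S5 has three cases that lead to a proposal and one that withholds it. A minimal sketch follows; the function name `should_propose`, the dictionary shape of the result, and the default threshold of 0.5 are assumptions made for illustration, since the embodiment only speaks of a "predetermined value".

```python
from typing import Optional


def should_propose(result: Optional[dict], threshold: float = 0.5) -> bool:
    """Return True when the agent ECU should make the proposal (step S5).

    `result` is None when no reinforcement learning result was acquired,
    otherwise a dict with keys "found" (bool) and "value" (float or None).
    """
    if result is None:           # S3: NO, e.g. the whereabouts could not be identified
        return True
    if not result["found"]:      # no identical state has been learned yet
        return True
    return result["value"] >= threshold  # S4: compare V(Si) with the predetermined value
```

Proposing in both "unknown" cases lets the system collect acceptance results for states it has not yet learned, at the cost of some proposals at unsuitable timings early on.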

Once the proposal has been made, the dialogue control unit 25 acquires the proposal result through dialogue with the user (step S6). The voice recognition unit 27 recognizes the user's spoken response to the proposal and outputs the utterance content to the dialogue control unit 25. The dialogue control unit 25 determines from the utterance content whether the user has accepted the proposal, and transmits the proposal result information to the center 12. The center 12 receives the proposal result information and performs reinforcement learning of the proposal timing based on it (step S7; see FIG. 4).

In this way, the agent ECU 11 does not make a proposal to the user every time drive information is acquired. Instead, the timings at which each user readily accepts proposals are learned by reinforcement learning, and the agent ECU 11 makes a proposal when the reinforcement learning result indicates a suitable timing. The user is therefore less bothered than when a proposal is made every time drive information is acquired. In addition, by proposing a drive at a suitable timing to a user who has a latent desire to go for a drive, information that is highly useful to the user can be provided.

As described above, according to the present embodiment, the following effects can be obtained.
(1) In the above embodiment, the center 12 performs reinforcement learning on the acceptance results of past proposals together with states that include, as feature quantities, the user's whereabouts and the user state and user behavior at those whereabouts. Based on the reinforcement learning result acquired from the center 12, the agent ECU 11 determines whether the timing is suitable for a proposal, and makes the proposal when it is. A highly useful proposal can therefore be made to the user at a timing at which the proposal is readily accepted.

(Other embodiments)
The above embodiment can also be implemented in the following forms.
・In the above embodiment, whether the proposal has been accepted is determined through dialogue with the user via the dialogue control unit 25. Instead, acceptance may be determined from an ON operation, made when the user accepts the proposal, of the touch panel display or an operation button of the portable information terminal.

・In the above embodiment, the timing learning unit 35 of the center 12 specifies the state defined by the feature quantity of the user state and the feature quantity of the user behavior. Alternatively, either the user state learning unit 31 or the user behavior learning unit 32 may specify the state defined by these feature quantities. Or the agent ECU 11 may learn the user state and the user behavior based on the image recognition result and specify the state defined by the feature quantity of the user state and the feature quantity of the user behavior.

・In the above embodiment, the agent ECU 11 executes the proposal (step S5) when the reinforcement learning result cannot be obtained from the center 12 (step S3: NO). Instead, the proposal may be withheld when the reinforcement learning result cannot be obtained from the center 12 (step S3: NO). In this aspect, for example, proposals at timings not based on the reinforcement learning result are repeated a predetermined number of times, and the reinforcement learning results obtained by learning the acceptance results of those proposals are accumulated.

・In the above embodiment, the proposal is made when the user's whereabouts are “in the car” or “home”. In addition, a proposal may be made when the user's whereabouts are other than “in the car” or “home”. For example, a proposal may be made when the user's whereabouts are “company” or “train” and the time falls within a predetermined time period.

・In the above embodiment, the timing learning unit 35 gives the state S(i) a reward of, for example, “1” when the proposal is accepted. Alternatively, the user's emotion toward the proposal may be estimated through dialogue with the user, and the reward may be changed according to that emotion. For example, if the user's emotion toward an accepted proposal is positive and includes “joy” or the like, the reward may be raised; if the user accepted the proposal but the emotion toward it does not include a positive emotion such as “joy”, the reward may be lowered.

・In the above embodiment, the timing learning unit 35 that performs reinforcement learning is provided in the center 12, but the agent ECU 11 may perform the reinforcement learning.
・In the above embodiment, the agent ECU 11 proposes drive information. However, the agent ECU 11 may provide information (outing information) for traveling to a destination or waypoint by train, by bicycle, or on foot.

・In the above embodiment, the states constituting the state space are defined by the user state and the user behavior. Instead, the state may be defined by the user state alone.
・In the above embodiment, the timing learning unit 35 that performs reinforcement learning is provided in the center 12. Instead, the timing learning unit 35 may be provided in the agent ECU 11.

・In the above embodiment, the agent ECU 11 is provided in the portable information terminal. Instead, the agent ECU 11 may be provided in an in-vehicle device 54 such as a navigation system installed in the vehicle. In this case, the user state and user behavior in the vehicle are learned together with the proposal results. Alternatively, the agent ECU 11 may be provided in the home appliance 55. In this case, the user state and user behavior at home are learned together with the proposal results. The center 12 may also integrate the information from the agent ECU 11 provided in the in-vehicle device 54 with the information from the agent ECU 11 provided in the home appliance 55. Or the agent ECU 11 may be provided in a device or system other than the portable information terminal, the in-vehicle device 54, and the home appliance 55.
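The variant above in which the reward depends on the user's emotion toward an accepted proposal can be sketched as follows. The function name `emotion_reward`, the set of positive emotion labels, and the concrete reward values 1.0 and 0.5 are all hypothetical; the embodiment only states that the reward is made higher for a positive emotion such as “joy” and lower otherwise. A reward of 0 for a rejected proposal is carried over from the base embodiment as an assumption.

```python
# Hypothetical set of emotion labels treated as positive.
POSITIVE_EMOTIONS = frozenset({"joy"})


def emotion_reward(accepted: bool, emotion: str) -> float:
    """Return a reward R(Si) that depends on acceptance and estimated emotion."""
    if not accepted:
        return 0.0   # rejected proposal: same reward as in the base embodiment
    if emotion in POSITIVE_EMOTIONS:
        return 1.0   # accepted with a positive emotion: higher reward
    return 0.5       # accepted without a positive emotion: lower reward
```

Because the state value function is the mean of the rewards, graded rewards of this kind would pull V(Si) down for states in which proposals are accepted only reluctantly.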

DESCRIPTION OF SYMBOLS: 10 ... information providing system, 11 ... agent ECU, 12 ... center, 15, 30 ... communication unit, 16 ... image recognition unit, 17 ... user state estimation unit, 18 ... user behavior estimation unit, 20 ... timing determination unit, 25 ... dialogue control unit, 26 ... voice synthesis unit, 27 ... voice recognition unit, 31 ... user state learning unit, 32 ... user behavior learning unit, 33 ... learning result storage unit, 35 ... timing learning unit, 36 ... proposal history storage unit, 38 ... proposal acquisition unit, 50 ... camera, 51 ... microphone, 52 ... speaker, 54 ... in-vehicle device, 55 ... home appliance, 100 ... reinforcement learning information.

Claims

A proposal acquisition unit for acquiring a proposal for the user;
A state identifying unit for identifying a state including the user's whereabouts and the user state at the whereabouts as a feature amount;
When a proposal is made, according to a result of acceptance of the proposal, a proposal result learning unit that gives a reward to the state when the proposal is made and sets it as learning information;
A proposal timing determination unit that refers to the state newly specified by the state specifying unit and to the learning information, learned by the proposal result learning unit, whose state has a high degree of similarity with the specified state, determines whether or not the specified state corresponds to a timing suitable for the proposal, and determines that the specified state corresponds to a timing suitable for the proposal;