JP2022045712A

JP2022045712A - Control device of lockup clutch

Info

Publication number: JP2022045712A
Application number: JP2020151449A
Authority: JP
Inventors: Atsushi Tabata; Koichi Okuda; Takeshi Imamura
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2020-09-09
Filing date: 2020-09-09
Publication date: 2022-03-22

Abstract

PROBLEM TO BE SOLVED: To reduce the many man-hours that a skilled worker must otherwise spend creating a map, relating the hydraulic pressure supplied to a lockup clutch to the sound volume in the cabin, that is suitable for reducing the cabin sound volume. SOLUTION: A CPU performs a state acquisition process for acquiring a state s of a vehicle (S12). The CPU calculates an action variable a on the basis of relationship-defining data associated with the state s of the vehicle (S14). The CPU operates the lockup clutch so as to achieve a hydraulic pressure command value P* determined by the calculated action variable a (S16). The CPU acquires a power transmission efficiency PTE and a sound volume Vsnd as characteristics c of the vehicle (S18). When the characteristics c satisfy a predetermined criterion, the CPU gives a larger reward than when they do not, and updates the action value function of the relationship-defining data (S20). SELECTED DRAWING: Figure 2

Description

The present invention relates to a control device for a lockup clutch.

The vehicle described in Patent Document 1 includes a torque converter. The torque converter includes a lockup clutch that can directly and mechanically connect its input-side member and output-side member. The lockup clutch is controlled to an engaged state, a slip state, or a released state by the hydraulic pressure supplied to it. This hydraulic pressure is controlled based on a predetermined map so that the sound volume in the passenger compartment, caused by vibration of the drive system from the internal combustion engine through the torque converter to the drive wheels, does not become large.

Patent Document 1: Japanese Unexamined Patent Publication No. 2010-156359

In the vehicle described in Patent Document 1, the relationship between the hydraulic pressure supplied to the lockup clutch and the sound volume in the vehicle interior is not necessarily simple, and it may also be affected by other parameters. Creating a suitable map that keeps the interior sound volume low therefore requires a skilled person to spend many man-hours.

To solve the above problem, the present invention is a control device that is applied to a vehicle including a torque converter with a built-in lockup clutch and that controls a hydraulic pressure command value of the lockup clutch. The control device includes a storage device and an execution device. The storage device stores relationship-defining data, which define the relationship between the state of the vehicle and an action variable, a variable related to the operation of the lockup clutch. The execution device executes: a state acquisition process that acquires the state of the vehicle based on sensor detection values; an operation process that calculates the action variable based on the state acquired by the state acquisition process and the relationship-defining data, and operates the lockup clutch so as to achieve the hydraulic pressure command value determined by the calculated action variable; a characteristic acquisition process that acquires the characteristics of the vehicle when the lockup clutch is operated in the operation process; a reward calculation process that gives a larger reward when the characteristics acquired by the characteristic acquisition process satisfy a predetermined criterion than when they do not; and an update process that updates the relationship-defining data using, as inputs to a predetermined update mapping, the vehicle state acquired by the state acquisition process, the value of the action variable used to operate the lockup clutch, and the reward corresponding to that operation.
The update mapping outputs relationship-defining data updated so as to increase the expected return for the reward when the lockup clutch is operated in accordance with the relationship-defining data. The characteristics of the vehicle include a power transmission efficiency, which indicates the efficiency with which power on the input side of the lockup clutch is transmitted to its output side, and the sound volume in the passenger compartment of the vehicle.

According to the above configuration, the relationship-defining data are updated by an update mapping based on reinforcement learning, using the reward associated with operation of the lockup clutch. This makes it possible to set the relationship between the vehicle state and the action variable appropriately, so setting that relationship does not necessarily require the effort of a skilled person.

Here, the input to the update mapping reflects not only the sound volume in the vehicle interior but also the power transmission efficiency of the lockup clutch. Learning therefore proceeds so that a large reward is given not only when the portion of the interior sound volume attributable to the lockup clutch satisfies a predetermined condition, but also when the power transmission efficiency satisfies a predetermined condition. As a result, relationship-defining data are obtained that set the action variable, a variable related to operation of the lockup clutch, so that both the sound volume and the power transmission efficiency take appropriate values.

FIG. 1 is a schematic diagram showing a vehicle and its control device.
FIG. 2 is a flowchart showing the procedure of processing executed by the control device.
FIG. 3 is a flowchart showing the detailed procedure of part of the processing executed by the control device.
FIG. 4 is a table showing the learning regions.

Hereinafter, an embodiment of the lockup clutch control device will be described with reference to the drawings. First, the overall configuration of the vehicle to which the control device is applied will be described.

As shown in FIG. 1, an internal combustion engine 10 is mounted on a vehicle VC. The internal combustion engine 10 includes an intake passage 12 for drawing in outside air. The intake passage 12 houses a throttle valve 14 that adjusts the intake air amount by changing its opening degree.

The portion of the intake passage 12 downstream of the throttle valve 14 is connected to a cylinder 18 via an intake port 16. A fuel injection valve 20 that injects fuel into the intake port 16 is attached to the intake passage 12 downstream of the throttle valve 14.

A piston 22 that reciprocates within the cylinder 18 is housed inside it, and the piston 22 defines a combustion chamber R in the cylinder 18. An exhaust passage 26 is connected to the cylinder 18 via an exhaust port 24. The exhaust passage 26 is provided with an exhaust purification catalyst 28 for purifying carbon monoxide, nitrogen oxides, and the like in the exhaust gas.

Inside the cylinder 18, a spark plug 30 for igniting the fuel is arranged between the intake port 16 and the exhaust port 24. The internal combustion engine 10 also includes an intake valve 32 that opens and closes the combustion-chamber-side opening of the intake port 16, and an exhaust valve 34 that opens and closes the combustion-chamber-side opening of the exhaust port 24.

Although FIG. 1 shows only one set consisting of the cylinder 18 and the intake port 16, exhaust port 24, and other parts connected to it, the internal combustion engine 10 is provided with a plurality of such sets.

A crankshaft 36 is connected to the piston 22 via a connecting rod 35. When fuel is injected from the fuel injection valve 20 and the intake valve 32 opens, a mixture of fuel and air flows into the combustion chamber R. The mixture in the combustion chamber R is burned upon spark discharge of the spark plug 30, and the energy generated by this combustion is converted via the piston 22 into rotational energy of the crankshaft 36. The burned mixture is discharged to the exhaust passage 26 through the exhaust port 24 when the exhaust valve 34 opens.

An input shaft 44 of a torque converter 40 is connected to the crankshaft 36. An output shaft 46 of the torque converter 40 is connected to an input shaft 52 of an automatic transmission 50.

Although details are omitted, the torque converter 40 is a fluid transmission mechanism that includes a pump impeller connected to the input shaft 44 and a turbine impeller connected to the input shaft 52 of the automatic transmission 50. Torque is transmitted between the pump impeller and the turbine impeller through the fluid, and in this way torque is transmitted between the input shaft 44 and the output shaft 46 of the torque converter 40.

The torque converter 40 also includes a lockup clutch 42, which can directly couple the input shaft 44 and the output shaft 46 of the torque converter 40. The lockup clutch 42 takes one of three states: a directly coupled state, a released state, and a slip state. In the directly coupled state, the input shaft 44 and the output shaft 46 are directly connected. In the released state, substantially no torque is transmitted between the input shaft 44 and the output shaft 46 via the lockup clutch 42. In the slip state, the input shaft 44 and the output shaft 46 can rotate relative to each other while torque is transmitted between them via the lockup clutch 42. The lockup clutch 42 uses oil supplied from an oil pump (not shown) as hydraulic fluid, and the pressure of this fluid is adjusted by a hydraulic control circuit for the lockup clutch 42; adjusting this pressure places the lockup clutch 42 in one of the three states. The hydraulic control circuit includes a solenoid valve and controls the flow and the pressure of the hydraulic fluid by energizing the solenoid valve.

The automatic transmission 50 varies its gear ratio, which is the ratio between the rotation speed of its input shaft 52 and the rotation speed of its output shaft 54. Although details are omitted, the automatic transmission 50 includes a plurality of friction engagement elements, and the gear ratio is switched by engaging and releasing them. Drive wheels 60 are mechanically connected to the output shaft 54 of the automatic transmission 50 via a differential gear and the like (not shown).

The vehicle VC is equipped with a control device 70. The control device 70 controls the internal combustion engine 10, operating its various actuators to control controlled variables such as torque and exhaust component ratios. For example, the control device 70 performs fuel cut control, which stops the fuel supply to the internal combustion engine 10 by stopping fuel injection from the fuel injection valve 20.

More specifically, fuel cut control is part of fuel injection control: for example, during deceleration with an accelerator operation amount ACCP of "0", fuel injection is stopped to cut off the fuel supply to the combustion chamber R and thereby reduce fuel consumption.

The control device 70 also controls the automatic transmission 50, operating its friction engagement elements to control the gear ratio. Further, the control device 70 controls the torque converter 40, operating the lockup clutch 42 to control its engagement state. In this embodiment, the control device 70 sets a hydraulic pressure command value P* for the lockup clutch 42 and operates the lockup clutch 42 by controlling the solenoid valve and other parts of the hydraulic control circuit so as to achieve P*. FIG. 1 shows the operation signals MS1 to MS5 for the throttle valve 14, the fuel injection valve 20, the spark plug 30, the lockup clutch 42, and the automatic transmission 50, respectively.

To control the controlled variables, the control device 70 refers to the intake air amount Ga detected by an air flow meter 80, the throttle opening degree TA (the opening of the throttle valve 14) detected by a throttle sensor 82, and the output signal Scr of a crank angle sensor 84. It also refers to the accelerator operation amount ACCP, which is the depression amount of an accelerator pedal 86 detected by an accelerator sensor 88, and the longitudinal acceleration G of the vehicle VC detected by an acceleration sensor 90. It further refers to the vehicle speed V of the vehicle VC detected by a vehicle speed sensor 92, the engine coolant temperature Tw detected by a water temperature sensor 94, and the sound volume Vsnd, which indicates the loudness of the muffled sound detected by a microphone 96.

The microphone 96 is attached, for example, near the driver's seat of the vehicle VC; that is, the microphone 96 detects the sound volume Vsnd in the vehicle interior. It also detects the muffled sound caused by operation of the internal combustion engine 10. The muffled sound is generated in the vehicle interior by vibration of the internal combustion engine 10 and of the power transmission system from the engine to the drive wheels, and is a low-frequency sound in the band from 20 Hz to 250 Hz. The microphone 96 detects the muffled sound by, for example, band-pass filtering that passes the 20 Hz to 250 Hz band and blocks other frequency bands.
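The band-pass step can be illustrated with a short sketch. It assumes a block of microphone samples at a known sampling rate and, instead of a causal real-time filter, masks the DFT bins outside 20 Hz to 250 Hz and returns the RMS of what remains via Parseval's theorem. The function name `muffled_sound_rms` and the use of RMS as the measure of Vsnd are illustrative assumptions; the source does not say how Vsnd is scaled.

```python
import cmath
import math


def muffled_sound_rms(samples, fs_hz, f_lo=20.0, f_hi=250.0):
    """Return the RMS of the f_lo..f_hi band of `samples` (hypothetical Vsnd proxy).

    Uses a plain O(n^2) DFT so the sketch needs only the standard library.
    """
    n = len(samples)
    energy = 0.0
    for k in range(n):
        # Bins above n/2 are the negative-frequency mirrors of bins below it.
        f = min(k, n - k) * fs_hz / n
        if f_lo <= f <= f_hi:
            x_k = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                      for t, s in enumerate(samples))
            energy += abs(x_k) ** 2
    # Parseval: mean square of the in-band component = sum |X_k|^2 / n^2.
    return math.sqrt(energy / n ** 2)


if __name__ == "__main__":
    fs = 2000.0
    hum = [math.sin(2 * math.pi * 100.0 * i / fs) for i in range(200)]    # in band
    whine = [math.sin(2 * math.pi * 600.0 * i / fs) for i in range(200)]  # out of band
    print(round(muffled_sound_rms(hum, fs), 4))  # 0.7071
    print(muffled_sound_rms(whine, fs) < 1e-6)   # True
```

A unit-amplitude 100 Hz tone keeps its full RMS of about 0.707, while a 600 Hz tone is rejected. An on-board implementation would more plausibly run a recursive filter sample by sample, but the band selection it performs is the same.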

The control device 70 also refers to the rotation speed RS1 of the input shaft 44 of the torque converter 40, detected by an input-side rotation speed sensor 98, and the rotation speed RS2 of the output shaft 46 of the torque converter 40, detected by an output-side rotation speed sensor 99.

The control device 70 also refers to a signal Vfc indicating whether fuel cut control is being performed. The signal Vfc is on when the control device 70 is performing fuel cut control and off when it is not. The signal Vfc is included in the operation signal MS2 for the fuel injection valve 20 and is generated within the control device 70.

The control device 70 includes a CPU 72 and a ROM 74, which serve as the execution device, a storage device 76 that is an electrically rewritable nonvolatile memory, and peripheral circuits 78, all of which communicate via an internal bus 79. The peripheral circuits 78 include a circuit that generates a clock signal defining internal operation, a power supply circuit, a reset circuit, and the like. The control device 70 controls the controlled variables by having the CPU 72 execute programs stored in the ROM 74.

Next, the procedure of processing executed by the control device 70 to control the engagement state of the lockup clutch 42 will be described. The processing shown in FIG. 2 is realized by the CPU 72 repeatedly executing a learning program DPL stored in the ROM 74, for example at a predetermined cycle. That is, the CPU 72 executes a method of learning the hydraulic pressure of the lockup clutch 42 in accordance with the learning program DPL. In the following, step numbers are written as numbers prefixed with "S".

When the series of processes shown in FIG. 2 starts, the CPU 72 first determines whether the acceleration G detected by the acceleration sensor 90 is "0" or greater (S10). Based on this result, the CPU 72 selects either first relationship-defining data DR1 or second relationship-defining data DR2, both stored as the relationship-defining data DR in the storage device 76 shown in FIG. 1. Specifically, DR1 is selected when the acceleration G is "0" or greater, and DR2 is selected when G is less than "0".

Next, the CPU 72 performs a state acquisition process that acquires the accelerator operation amount ACCP, the vehicle speed V, and the engine coolant temperature Tw as the state s of the vehicle VC (S12). The state s consists of the values of the variables whose relationship with the action variable is defined by the relationship-defining data DR stored in the storage device 76 shown in FIG. 1. In this embodiment, the action variable is, as an example, a correction value ΔP of the hydraulic pressure command value P*, which is the commanded pressure of the oil supplied to the lockup clutch 42.

The hydraulic pressure command value P* is calculated as the sum of a base value Pba* and the correction value ΔP, that is, P* = Pba* + ΔP. The base value Pba* is calculated by the CPU 72 through map calculation using map data stored in advance in the ROM 74, with the accelerator operation amount ACCP, the vehicle speed V, and the engine coolant temperature Tw as input variables and the base value Pba* as the output variable.

The base value Pba* is calculated as a small value, so that the lockup clutch 42 is released, when the accelerator operation amount ACCP is relatively large or the vehicle speed V is relatively small. Conversely, it is calculated as a large value, so that the lockup clutch 42 is directly coupled, when ACCP is relatively small and V is relatively large. For some combinations of ACCP and V, the base value Pba* is calculated so that the lockup clutch 42 is in the slip state. The base value Pba* is also corrected to a larger value as the engine coolant temperature Tw decreases. The hydraulic pressure command value P* is map-calculated using different map data for each gear stage.
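A minimal sketch of the base-value map calculation and of P* = Pba* + ΔP follows. Everything numeric in it is hypothetical: the grid, the kPa values, the bilinear interpolation, and the linear cold-coolant correction are invented only to reproduce the qualitative trends described above (small Pba* at large ACCP or low V, large Pba* at small ACCP and high V, larger Pba* at lower Tw). A single table stands in for one gear stage; the per-gear-stage maps are not modeled.

```python
from bisect import bisect_right

# Hypothetical map axes and values; the real map data in ROM 74 are not given.
ACCP_AXIS = [0.0, 50.0, 100.0]    # accelerator operation amount ACCP, %
SPEED_AXIS = [0.0, 50.0, 100.0]   # vehicle speed V, km/h
# PBA_TABLE[i][j] = Pba* (kPa) at ACCP_AXIS[i], SPEED_AXIS[j]
PBA_TABLE = [
    [0.0, 300.0, 500.0],   # small ACCP: rises with V toward direct coupling
    [0.0, 150.0, 300.0],   # medium ACCP: slip-range values
    [0.0, 0.0, 100.0],     # large ACCP: mostly released
]


def _locate(axis, x):
    """Return the cell index and interpolation weight for x, clamped to the axis."""
    x = min(max(x, axis[0]), axis[-1])
    i = min(bisect_right(axis, x) - 1, len(axis) - 2)
    return i, (x - axis[i]) / (axis[i + 1] - axis[i])


def base_value(accp, speed, tw):
    """Bilinear map lookup of Pba*, then a hypothetical cold-coolant correction."""
    i, wa = _locate(ACCP_AXIS, accp)
    j, wv = _locate(SPEED_AXIS, speed)
    p = ((1 - wa) * (1 - wv) * PBA_TABLE[i][j]
         + (1 - wa) * wv * PBA_TABLE[i][j + 1]
         + wa * (1 - wv) * PBA_TABLE[i + 1][j]
         + wa * wv * PBA_TABLE[i + 1][j + 1])
    # Lower Tw -> larger Pba*; the 80 degC reference and 0.5 %/degC gain are made up.
    return p * (1.0 + 0.005 * max(0.0, 80.0 - tw))


def command_value(accp, speed, tw, delta_p):
    """P* = Pba* + dP, where dP is the action chosen by the learned policy."""
    return base_value(accp, speed, tw) + delta_p
```

For example, `command_value(0.0, 100.0, 80.0, -20.0)` evaluates to 480.0: the map gives 500 kPa for a released pedal at 100 km/h, and the learned correction lowers it by 20 kPa.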

The relationship-defining data DR include an action value function Q. The action value function Q takes the state s and the action a as independent variables and gives the expected return for that state s and action a as the dependent variable. In this embodiment, Q is a table-form function. The first relationship-defining data DR1 and the second relationship-defining data DR2 are separate data. Therefore, although both contain an action value function Q, the two action value functions are independent of each other and are not linked.
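The data layout described above, two independent table-form Q functions chosen by the sign of G in S10, can be sketched as follows. The class name `RelationshipData`, the dictionary storage, the zero default for unvisited entries, and the five-value action set for ΔP are all hypothetical choices for illustration.

```python
# Hypothetical discrete action set for the correction value dP (kPa).
ACTIONS = (-20.0, -10.0, 0.0, 10.0, 20.0)


class RelationshipData:
    """One set of relationship-defining data: a table-form Q(s, a)."""

    def __init__(self):
        self.q = {}  # (state, action) -> expected return

    def value(self, state, action):
        return self.q.get((state, action), 0.0)  # unvisited entries start at 0.0

    def set_value(self, state, action, v):
        self.q[(state, action)] = v

    def row(self, state):
        """Q values of every action for one state, in ACTIONS order."""
        return [self.value(state, a) for a in ACTIONS]


# DR1 (G >= 0) and DR2 (G < 0) are separate objects with separate tables,
# so updating one never changes the other.
DR1 = RelationshipData()
DR2 = RelationshipData()


def select_data(accel_g):
    """S10: use DR1 when the acceleration G is 0 or more, otherwise DR2."""
    return DR1 if accel_g >= 0.0 else DR2
```

Keeping DR1 and DR2 as distinct objects mirrors the statement that their Q functions are not linked. A value learned while accelerating has no effect on the table consulted while decelerating.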

Next, the CPU 72 calculates the value of the action variable, that is, the correction value ΔP of the hydraulic pressure command value P*, based on a policy π defined by the relationship-defining data DR (S14). In this embodiment, an ε-greedy policy is used as an example. Given the state s, this policy preferentially selects the greedy action ag, which maximizes the action value function Q among the entries whose independent variable is that state s, while selecting each other action with a predetermined probability. Specifically, when the total number of values the action can take is denoted |A|, the probability of taking each action other than the greedy action is ε/|A|.
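The ε-greedy rule above can be written as a short sketch. Each non-greedy action receives ε/|A|, so the greedy action ag receives the remaining mass 1 - ε(|A| - 1)/|A|. The function names and the tie-breaking rule (first maximum wins) are assumptions; the source does not specify how ties are resolved.

```python
import random


def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities for one state: each non-greedy action gets eps/|A|."""
    n = len(q_row)
    greedy = max(range(n), key=lambda i: q_row[i])  # index of greedy action ag
    probs = [epsilon / n] * n
    probs[greedy] = 1.0 - epsilon * (n - 1) / n     # remaining probability mass
    return probs


def select_action(actions, q_row, epsilon, rng=random):
    """S14: sample the correction value dP according to the epsilon-greedy policy."""
    probs = epsilon_greedy_probs(q_row, epsilon)
    return rng.choices(actions, weights=probs, k=1)[0]


if __name__ == "__main__":
    print(epsilon_greedy_probs([0.0, 2.0, 1.0], 0.3))  # about [0.1, 0.8, 0.1]
    print(select_action([-10.0, 0.0, 10.0], [0.0, 2.0, 1.0], 0.3))
```

Passing a seeded `random.Random` instance as `rng` makes the exploration reproducible, which is convenient when replaying a logged drive.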

Because the action value function Q is tabular data in this embodiment, each state s used as an independent variable covers a range of values. For example, if Q is defined at 10% intervals of the accelerator operation amount ACCP, an ACCP of "3%" and an ACCP of "6%" are not treated as different states s on that basis alone.
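The range-based states described above amount to binning each input before using it as a table key. The sketch below takes the 10% ACCP spacing from the example in the text; the 10 km/h spacing for V and the 10 degC spacing for Tw are hypothetical.

```python
import math

# Table spacings: 10 % for ACCP (the source's example),
# 10 km/h for V and 10 degC for Tw (assumed).
BIN_WIDTHS = (10.0, 10.0, 10.0)


def state_key(accp, speed, tw, widths=BIN_WIDTHS):
    """Discretize (ACCP, V, Tw) into the index tuple used as the state s."""
    return tuple(math.floor(x / w) for x, w in zip((accp, speed, tw), widths))


if __name__ == "__main__":
    print(state_key(3.0, 40.0, 85.0))   # (0, 4, 8)
    print(state_key(6.0, 40.0, 85.0))   # (0, 4, 8): same state s as ACCP = 3 %
    print(state_key(13.0, 40.0, 85.0))  # (1, 4, 8): a different state s
```

`math.floor` is used rather than `int()` so that a sub-zero coolant temperature such as -5 degC falls into bin -1 instead of being truncated into bin 0.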

Next, the CPU 72 calculates the hydraulic pressure command value P* by adding the base value Pba* and the correction value ΔP, and performs an operation process that manipulates the energizing current I of the solenoid valve so that I takes the value determined from P* (S16).

The CPU 72 then performs a characteristic acquisition process that acquires the characteristics c of the vehicle VC (S18). In this embodiment, the CPU 72 acquires, as the characteristics c, the power transmission efficiency PTE, the sound volume Vsnd, the signal Vfc indicating whether fuel cut is being performed, and the acceleration G.

The power transmission efficiency PTE indicates how efficiently power on the input side of the lockup clutch 42 is transmitted to its output side, which depends on the engagement state of the lockup clutch 42. PTE becomes smaller as the difference between the input-side rotation speed RS1 and the output-side rotation speed RS2 of the lockup clutch 42, measured when the processing of S16 was performed, becomes larger. For example, the CPU 72 obtains PTE by calculating it as the ratio of the rotation speed RS2 of the output shaft 46 of the torque converter 40 to the rotation speed RS1 of its input shaft 44.
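The example definition of PTE reduces to a one-line ratio. The sketch below adds a guard against a non-positive input speed. The source does not say how a case with RS2 greater than RS1 (for instance while coasting) is handled, so the sketch returns the raw ratio unchanged.

```python
def power_transmission_efficiency(rs1_rpm, rs2_rpm):
    """PTE = RS2 / RS1: 1.0 when directly coupled, smaller as slip grows."""
    if rs1_rpm <= 0.0:
        raise ValueError("input-side speed RS1 must be positive")
    return rs2_rpm / rs1_rpm


if __name__ == "__main__":
    print(power_transmission_efficiency(2000.0, 2000.0))  # 1.0 (directly coupled)
    print(power_transmission_efficiency(2000.0, 1800.0))  # 0.9 (10 % slip)
```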

Next, the CPU 72 performs learning processing on the relationship-defining data DR by reinforcement learning (S20). When the processing of S20 is complete, the CPU 72 temporarily ends the series of processes shown in FIG. 2.

FIG. 3 shows the details of the processing of S20.

In the series of processes shown in FIG. 3, the CPU 72 first determines the learning region (S30). As shown in FIG. 4, the learning region is determined from map data that take the accelerator operation amount ACCP and the vehicle speed V acquired in S12 as input variables and output the learning region. For example, when ACCP is 15% and V is 25 km/h, the learning region is determined to be B3. ACCP and V are rounded to the nearest integer (fractions of 0.5 and above rounded up) before being input to the map data.
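Steps S30 and S32 can be sketched as follows. The rounding step needs care in Python, because the built-in `round()` rounds halves to the nearest even integer, whereas the conventional rounding described here rounds halves up. The contents of FIG. 4 are not reproduced here, so `region_map` is a hypothetical callable standing in for that table.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, with .5 going up (unlike Python's round())."""
    return int(math.floor(x + 0.5))


def learning_region(accp, speed, region_map):
    """S30: look up the learning region from rounded ACCP (%) and V (km/h)."""
    return region_map(round_half_up(accp), round_half_up(speed))


def is_learning_region(region):
    """S32: every region except NL is learned."""
    return region != "NL"


if __name__ == "__main__":
    # Stand-in for FIG. 4 that knows only the example from the text.
    fig4 = lambda a, v: "B3" if (a, v) == (15, 25) else "NL"
    region = learning_region(14.6, 24.5, fig4)
    print(region, is_learning_region(region))  # B3 True
    print(round(24.5), round_half_up(24.5))    # 24 25
```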

Next, the CPU 72 determines whether the learning region determined in S30 is a region in which learning is performed (S32). Among the regions shown in FIG. 4, the region labeled NL is a non-learning region, whereas the regions labeled A1 to D6 are learning regions. FIG. 4 omits the case where the vehicle speed V exceeds 50 km/h.

If the region is determined in S32 to be a learning region (S32: YES), the CPU 72 checks whether the acceleration G was determined in S10 to be "0" or greater or less than "0" (S34). In other words, the CPU 72 determines whether the first relationship-defining data DR1 or the second relationship-defining data DR2 is currently selected.

When the determination in S10 is found in S34 to be affirmative (S34: YES), the CPU 72 performs a reward calculation process that calculates a reward r1 according to the power transmission efficiency PTE (S36). Specifically, the CPU 72 calculates a larger reward r1 when PTE is high than when it is low.

Next, the CPU 72 performs a reward calculation process that calculates a reward r2 according to the volume Vsnd (S38). Specifically, the CPU 72 calculates a larger reward r2 when Vsnd is low than when it is high. The CPU 72 then sets the reward r for the action used in S16 to the sum of r1 and r2 (S40).

On the other hand, when the determination in S10 is found in S34 to be negative (S34: NO), the CPU 72 performs a reward calculation process that calculates a reward r3 according to the signal Vfc indicating whether fuel cut is being performed (S42). Specifically, the CPU 72 calculates a larger reward r3 when Vfc indicates that fuel cut is being performed than when it indicates that fuel cut is not being performed.

Next, the CPU 72 performs a reward calculation process that calculates a reward r4 according to the acceleration G (S44). Specifically, the CPU 72 calculates a larger reward r4 when G is large than when it is small. That is, r4 is larger for a gradual deceleration than for a sudden one. The CPU 72 then sets the reward r for the action used in S16 to the sum of r3 and r4 (S46).
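The branch from S34 to S46 can be sketched as follows. The text fixes only the direction of each reward term: higher PTE, lower Vsnd, active fuel cut, and gentler deceleration each earn more. The linear and step shapes, the scale constants, the assumption that Vsnd is in dB, and the names `Characteristics`, `compute_reward`, `VOLUME_REF_DB`, and `FUEL_CUT_BONUS` are all hypothetical choices for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Characteristics:
    """Vehicle characteristics c observed after operating the lockup clutch (hypothetical container)."""
    accel_g: float    # acceleration G; negative while decelerating
    pte: float        # power transmission efficiency PTE, 0.0 to 1.0
    volume_db: float  # cabin sound volume Vsnd, assumed to be in dB
    fuel_cut: bool    # signal Vfc: True while fuel cut is active


VOLUME_REF_DB = 70.0  # hypothetical reference level for scaling r2
FUEL_CUT_BONUS = 1.0  # hypothetical value of r3 while fuel cut is active


def compute_reward(c: Characteristics) -> tuple[float, str]:
    """Return the reward r and the relational regulation data (DR1 or DR2) it updates."""
    if c.accel_g >= 0.0:                                     # S34: YES
        r1 = c.pte                                           # S36: higher PTE, larger r1
        r2 = (VOLUME_REF_DB - c.volume_db) / VOLUME_REF_DB   # S38: quieter cabin, larger r2
        return r1 + r2, "DR1"                                # S40: r = r1 + r2
    r3 = FUEL_CUT_BONUS if c.fuel_cut else 0.0               # S42: fuel cut active, larger r3
    r4 = c.accel_g                                           # S44: gentler deceleration, larger r4
    return r3 + r4, "DR2"                                    # S46: r = r3 + r4


r, dr = compute_reward(Characteristics(accel_g=0.5, pte=0.9, volume_db=63.0, fuel_cut=False))
print(r, dr)
```

Returning the DR label with the reward reflects that S48 updates DR1 when reached through S40 and DR2 when reached through S46.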

When S40 or S46 is completed, the CPU 72 performs an update process that updates, based on the reward r, the action value function Q(s, a) of the relational regulation data DR used in S14 (S48). The action value function Q(s, a) used in S14 is the one whose independent variables are the state s acquired in S12 and the action a set in S14. That is, when S48 is reached via S40, the action value function Q(s, a) of the first relational regulation data DR1 is updated; when S48 is reached via S46, the action value function Q(s, a) of the second relational regulation data DR2 is updated.

In this embodiment, the action value function Q(s, a) is updated by so-called Q-learning, which is an off-policy TD method. Specifically, Q(s, a) is updated by the following equation (c1).

$$Q(s,a) \leftarrow Q(s,a) + \alpha \cdot \left\{ r + \gamma \cdot \max_{A} Q(s+1, A) - Q(s,a) \right\} \qquad \text{(c1)}$$
Here, the update amount $\alpha \cdot \{ r + \gamma \cdot \max_{A} Q(s+1, A) - Q(s,a) \}$ of the action value function Q(s, a) uses the discount rate γ and the learning rate α, both of which are constants greater than 0 and less than or equal to 1. Further, $\max_{A} Q(s+1, A)$ denotes the maximum value, taken over the actions A, of the action value function Q whose state argument is the state s+1 to be acquired in the next execution of S12 in the series of processes shown in FIG. 2.
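Equation (c1) applied to a table-format action value function can be sketched as below. The dictionary-backed table, the discrete set of correction values ΔP in `ACTIONS`, the use of an area label as the state key, and the α and γ values are hypothetical; keeping one table each for DR1 and DR2 mirrors the S48 selection described above.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple

# Hypothetical discrete action set: corrections ΔP to the hydraulic pressure command value (kPa).
ACTIONS = (-20.0, -10.0, 0.0, 10.0, 20.0)
ALPHA = 0.1  # learning rate α, 0 < α <= 1 (value is illustrative)
GAMMA = 0.9  # discount rate γ, 0 < γ <= 1 (value is illustrative)

QTable = DefaultDict[Tuple[Hashable, float], float]


def q_learning_update(q: QTable, s: Hashable, a: float, r: float, s_next: Hashable) -> float:
    """Apply equation (c1) to the table in place and return the updated Q(s, a)."""
    max_next = max(q[(s_next, b)] for b in ACTIONS)  # max over A of Q(s+1, A)
    td_error = r + GAMMA * max_next - q[(s, a)]
    q[(s, a)] += ALPHA * td_error
    return q[(s, a)]


# One table per relational regulation data: DR1 (G >= 0) and DR2 (G < 0).
dr1: QTable = defaultdict(float)
dr2: QTable = defaultdict(float)

q_learning_update(dr1, "B3", 10.0, r=1.0, s_next="B3")  # Q("B3", 10.0) becomes 0.1
```

Because the target uses the maximum over next actions rather than the action the controller will actually take, exploratory actions do not pull the learned values toward the exploration policy, which is what makes the method off-policy.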

When S48 is completed, the CPU 72 temporarily ends the series of processes shown in FIG. 3. Incidentally, the relational regulation data DR at the time the vehicle VC is shipped is data that has already been learned, by processing similar to that of FIG. 2, on a prototype vehicle or the like having the same specifications as the vehicle VC. That is, the processing of FIG. 2 uses reinforcement learning to update the hydraulic pressure command value P*, set before shipment of the vehicle VC, to a value appropriate for when the vehicle VC actually travels on the road. If the area is a non-learning area (S32: NO), the CPU 72 also temporarily ends the series of processes shown in FIG. 3.

Next, the operation of the above embodiment will be described.
In the above embodiment, when calculating the hydraulic pressure command value P* for the lockup clutch 42, the CPU 72 selects the greedy action ag and operates the energizing current of the solenoid valve accordingly, but with a predetermined probability it uses an action other than the greedy action to search for a better hydraulic pressure command value P*. The CPU 72 then updates, by Q-learning, the action value function Q used to determine P*.
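Selecting the greedy action ag while exploring other actions with a predetermined probability is commonly implemented as ε-greedy selection. The sketch below assumes that form; the value of `EPSILON`, the action set, and the name `select_action` are hypothetical, and because the text does not say how the non-greedy action is chosen, a uniform choice among the remaining actions is assumed.

```python
import random
from typing import Dict, Hashable, Sequence, Tuple

EPSILON = 0.1  # hypothetical exploration probability


def select_action(q: Dict[Tuple[Hashable, float], float], s: Hashable,
                  actions: Sequence[float], rng: random.Random,
                  epsilon: float = EPSILON) -> float:
    """Return the greedy action ag, or with probability epsilon a different action."""
    greedy = max(actions, key=lambda a: q.get((s, a), 0.0))  # ag = argmax over a of Q(s, a)
    if rng.random() < epsilon:
        others = [a for a in actions if a != greedy]
        if others:
            return rng.choice(others)  # explore: uniform choice among non-greedy actions
    return greedy


rng = random.Random(0)
q = {("B3", 10.0): 0.5}
print(select_action(q, "B3", (-10.0, 0.0, 10.0), rng))
```

Passing a seeded `random.Random` instance makes the exploration reproducible in tests; an onboard implementation would use whatever random source the controller provides.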

Next, the effects of the above embodiment will be described.
(1) According to the above embodiment, the action value function Q used to determine the hydraulic pressure command value P* is updated by Q-learning, so this learning process can appropriately set the relationship between the accelerator operation amount ACCP and the vehicle speed V on the one hand and P* on the other. Because this series of learning processes does not necessarily require the labor of a skilled engineer, that relationship can be defined relatively easily.

In particular, in the above embodiment, the characteristic c of the vehicle VC includes not only the volume Vsnd but also the power transmission efficiency PTE. PTE is an extremely important parameter for the lockup clutch 42 to fulfill its function as a lockup clutch. In the above embodiment, the hydraulic pressure command value P* is calculated so that PTE is as large as possible while Vsnd is as small as possible. As a result, P* is set so that Vsnd is reduced without the lockup clutch 42 losing its original function.

(2) According to the above embodiment, when the acceleration G is "0" or more, the reward r is given based on the power transmission efficiency PTE and the volume Vsnd. Because Vsnd is a value that captures the booming noise originating from the internal combustion engine 10, it becomes correspondingly large when G is "0" or more. Giving a reward r related to Vsnd in such situations, where booming noise can occur, enhances the learning effect for reducing booming noise.

(3) According to the above embodiment, when the acceleration G is less than "0", the reward r is given based on whether fuel cut control is performed and on the acceleration G. When the vehicle VC is decelerating, the lockup clutch 42 may be in the released state, in which case the power transmission efficiency PTE is "0". In addition, during deceleration the load of the internal combustion engine 10 is small or the lockup clutch 42 is released, so a loud booming noise is unlikely to occur. In other words, during deceleration there is little need to increase PTE or decrease the volume Vsnd. Therefore, by giving the reward r based on the presence or absence of fuel cut and on G, instead of on Vsnd and PTE, during deceleration, fuel cut control can be executed appropriately while sudden deceleration is suppressed.

(5) According to the above embodiment, the region where the accelerator operation amount ACCP is 81% or more is the non-learning region NL. When ACCP is large, the lockup clutch 42 is often in the released state, so the calculated power transmission efficiency PTE is "0". Excluding this region prevents learning from proceeding in such cases, and thus prevents the learning of the hydraulic pressure command value P* for the lockup clutch 42, which is based on PTE, from being fitted to this peculiar situation.

(6) According to the above embodiment, the region where the vehicle speed V is 10 km/h or less is the non-learning region NL. When V is low, the lockup clutch 42 is often in the released state, so the calculated power transmission efficiency PTE is "0". Excluding this region prevents learning from proceeding in such cases, and thus prevents the learning of the hydraulic pressure command value P* for the lockup clutch 42, which is based on PTE, from being fitted to this peculiar situation.

(7) According to the above embodiment, in calculating the hydraulic pressure command value P* for the lockup clutch 42, the base value Pba* is corrected so that it becomes larger as the engine water temperature Tw becomes lower. The higher Tw is, the higher the rotational speed on the input side of the lockup clutch 42. Therefore, when the lockup clutch 42 is in the slip state, even with the same P*, the rotational speed on the output side of the lockup clutch 42 may become excessively high as Tw rises. Correcting Pba* to a smaller value as Tw rises allows a larger difference in rotational speed between the input and output sides of the lockup clutch 42. As a result, even when the input-side rotational speed increases, the output-side rotational speed can be kept from becoming excessively high.
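Effect (7) describes a correction of the base value Pba* that decreases as the engine water temperature Tw rises. A minimal sketch, assuming a piecewise-linear correction coefficient interpolated from a small table and assuming that the learned correction value ΔP is simply added afterward; the breakpoints, coefficients, and function names are hypothetical and are not taken from the source.

```python
from bisect import bisect_right

# Hypothetical breakpoints: engine water temperature Tw (deg C) -> multiplier for Pba*.
TW_POINTS = [20.0, 60.0, 90.0]
COEFFS = [1.2, 1.0, 0.9]  # larger at low Tw, smaller at high Tw


def tw_coefficient(tw: float) -> float:
    """Linearly interpolate the correction coefficient, clamped at both ends of the table."""
    if tw <= TW_POINTS[0]:
        return COEFFS[0]
    if tw >= TW_POINTS[-1]:
        return COEFFS[-1]
    i = bisect_right(TW_POINTS, tw)  # TW_POINTS[i - 1] <= tw < TW_POINTS[i]
    t = (tw - TW_POINTS[i - 1]) / (TW_POINTS[i] - TW_POINTS[i - 1])
    return COEFFS[i - 1] + t * (COEFFS[i] - COEFFS[i - 1])


def hydraulic_command(pba: float, tw: float, delta_p: float) -> float:
    """P* = Pba* corrected for Tw, plus the learned correction ΔP (additive form assumed)."""
    return pba * tw_coefficient(tw) + delta_p


print(hydraulic_command(500.0, 40.0, 10.0))
```

Clamping at the table ends keeps the correction bounded outside the calibrated temperature range.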

This embodiment can be modified and carried out as follows. The present embodiment and the following modified examples can be combined with each other within a technically consistent range.
- In the above embodiment, the state s of the vehicle VC used to select the value of the action variable based on the relational regulation data DR is not limited to the examples given above. For example, the load of the internal combustion engine 10 may be included.

- The action variable in the above embodiment is not limited to the correction value ΔP of the hydraulic pressure command value P*, which is the pressure of the oil supplied to the lockup clutch 42. For example, the action variable may be P* itself, the command value of the energizing current to the solenoid valve, or the rate of change of the command value. Even in these cases, the action variable still determines P* directly or indirectly.

- Regarding the relational regulation data DR, the above embodiment uses a table-format function as the action value function Q, but this is not limiting. For example, a function approximator may be used.
- As another example, instead of using the action value function Q, the relational regulation data DR may express the policy π with a function approximator whose independent variables are the state s and the action a and whose dependent variable is the probability of taking the action a, and the parameters that define the function approximator may be updated according to the reward r.

- In the above embodiment, the update process uses so-called Q-learning, an off-policy TD method, but this is not limiting. For example, the so-called SARSA method, an on-policy TD method, may be used. The method is not limited to TD methods either; for example, a Monte Carlo method or an eligibility trace method may be used.

- As another example, the update map is not limited to one in which only one of the action value function Q and the policy π is directly updated by the reward r. For example, both Q and π may be updated, as in the actor-critic method. Furthermore, the actor-critic method is not limited to this form; for example, a value function may be updated instead of the action value function Q.

- In the above embodiment, the learning process using the reward r calculated from the acceleration G and the presence or absence of fuel cut as the characteristic c of the vehicle VC may be omitted. That is, as long as the learning process using the reward r calculated from the power transmission efficiency PTE and the volume Vsnd as the characteristic c is performed, other learning processes are not essential. When the learning process based on G and fuel cut is omitted, then for an acceleration G of less than "0", learning may be skipped with the correction value ΔP fixed to a constant value, or learning may be performed in the same manner as when G is "0" or more. In either of these cases, the second relational regulation data DR2 may be omitted.

- The reward r may be calculated based on other characteristics c of the vehicle VC in addition to the power transmission efficiency PTE and the volume Vsnd, and the learning process may be performed with that reward. Examples of such other characteristics c include the vertical acceleration of the vehicle VC, that is, the magnitude of vibration.

- In the above embodiment, the reward calculation process that calculates the reward r2 according to the volume Vsnd may be performed even when the acceleration G is less than "0".
- In the above embodiment, the way the learning areas determined in S30 are divided is not limited to the example above. For example, in addition to the accelerator operation amount ACCP and the vehicle speed V, the areas may also be divided by the temperature of the hydraulic fluid of the automatic transmission 50.

- In the above embodiment, the processing of S32 may be omitted; that is, the update process may be performed in all areas.
- In the above embodiment, the control device 70 that controls the vehicle VC is given as an example of the control device for the lockup clutch, but this is not limiting. For example, in a control system for the vehicle VC that includes a control device outside the vehicle VC, that external control device may execute the learning process of S20. Such an external control device only needs to be able to acquire data from the vehicle VC, and may be, for example, a data analysis center or a user's mobile terminal.

- The execution device is not limited to one that includes the CPU 72 and the ROM 74 and executes software processing. For example, it may include a dedicated hardware circuit, such as an ASIC, that performs in hardware at least part of the processing performed in software in the above embodiment. That is, the execution device may have any of the following configurations (a) to (c). (a) A processing device that executes all of the above processing according to a program, and a program storage device, such as a ROM, that stores the program. (b) A processing device and a program storage device that execute part of the above processing according to a program, and a dedicated hardware circuit that executes the remaining processing. (c) A dedicated hardware circuit that executes all of the above processing. There may be a plurality of software execution devices, each including a processing device and a program storage device, and a plurality of dedicated hardware circuits.

- The computer is not limited to the CPU 72. For example, the computer may consist of a computer that generates the relational regulation data DR before shipment of the vehicle VC together with the CPU 72 mounted on the vehicle VC. In the process of generating DR before shipment, no actual vehicle is needed: the internal combustion engine 10 and the like may be operated on a test bench to simulate vehicle travel, thereby generating a pseudo vehicle state, and that pseudo-generated state, grasped through sensor detection values and the like, may be used for reinforcement learning. In that case, the pseudo-generated vehicle state is regarded as the vehicle state based on the sensor detection values.

- In the above embodiment, the storage device 76 that stores the relational regulation data DR and the ROM 74 that stores the learning program DPL are separate storage devices, but this is not limiting.
- The vehicle VC may include a motor generator as a drive source.
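Among the modifications above, replacing Q-learning with SARSA changes only the bootstrap target of the update: Q-learning uses the maximum of Q over the actions in the next state (off-policy), whereas SARSA uses the Q value of the action actually selected in the next state (on-policy). A minimal sketch contrasting the two targets, assuming a dictionary-backed, table-format action value function; the function names are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, float], float]


def q_learning_target(q: QTable, r: float, gamma: float, s_next: Hashable,
                      actions: Sequence[float]) -> float:
    """Off-policy target: r + γ · max over A of Q(s+1, A)."""
    return r + gamma * max(q.get((s_next, b), 0.0) for b in actions)


def sarsa_target(q: QTable, r: float, gamma: float, s_next: Hashable,
                 a_next: float) -> float:
    """On-policy target: r + γ · Q(s+1, a'), where a' is the action actually selected next."""
    return r + gamma * q.get((s_next, a_next), 0.0)


def td_update(q: QTable, s: Hashable, a: float, target: float, alpha: float) -> None:
    """Move Q(s, a) toward the chosen target by the learning rate α."""
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)


q = {("B3", 0.0): 1.0, ("B3", 10.0): 2.0}
print(q_learning_target(q, 0.5, 0.9, "B3", (0.0, 10.0)), sarsa_target(q, 0.5, 0.9, "B3", 0.0))
```

With an exploratory next action, the SARSA target reflects the value of what the controller will actually do, while the Q-learning target ignores exploration; either target can be fed to the same `td_update`.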

10 ... Internal combustion engine
40 ... Torque converter
42 ... Lockup clutch
70 ... Control device
72 ... CPU
74 ... ROM
76 ... Storage device
DR ... Relational regulation data
P* ... Hydraulic pressure command value
VC ... Vehicle

Claims

A control device for a lockup clutch, the control device being applied to a vehicle including a torque converter with a built-in lockup clutch and controlling a hydraulic pressure command value of the lockup clutch, the control device comprising a storage device and an execution device, wherein
the storage device stores relationship regulation data, which is data defining the relationship between a state of the vehicle and an action variable, the action variable being a variable related to the operation of the lockup clutch,
the execution device executes:
a state acquisition process that acquires the state of the vehicle based on a detection value of a sensor;
an operation process that calculates the value of the action variable based on the state of the vehicle acquired by the state acquisition process and the relationship regulation data, and operates the lockup clutch so as to achieve the hydraulic pressure command value determined by the calculated action variable;
a characteristic acquisition process that acquires characteristics of the vehicle when the lockup clutch is operated in the operation process;
a reward calculation process that gives a larger reward when the characteristics of the vehicle acquired by the characteristic acquisition process satisfy a predetermined criterion than when they do not; and
an update process that inputs the state of the vehicle acquired by the state acquisition process, the value of the action variable used to operate the lockup clutch, and the reward corresponding to that operation into a predetermined update map, and updates the relationship regulation data,
the update map outputs the relationship regulation data updated so as to increase the expected return for the reward when the lockup clutch is operated according to the relationship regulation data, and
the characteristics of the vehicle include a power transmission efficiency, indicating the efficiency with which power on the input side of the lockup clutch is transmitted to the output side, and a sound volume in the passenger compartment of the vehicle.