WO2024053566A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2024053566A1
Authority
WO
WIPO (PCT)
Prior art keywords
vehicle
state
information
relationship
control
Prior art date
Application number
PCT/JP2023/031965
Other languages
French (fr)
Japanese (ja)
Inventor
直輝 山田
広昂 岡崎
昌弘 山田
Original Assignee
三菱重工業株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 三菱重工業株式会社
Publication of WO2024053566A1

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60L PROPULSION OF ELECTRICALLY-PROPELLED VEHICLES; SUPPLYING ELECTRIC POWER FOR AUXILIARY EQUIPMENT OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRODYNAMIC BRAKE SYSTEMS FOR VEHICLES IN GENERAL; MAGNETIC SUSPENSION OR LEVITATION FOR VEHICLES; MONITORING OPERATING VARIABLES OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRIC SAFETY DEVICES FOR ELECTRICALLY-PROPELLED VEHICLES
    • B60L15/00 Methods, circuits, or devices for controlling the traction-motor speed of electrically-propelled vehicles
    • B60L15/40 Adaptation of control equipment on vehicle for remote actuation from a stationary place
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60L PROPULSION OF ELECTRICALLY-PROPELLED VEHICLES; SUPPLYING ELECTRIC POWER FOR AUXILIARY EQUIPMENT OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRODYNAMIC BRAKE SYSTEMS FOR VEHICLES IN GENERAL; MAGNETIC SUSPENSION OR LEVITATION FOR VEHICLES; MONITORING OPERATING VARIABLES OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRIC SAFETY DEVICES FOR ELECTRICALLY-PROPELLED VEHICLES
    • B60L7/00 Electrodynamic brake systems for vehicles in general
    • B60L7/10 Dynamic electric regenerative braking
    • B60L7/14 Dynamic electric regenerative braking for vehicles propelled by ac motors
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60T VEHICLE BRAKE CONTROL SYSTEMS OR PARTS THEREOF; BRAKE CONTROL SYSTEMS OR PARTS THEREOF, IN GENERAL; ARRANGEMENT OF BRAKING ELEMENTS ON VEHICLES IN GENERAL; PORTABLE DEVICES FOR PREVENTING UNWANTED MOVEMENT OF VEHICLES; VEHICLE MODIFICATIONS TO FACILITATE COOLING OF BRAKES
    • B60T8/00 Arrangements for adjusting wheel-braking force to meet varying vehicular or ground-surface conditions, e.g. limiting or varying distribution of braking force
    • B60T8/17 Using electrical or electronic regulation means to control braking
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B61 RAILWAYS
    • B61L GUIDING RAILWAY TRAFFIC; ENSURING THE SAFETY OF RAILWAY TRAFFIC
    • B61L27/00 Central railway traffic control systems; Trackside control; Communication systems specially adapted therefor
    • B61L27/20 Trackside control of safe travel of vehicle or train, e.g. braking curve calculation
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B61 RAILWAYS
    • B61L GUIDING RAILWAY TRAFFIC; ENSURING THE SAFETY OF RAILWAY TRAFFIC
    • B61L27/00 Central railway traffic control systems; Trackside control; Communication systems specially adapted therefor
    • B61L27/40 Handling position reports or trackside vehicle data
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y10/00 Economic sectors
    • G16Y10/40 Transportation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y20/00 Information sensed or collected by the things
    • G16Y20/20 Information sensed or collected by the things relating to the thing itself
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y40/00 IoT characterised by the purpose of the information processing
    • G16Y40/30 Control

Definitions

  • the present disclosure relates to an information processing device, an information processing method, and a program.
  • This application claims priority to Japanese Patent Application No. 2022-142287 filed in Japan on September 7, 2022, the contents of which are incorporated herein.
  • There is a control device for a vehicle that drives its wheels by rotating a motor and travels on a track.
  • This control device controls the motor by outputting a motor torque command value according to the current state of the vehicle.
  • A vehicle equipped with the control device is braked using the regenerative brake of the motor.
  • In addition to the regenerative brake, the control device may brake the vehicle using an air brake (mechanical brake) provided on the vehicle in parallel.
  • Patent Document 1 and Patent Document 2 disclose techniques that use the above-mentioned motor control function and regenerative brake for braking a vehicle. More specifically, Patent Document 1 discloses a technique for reducing the occurrence of regeneration failure and improving stopping accuracy. Moreover, Patent Document 2 discloses a technique of predicting regeneration failure and calculating a command value of regenerative braking force to obtain a braking force corresponding to the difference between a target braking force and a mechanical braking force.
  • the purpose of this disclosure is to provide an information processing device, an information processing method, and a program that solve the above-mentioned problems.
  • An information processing device generates a control model that estimates, using a probability distribution, information on the state of a vehicle at the next time, based on information indicating the state of the vehicle at a certain time and a control input for the vehicle that indicates a motor torque command for the vehicle to stop at a target position in that state.
  • The control model is generated using the relationship between the state information indicated by braking control of the vehicle performed in the past and the control input, and the state of the vehicle at the next time obtained from that relationship.
  • The device determines an evaluation function of an evaluation value regarding the state of the vehicle, in which the evaluation value worsens as the distance to the target position increases.
  • The device inputs into a policy function the policy parameter that most improves the evaluation value, thereby generating a motor torque command for the vehicle in the state of the vehicle at the next time, and a trial is performed on the vehicle using the generated motor torque command.
  • The control model is updated using the relationship between the vehicle state information resulting from the trial and the control input, together with the relationship between the vehicle state information used to generate the control model and the control input.
  • An information processing method generates a control model that estimates, using a probability distribution, information on the state of a vehicle at the next time, based on information indicating the state of the vehicle at a certain time and a control input for the vehicle that indicates a motor torque command for the vehicle to stop at a target position in that state.
  • In the method, the control model is generated using the relationship between the state information indicated by braking control of the vehicle performed in the past and the control input, and the state of the vehicle at the next time obtained from that relationship.
  • The method determines an evaluation function of an evaluation value regarding the state of the vehicle, in which the evaluation value worsens as the distance to the target position increases.
  • The method inputs into a policy function the policy parameter that most improves the evaluation value, thereby generating a motor torque command for the vehicle in the state of the vehicle at the next time, and a trial is performed on the vehicle using the generated motor torque command.
  • The method updates the control model using the relationship between the vehicle state information resulting from the trial and the control input, together with the relationship between the vehicle state information used to generate the control model and the control input.
  • A program causes a computer of an information processing device to function as means for generating a control model that estimates, using a probability distribution, information on the state of a vehicle at the next time, based on information indicating the state of the vehicle at a certain time and a control input indicating a motor torque command for the vehicle to stop at a target position in that state.
  • The control model is generated using the relationship between the state information indicated by braking control of the vehicle performed in the past and the control input.
  • The program also causes the computer to function as means for updating the control model.
  • FIG. 1 is a schematic diagram of a vehicle control system including a vehicle control device and a server device according to the present embodiment.
  • FIG. 2 is a block diagram showing a control mechanism including a vehicle control device according to the present embodiment.
  • FIG. 3 is a functional block diagram of a server device according to the present embodiment.
  • FIG. 4 is a diagram showing an overview of processing of the server device according to the present embodiment.
  • FIG. 5 is a diagram showing a processing flow of the server device according to the present embodiment.
  • FIG. 6 is a hardware configuration diagram of a server device according to the present embodiment.
  • FIG. 1 is a schematic diagram of a vehicle control system including a vehicle control device and a server device according to this embodiment.
  • FIG. 2 is a block diagram showing the configuration of a control mechanism including a vehicle control device according to this embodiment.
  • a vehicle 50 partially includes a vehicle control device 1, an inverter 2, and a motor 3 as an example of a control mechanism.
  • the vehicle control device 1 outputs a torque command value Tri (t) according to the state to control the motor.
  • Inverter 2 outputs a current to motor 3 according to torque command value Tri (t) .
  • the motor 3 is driven by a current based on the torque command value Tri (t) .
  • the vehicle control device 1 is communicatively connected to a server device 10, which is one aspect of an information processing device.
  • The vehicle control device 1 acquires state information indicating the state of the vehicle 50, including the self-position px(t) of the vehicle 50, the speed v(t) of the vehicle 50, the pressure pa(t) of the air spring that suppresses shaking between the bogie of the vehicle 50 and the passenger car, the motor voltage V(t), and the torque output Tro(t) of the motor 3.
  • the vehicle control device 1 uses the state information and the control model to calculate a torque command value Tri (t) , which is a control input, and outputs it to the inverter 2. Thereby, the vehicle control device 1 controls the vehicle based on the torque command value Tri (t) according to the state of the vehicle. Further, the vehicle control device 1 outputs information on the acquired state information and the torque command value Tri (t) calculated based on the state information to the server device 10.
  • the server device 10 acquires and stores the information.
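The state information and control input exchanged between the vehicle control device 1 and the server device 10 can be pictured as a simple record. The following is a minimal sketch only: the class names, field layout, and example values are hypothetical, and the text does not specify units or a wire format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VehicleState:
    """State information x(t) with the five quantities named in the text."""
    px: float   # self-position px(t)
    v: float    # speed v(t)
    pa: float   # air spring pressure pa(t)
    V: float    # motor voltage V(t)
    Tro: float  # motor torque output Tro(t)

    def as_vector(self) -> List[float]:
        # Ordering of the components is an assumption made for this sketch.
        return [self.px, self.v, self.pa, self.V, self.Tro]


@dataclass(frozen=True)
class ControlRecord:
    """One sample sent to the server: state x(t) plus control input u(t) = Tri(t)."""
    state: VehicleState
    Tri: float  # torque command value Tri(t)


# Illustrative values only; they are not taken from the source.
record = ControlRecord(
    state=VehicleState(px=120.0, v=8.5, pa=310.0, V=1500.0, Tro=-480.0),
    Tri=-500.0,
)
```

Keeping the state and the control input in one immutable record mirrors the way the text describes them as being stored together.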
  • FIG. 3 is a functional block diagram of the server device.
  • The server device 10 performs the functions of the learning unit 12, the policy evaluation unit 13, and the policy improvement unit 14 by executing a program stored in advance.
  • The server device 10 also includes a storage unit 11 such as a database.
  • The storage unit 11 stores the relationship between state information indicating the state of the vehicle 50, including the self-position px(t) of the vehicle 50, the speed v(t) of the vehicle 50, the pressure pa(t) of the air spring that suppresses shaking between the bogie of the vehicle 50 and the passenger car, the motor voltage V(t), and the torque output Tro(t) of the motor 3, and the torque command value Tri(t) in the state of the vehicle indicated by that state information.
  • This stored information is information transmitted from the vehicle control device 1 of the vehicle 50 and recorded.
  • Based on initial data indicating information on the state of the vehicle 50 at time t and a control input for the vehicle 50 indicating a motor torque command for the vehicle 50 to stop at the target position in that state, the learning unit 12 generates a control model that estimates information on the state of the vehicle 50 at time t+1 using a probability distribution.
  • The policy evaluation unit 13 evaluates the policy using the evaluation function J^π(θ).
  • The policy improvement unit 14 searches for a parameter θ that makes the evaluation function J^π(θ) small.
  • The policy is updated by the policy improvement unit 14 updating the value of the parameter θ.
  • FIG. 4 is a diagram showing an outline of processing of the server device.
  • The server device 10 is equipped with functions such as PILCO (Probabilistic Inference for Learning Control), which is one type of model-based reinforcement learning, and performs the following processing.
  • Model learning: based on initial data indicating information on the state of the vehicle 50 at time t and a control input for the vehicle 50 indicating a motor torque command for the vehicle 50 to stop at the target position in that state, the server device 10 generates a control model that estimates information on the state of the vehicle 50 at the next time t+1 using a probability distribution.
  • The server device 10 performs optimization calculations using an evaluation function of the evaluation value regarding the state of the vehicle 50, in which the evaluation value worsens as the distance to the target position increases.
  • the server device 10 sets the policy parameter that improves the evaluation value most in the policy function, inputs the state information into the policy function, and generates a motor torque command for the state of the vehicle 50 at the next time.
  • the server device 10 instructs the vehicle control device 1 to perform a trial operation of the vehicle 50 based on the generated motor torque command.
  • The server device 10 updates the control model using the relationship between the information on the state of the vehicle 50 and the control input resulting from the trial in the vehicle control device 1, together with the relationship between the state information of the vehicle 50 and the control input used to generate the control model.
  • FIG. 5 is a diagram showing the processing flow of the server device.
  • the learning unit 12 generates a control model using a machine learning method (step S101).
  • The server device 10 stores in advance, in the storage unit 11 or the like, a large number of records that link the value of each state indicated by the state information during control of the vehicle 50, the torque command value Tri(t) output as the control input on the vehicle 50 side in that state, and the information on the state of the vehicle 50 at the next time when the vehicle 50 is driven based on that relationship.
  • This torque command value Tri(t) is information obtained when the driver of the vehicle 50 or the like performs control so that regeneration failure does not occur when stopping at the target position.
  • The storage unit 11 stores, in association with each other, the relationship among the state information x(t), the torque command value Tri(t) which is the control input, and the state information x(t+1) obtained when the vehicle 50 is controlled using the torque command value Tri(t), together with a flag (initial data) indicating whether or not regeneration failure occurred in that relationship.
  • The learning unit 12 learns the relationship among the state information x(t), the state information x(t+1), and the torque command value Tri(t) which is the control input, together with the flag indicating whether or not regeneration failure occurred in that relationship, using a method such as Gaussian process regression, and generates the control model.
  • The control model is a learning model that, based on the state information of the vehicle 50 at time t and the torque command value Tri(t) of the vehicle 50 suitable for stopping at the target position without regeneration failure in each state indicated by the state information, estimates information on the state of the vehicle 50 at the next time t+1 using a probability distribution.
  • the control model is shown in equation (1).
  • the control model is an example of a dynamics model.
  • In equation (1), x(t) is the state information at time t and u(t) is the torque command value Tri(t), which is the control input at time t; they correspond to equation (2) and equation (3), respectively.
  • ε indicates noise.
  • x(t+1) is the state information at time t+1.
  • N(0, Σ_ε) represents a Gaussian distribution with a mean of 0 and a covariance matrix Σ_ε.
  • The noise ε is determined stochastically according to this Gaussian distribution.
  • the control model allows the distribution of state information at the next time t+1 to be estimated based on the state information x (t) at the current time t and the control input u (t) .
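The equation images for (1) to (3) are not reproduced in this text. Based on the symbol definitions given for them, a plausible reconstruction is sketched below. The function name f, the superscript time notation, and the ordering of the state components in (2) are assumptions, not the published formulas.

```latex
\begin{align}
x^{(t+1)} &= f\bigl(x^{(t)}, u^{(t)}\bigr) + \varepsilon,
  \qquad \varepsilon \sim \mathcal{N}(0, \Sigma_{\varepsilon}) \tag{1} \\
x^{(t)} &= \bigl[\, p_x^{(t)},\; v^{(t)},\; p_a^{(t)},\; V^{(t)},\; T_{ro}^{(t)} \,\bigr]^{\top} \tag{2} \\
u^{(t)} &= T_{ri}^{(t)} \tag{3}
\end{align}
```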
  • The learning unit 12 may learn the control model using the relationship between the value of each state indicated by state information acquired in the past under conditions in which regeneration failure is unlikely to occur and the torque command value Tri(t), which is the control amount output as the control input on the vehicle side in that state.
  • The learning unit 12 may also learn the control model using the relationship between the value of each state indicated by state information acquired in the past under conditions in which regeneration failure is likely to occur and the torque command value Tri(t).
  • The condition under which regeneration failure is unlikely to occur is a case where the voltage of the power system is lower than a predetermined threshold value at which regeneration failure becomes likely.
  • The condition under which regeneration failure is likely to occur is a case where the voltage of the power system is higher than that predetermined threshold value.
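The model learning of step S101 relies on a method such as Gaussian process regression. The following is a minimal pure-Python sketch of a one-step Gaussian process model that returns a predictive mean and variance. It is heavily simplified: it predicts a single output (the next speed) from the pair (speed, command), uses a fixed squared-exponential kernel, and is trained on a hypothetical toy relationship. All names, hyperparameters, and data are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def rbf(a: Sequence[float], b: Sequence[float], length: float) -> float:
    """Squared-exponential kernel with unit signal variance."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward and back substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GPDynamics:
    """One-output GP: maps an input z = (state, control) to a Gaussian over the next state."""

    def __init__(self, inputs, targets, noise: float = 1e-2, length: float = 1.0):
        self.X = [list(z) for z in inputs]
        self.length = length
        n = len(self.X)
        K = [[rbf(self.X[i], self.X[j], length) + (noise ** 2 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.L = cholesky(K)
        self.alpha = chol_solve(self.L, list(targets))

    def predict(self, z: Sequence[float]) -> Tuple[float, float]:
        k = [rbf(z, xi, self.length) for xi in self.X]
        mean = sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = chol_solve(self.L, k)
        var = 1.0 - sum(ki * vi for ki, vi in zip(k, v))
        return mean, max(var, 0.0)  # clamp tiny negative values from rounding


# Hypothetical toy data: next speed = speed + 0.5 * command.
train_z = [(float(v), float(u)) for v in range(4) for u in (-1, 0)]
train_y = [v + 0.5 * u for v, u in train_z]
model = GPDynamics(train_z, train_y)
mean_near, var_near = model.predict((2.0, -1.0))   # at a training input
mean_far, var_far = model.predict((10.0, 0.0))     # far from all training data
```

The predictive variance is small near recorded data and approaches the prior variance far from it, which is the property that lets the control model express its estimate of the next state as a probability distribution.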
  • The policy evaluation unit 13 determines a policy parameter θ that reduces the value of the evaluation function J^π(θ) (step S102). In this process, the policy evaluation unit 13 arbitrarily sets the initial value of the parameter θ.
  • The evaluation function J^π(θ) is shown in equation (4).
  • In equation (4), c(x(t)) is expressed by equation (5) and indicates the evaluation value of the state information x(t) at time t.
  • H indicates an arbitrarily set timing after that time.
  • E indicates the expected value of the evaluation value c(x(t)).
  • σ_c^2 indicates the variance of the evaluation value c.
  • The policy evaluation unit 13 samples the initial state information x(0) according to a normal distribution N(μ(0), Σ(0)).
  • The policy evaluation unit 13 acquires the state information x(t) at each time t, calculates the expected value E of the evaluation value c(x(t)), and calculates the evaluation function J^π(θ) shown in equation (4) by integrating the expected values E of the evaluation values c(x(t)).
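The policy evaluation described above can be sketched as follows. Equations (4) and (5) are not reproduced in this text, so the sketch assumes a PILCO-style saturating cost c(x) = 1 - exp(-d^2 / (2 σ_c^2)), where d is the distance to the target position, and sums its expected value over times 0 to H. The expectation is approximated by Monte Carlo sampling, which is a substitute chosen for brevity rather than the method of the source, and `toy_step` is a hypothetical stand-in for the learned control model.

```python
import math
import random
from typing import Callable, Tuple


def cost(px: float, target: float, sigma_c: float) -> float:
    """Assumed saturating cost: 0 at the target, approaching 1 far from it."""
    d = px - target
    return 1.0 - math.exp(-0.5 * d * d / sigma_c ** 2)


def evaluate_policy(policy: Callable[[float, float], float],
                    step: Callable[[float, float, float], Tuple[float, float]],
                    mu0: Tuple[float, float], std0: Tuple[float, float],
                    target: float, horizon: int,
                    sigma_c: float = 5.0, n_samples: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of J = sum over t = 0..H of E[c(x(t))]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Sample the initial state x(0) from a normal distribution.
        px = rng.gauss(mu0[0], std0[0])
        v = rng.gauss(mu0[1], std0[1])
        for _t in range(horizon + 1):
            total += cost(px, target, sigma_c)
            px, v = step(px, v, policy(px, v))
    return total / n_samples


def toy_step(px: float, v: float, u: float) -> Tuple[float, float]:
    """Hypothetical dynamics with a time step of 1; u acts as a deceleration command."""
    return px + v, max(0.0, v + u)


# Compare a policy that never brakes with one that brakes at a constant rate.
j_coast = evaluate_policy(lambda px, v: 0.0, toy_step, (0.0, 10.0), (1.0, 0.5),
                          target=50.0, horizon=20)
j_brake = evaluate_policy(lambda px, v: -1.0, toy_step, (0.0, 10.0), (1.0, 0.5),
                          target=50.0, horizon=20)
```

A policy that brings the vehicle to rest near the target accumulates less cost than one that overshoots, so a smaller J corresponds to a better policy, consistent with the search for a parameter that makes the evaluation function small.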
  • the policy improvement unit 14 has the function of an RBF (Radial Basis Function) controller, for example, and performs the following processing.
  • the RBF controller is a nonlinear controller having a network structure of a neural network with a Gaussian function in the intermediate layer.
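An RBF controller of the kind described above computes the control input as a weighted sum of Gaussian basis functions of the state. The sketch below is illustrative only: grouping the policy parameter θ into centers, weights, and per-dimension widths is an assumption, and the numbers are made up.

```python
import math
from typing import Sequence


def rbf_policy(x: Sequence[float],
               centers: Sequence[Sequence[float]],
               weights: Sequence[float],
               widths: Sequence[float]) -> float:
    """pi(x, theta) = sum_i w_i * exp(-0.5 * sum_d ((x_d - c_id) / l_d)^2)."""
    out = 0.0
    for c, w in zip(centers, weights):
        d2 = sum(((xi - ci) / li) ** 2 for xi, ci, li in zip(x, c, widths))
        out += w * math.exp(-0.5 * d2)
    return out


# Hypothetical parameters: two basis functions over a two-dimensional state.
centers = [[0.0, 0.0], [10.0, 10.0]]
weights = [-100.0, -300.0]
widths = [1.0, 1.0]

u_at_first_center = rbf_policy([0.0, 0.0], centers, weights, widths)
u_far_away = rbf_policy([100.0, 100.0], centers, weights, widths)
```

Because each basis function responds only near its center, the output at a center is dominated by that center's weight and decays to zero far from all centers.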
  • The policy improvement unit 14 searches for and updates the policy parameter θ that minimizes the evaluation function J^π(θ) calculated by the policy evaluation unit 13 (step S103).
  • The policy improvement unit 14 calculates a policy gradient from the evaluation function J^π(θ), and performs optimization calculation using the policy parameter θ that constitutes the policy as a solution search target based on the policy gradient.
  • The policy gradient of the evaluation function J^π(θ) can be calculated using equation (7).
  • The policy improvement unit 14 searches for the policy parameter θ using a gradient method, such as backpropagation, in the direction in which the value of the policy gradient becomes the smallest.
  • equation (10) is satisfied in formula (8) and formula (9). Also, in equation (9), I represents a unit matrix.
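The gradient-based search of step S103 can be illustrated with a short sketch. The source computes the policy gradient analytically via equations (7) to (11), which are not reproduced in this text, so the sketch substitutes a central finite-difference gradient and plain gradient descent. The function names, step size, and the quadratic stand-in for the evaluation function are all assumptions.

```python
from typing import Callable, List, Sequence


def numerical_gradient(J: Callable[[Sequence[float]], float],
                       theta: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Central finite-difference approximation of dJ/dtheta."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += eps
        down = list(theta)
        down[i] -= eps
        grad.append((J(up) - J(down)) / (2.0 * eps))
    return grad


def minimize(J: Callable[[Sequence[float]], float],
             theta0: Sequence[float], lr: float = 0.1, steps: int = 200) -> List[float]:
    """Move theta against the gradient so that J decreases."""
    theta = list(theta0)
    for _ in range(steps):
        g = numerical_gradient(J, theta)
        theta = [th - lr * gi for th, gi in zip(theta, g)]
    return theta


def toy_J(th: Sequence[float]) -> float:
    """Hypothetical evaluation function with its minimum at theta = (3, -1)."""
    return (th[0] - 3.0) ** 2 + (th[1] + 1.0) ** 2


theta_opt = minimize(toy_J, [0.0, 0.0])
```

In practice the evaluation function would be the one computed by the policy evaluation unit 13, and an analytic gradient would usually be preferred over finite differences for cost and accuracy.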
  • The policy improvement unit 14 inputs the state information x(t) into the policy function π(x(t), θ) using the optimized policy parameter θ, and calculates the control input u(t+1) as shown in equation (12) (step S104).
  • The policy improvement unit 14 outputs the torque command value Tri(t+1) indicated by the calculated control input u(t+1) to the vehicle control device 1 (step S105).
  • The vehicle control device 1 outputs the torque command value Tri(t+1) to the inverter 2, thereby performing a trial of controlling the motor 3.
  • the vehicle 50 operates, and state information x (t+1) at the next time can be observed.
  • The vehicle control device 1 transmits to the server device 10 the relationship among the state information x(t) and x(t+1) observed at that time and the control input u(t), together with information on whether regeneration failure occurred, and the server device 10 links this information and records it in the storage unit 11.
  • The presence or absence of regeneration failure may be determined by the vehicle control device 1, which calculates the difference between the torque command value Tri(t) and the torque output Tro(t) and determines that regeneration failure has occurred if this difference is greater than or equal to a predetermined threshold value, and that regeneration failure has not occurred if it is less than the threshold value. This recorded information is used to update the control model.
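The regeneration failure check described above amounts to a threshold comparison. In this minimal sketch the function name and the example threshold are hypothetical, and taking the absolute value of the difference is an assumption, since the text only speaks of "the difference".

```python
def regeneration_failed(tri: float, tro: float, threshold: float) -> bool:
    """Return True when the commanded and actual torque differ by at least the threshold."""
    # Assumption: the magnitude of the difference is what is compared.
    return abs(tri - tro) >= threshold


# Illustrative values only.
flag_ok = regeneration_failed(tri=-500.0, tro=-480.0, threshold=100.0)
flag_failed = regeneration_failed(tri=-500.0, tro=-50.0, threshold=100.0)
```

The resulting flag is what would be stored alongside x(t), u(t), and x(t+1) for later model updates.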
  • The learning unit 12 performs iterative learning using a method such as Gaussian process regression, with the state information x(t) and control input u(t) used for the initial control model and the state information x(t) and control input u(t) newly recorded after optimization by the policy improvement unit 14, and updates the control model (step S106). Note that the state information x(t) and control input u(t) newly recorded after optimization by the policy improvement unit 14 are also values corresponding to conditions where regeneration failure is unlikely to occur or conditions where regeneration failure is likely to occur. The learning unit 12 may repeatedly learn the state information x(t) and control input u(t) under each environment and update the control model by using a method such as Gaussian process regression.
  • The server device 10 uses the updated control model to calculate a torque command value Tri(t), which is the control input u(t) according to the state information x(t), and outputs it to the vehicle control device 1.
  • the server device 10 determines whether to terminate the process (step S107), and repeats the processes from step S102 to step S106 until an instruction to terminate is received.
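The overall loop of steps S101 to S107 can be sketched end to end on a toy problem. This is a deliberately simplified illustration, not the method of the source: the control model is a single least-squares brake gain instead of a Gaussian process, the policy is a constant braking command θ chosen by grid search on the model's predicted stopping error instead of a gradient search on J^π(θ), and the "vehicle" is a one-line simulator. All names and numbers are hypothetical.

```python
from typing import List, Tuple

Transition = Tuple[float, float, float]  # (v, u, v_next)


def rollout(gain: float, theta: float, v0: float = 10.0,
            max_steps: int = 1000) -> Tuple[float, List[Transition]]:
    """Apply the constant command u = -theta until standstill; return stop position and data."""
    px, v = 0.0, v0
    data: List[Transition] = []
    for _ in range(max_steps):
        if v <= 0.0:
            break
        u = -theta
        v_next = max(0.0, v + gain * u)
        data.append((v, u, v_next))
        px += v
        v = v_next
    return px, data


def fit_gain(data: List[Transition]) -> float:
    """Model learning: least-squares brake gain from transitions not clipped at zero speed."""
    used = [(v, u, vn) for v, u, vn in data if vn > 0.0]
    return sum(u * (vn - v) for v, u, vn in used) / sum(u * u for _, u, _ in used)


def improve_policy(gain_hat: float, target: float) -> float:
    """Policy improvement: pick the command whose predicted stop is closest to the target."""
    candidates = [0.5 + 0.05 * i for i in range(51)]
    return min(candidates, key=lambda th: abs(rollout(gain_hat, th)[0] - target))


def learning_loop(true_gain: float = 0.8, target: float = 50.0,
                  iterations: int = 4) -> List[float]:
    data: List[Transition] = [(10.0, -1.0, 9.0)]  # inaccurate initial data
    errors: List[float] = []
    for _ in range(iterations):                      # S107: stop after a fixed count
        gain_hat = fit_gain(data)                    # S101 / S106: learn or update the model
        theta = improve_policy(gain_hat, target)     # S102-S103: evaluate and improve the policy
        stop, new_data = rollout(true_gain, theta)   # S104-S105: trial on the "real" vehicle
        errors.append(abs(stop - target))
        data.extend(new_data)                        # record trial results for the next update
    return errors


errors = learning_loop()
```

Each trial adds accurate transitions to the data set, so the model estimate improves and the stopping error after the final iteration is smaller than after the first.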
  • Through the above processing, an appropriate control input (run curve) at each position of the vehicle 50, which allows the vehicle 50 to stop at the target stop position without causing regeneration failure, can be learned through a small number of trial runs of the vehicle 50.
  • the relationship between control input and information on the state of the vehicle 50 when it is stopped with random acceleration added to a constant deceleration may be used.
  • FIG. 6 is a hardware configuration diagram of the server device.
  • The server device 10 may include hardware such as a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, a storage unit 104 such as an HDD or SSD, and a communication module 105.
  • each process described above is stored in a computer-readable recording medium in the form of a program, and the above-mentioned processes are performed by reading and executing this program by the computer.
  • the computer-readable recording medium refers to a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, a semiconductor memory, and the like.
  • this computer program may be distributed to a computer via a communication line, and the computer receiving the distribution may execute the program.
  • the above program may be for realizing some of the functions described above. Furthermore, it may be a so-called difference file (difference program) that can realize the above-mentioned functions in combination with a program already recorded in the computer system.
  • The information processing device (server device 10) generates a control model that estimates, using a probability distribution, information on the state of the vehicle 50 at the next time, based on information indicating the state of the vehicle 50 at a certain time and the control input indicating a motor torque command for the vehicle 50 to stop at the target position in that state.
  • The control model is generated using the relationship between the state information indicated by braking control of the vehicle 50 performed in the past and the control input.
  • The control model is updated using the relationship between the state information of the vehicle 50 resulting from a trial and the control input, together with the relationship between the state information used to generate the control model and the control input.
  • The evaluation function is a function indicating the integral value of the expected values of the evaluation values regarding the state of the vehicle 50 at a plurality of times, from a reference time up to a set timing after the reference time.
  • The information on the state of the vehicle 50 includes the position, speed, air spring pressure, motor voltage, and motor torque output of the vehicle 50.
  • In the information processing device (server device 10) according to the second aspect, an arbitrary initial state of the vehicle 50 is sampled according to the policy function and a normal distribution, the evaluation function is calculated by integrating the expected values according to the state, and the policy parameter for which the evaluation function is the smallest is searched for.
  • The control model is generated using the state information acquired under conditions where regeneration failure is unlikely to occur and the state information acquired under conditions where regeneration failure is likely to occur.
  • The control model is generated using the relationship between the control input and the state information obtained when the vehicle 50 is driven with random acceleration added to a constant acceleration.
  • The information processing method includes generating a control model that estimates, using a probability distribution, information on the state of the vehicle 50 at the next time, based on information on the state of the vehicle 50 at a certain time and the control input for the vehicle 50 indicating the motor torque command for the vehicle 50 to stop at the target position in that state, the control model being generated using the relationship between the state information indicated by past braking control of the vehicle 50 and the control input and the state of the vehicle 50 at the next time obtained from that relationship.
  • The method also includes determining an evaluation function of an evaluation value regarding the state of the vehicle, in which the evaluation value worsens as the distance to the target position increases, and selecting the policy parameter that most improves the evaluation value.
  • The control model is updated using the relationship between the state information of the vehicle 50 resulting from a trial and the control input, together with the relationship between the state information used to generate the control model and the control input.
  • The evaluation function is a function indicating the integral value of the expected values of the evaluation values regarding the state of the vehicle 50 at a plurality of times, from a reference time up to a set timing after the reference time.
  • The information on the state of the vehicle 50 includes at least the position, speed, air spring pressure, motor voltage, and motor torque output of the vehicle 50.
  • An arbitrary initial state of the vehicle 50 is sampled according to the policy function and a normal distribution, the evaluation function is calculated by integrating the expected values according to the state, and the policy parameter for which the evaluation function becomes the smallest is searched for.
  • In the eleventh aspect, in the information processing method according to any of the seventh to tenth aspects, the control model is generated using the state information acquired under conditions where regeneration failure is unlikely to occur and the state information acquired under conditions where regeneration failure is likely to occur.
  • The program causes the computer of the information processing device to generate a control model that estimates, using a probability distribution, information on the state of the vehicle 50 at the next time, based on information on the state of the vehicle 50 at a certain time and the control input for the vehicle 50 indicating the motor torque command for the vehicle 50 to stop at the target position in that state, the control model being generated using the relationship between the state information indicated by past braking control of the vehicle 50 and the control input and the state of the vehicle 50 at the next time obtained from that relationship.
  • The program further causes the computer to determine an evaluation function of an evaluation value regarding the state of the vehicle, in which the evaluation value worsens as the distance to the target position increases, and to select the policy parameter that most improves the value of the evaluation function.

Landscapes

  • Engineering & Computer Science (AREA)
  • Mechanical Engineering (AREA)
  • Transportation (AREA)
  • Computing Systems (AREA)
  • Power Engineering (AREA)
  • Accounting & Taxation (AREA)
  • Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Electric Propulsion And Braking For Vehicles (AREA)
  • Regulating Braking Force (AREA)
  • Train Traffic Observation, Control, And Security (AREA)

Abstract

According to the present invention, on the basis of information indicating the state of a vehicle at a certain time point and a control input for the vehicle to stop at a target position in said state, a control model for inferring information on the state of the vehicle at the next time point by using a probability distribution is generated. A motor torque instruction in the state of the vehicle at the next time point is generated by determining an evaluation function for an evaluation value relating to the state of the vehicle, and inputting, to a policy function, a policy parameter with which the evaluation value of the evaluation function is improved the most. The control model is updated by using the relationship between the control input and vehicle state information which relates to results of trials on the vehicle using the motor torque instruction, and the relationship between the control input and information, on the state of the vehicle, used for generating the control model.

Description

Information processing device, information processing method, and program
The present disclosure relates to an information processing device, an information processing method, and a program.
This application claims priority to Japanese Patent Application No. 2022-142287 filed in Japan on September 7, 2022, the contents of which are incorporated herein.
There are control devices for vehicles that travel on a track by driving wheels through the rotation of a motor. Such a control device controls the motor by outputting a motor torque command value according to the current state of the vehicle. A vehicle equipped with the control device is braked using the regenerative brake of the motor. In addition to the regenerative brake, the control device may also brake the vehicle by using, in parallel, an air brake (mechanical brake) provided on the vehicle.
Patent Document 1 and Patent Document 2 disclose techniques that use the motor control function described above and use regenerative braking to brake a vehicle. More specifically, Patent Document 1 discloses a technique for reducing the occurrence of regeneration failure and improving stopping accuracy. Patent Document 2 discloses a technique for predicting regeneration failure and calculating a command value of regenerative braking force that yields a braking force corresponding to the difference between a target braking force and a mechanical braking force.
Japanese Unexamined Patent Application Publication No. 2001-204102
Japanese Unexamined Patent Application Publication No. 2017-99172
In techniques that brake a vehicle using regenerative braking as described above, there is a need for a technique that can automatically calculate the motor command value for braking the vehicle with the regenerative brake of the motor.
An object of this disclosure is therefore to provide an information processing device, an information processing method, and a program that solve the above problem.
According to one aspect of the present disclosure, an information processing device generates a control model that estimates, as a probability distribution, information on the state of a vehicle at the next time, based on information indicating the state of the vehicle at a certain time and a control input of the vehicle indicating a motor torque command for stopping at a target position in that state, the control model being generated using the relationship between the state information and the control input in past braking control of the vehicle and the state of the vehicle at the next time obtained from that relationship; determines an evaluation function of an evaluation value regarding the state of the vehicle, in which the evaluation value worsens as the distance to the target position increases, and generates a motor torque command for the state of the vehicle at the next time by inputting into a policy function the policy parameter that most improves the evaluation value; and updates the control model using the relationship between the vehicle state information and the control input resulting from a trial with the generated motor torque command, together with the relationship between the vehicle state information and the control input used to generate the control model.
According to one aspect of the present disclosure, an information processing method includes: generating a control model that estimates, as a probability distribution, information on the state of a vehicle at the next time, based on information indicating the state of the vehicle at a certain time and a control input of the vehicle indicating a motor torque command for stopping at a target position in that state, the control model being generated using the relationship between the state information and the control input in past braking control of the vehicle and the state of the vehicle at the next time obtained from that relationship; determining an evaluation function of an evaluation value regarding the state of the vehicle, in which the evaluation value worsens as the distance to the target position increases, and generating a motor torque command for the state of the vehicle at the next time by inputting into a policy function the policy parameter that most improves the evaluation value; and updating the control model using the relationship between the vehicle state information and the control input resulting from a trial with the generated motor torque command, together with the relationship between the vehicle state information and the control input used to generate the control model.
According to one aspect of the present disclosure, a program causes a computer of an information processing device to function as: means for generating a control model that estimates, as a probability distribution, information on the state of a vehicle at the next time, based on information indicating the state of the vehicle at a certain time and a control input of the vehicle indicating a motor torque command for stopping at a target position in that state, the control model being generated using the relationship between the state information and the control input in past braking control of the vehicle and the state of the vehicle at the next time obtained from that relationship; means for determining an evaluation function of an evaluation value regarding the state of the vehicle, in which the evaluation value worsens as the distance to the target position increases, and generating a motor torque command for the state of the vehicle at the next time by inputting into a policy function the policy parameter that most improves the evaluation value; and means for updating the control model using the relationship between the vehicle state information and the control input resulting from a trial with the generated motor torque command, together with the relationship between the vehicle state information and the control input used to generate the control model.
According to the present disclosure, a motor command value for braking a vehicle using the regenerative brake of the motor can be calculated automatically.
FIG. 1 is a schematic diagram of a vehicle control system including a vehicle control device and a server device according to the present embodiment.
FIG. 2 is a block diagram showing a control mechanism including the vehicle control device according to the present embodiment.
FIG. 3 is a functional block diagram of the server device according to the present embodiment.
FIG. 4 is a diagram showing an overview of the processing of the server device according to the present embodiment.
FIG. 5 is a diagram showing the processing flow of the server device according to the present embodiment.
FIG. 6 is a hardware configuration diagram of the server device according to the present embodiment.
Hereinafter, a vehicle control device and a server device according to an embodiment of the present disclosure will be described with reference to the drawings.
FIG. 1 is a schematic diagram of a vehicle control system including the vehicle control device and the server device according to this embodiment.
FIG. 2 is a block diagram showing the configuration of a control mechanism including the vehicle control device according to this embodiment.
As shown in FIG. 2, the vehicle 50 includes, as part of its equipment, a vehicle control device 1, an inverter 2, and a motor 3 as an example of the control mechanism. The vehicle control device 1 controls the motor by outputting a torque command value Tri(t) according to the state of the vehicle. The inverter 2 outputs a current corresponding to the torque command value Tri(t) to the motor 3. The motor 3 is driven by the current based on the torque command value Tri(t). The vehicle control device 1 is communicatively connected to a server device 10, which is one form of the information processing device. The vehicle control device 1 acquires state information indicating the state of the vehicle 50, including the self-position px(t) of the vehicle 50, the speed v(t) of the vehicle 50, the pressure pa(t) of the air spring that suppresses sway between the bogie and the passenger car of the vehicle 50, the motor voltage V(t), and the torque output Tro(t) of the motor 3. During vehicle operation, the vehicle control device 1 uses the state information and the control model to calculate the torque command value Tri(t), which is the control input, and outputs it to the inverter 2. The vehicle control device 1 thereby controls the vehicle based on a torque command value Tri(t) that matches the state of the vehicle. The vehicle control device 1 also outputs the acquired state information and the torque command value Tri(t) calculated from that state information to the server device 10. The server device 10 acquires and stores this information.
FIG. 3 is a functional block diagram of the server device.
By starting a program stored in advance, the server device 10 provides the functions of a learning unit 12, a policy evaluation unit 13, and a policy improvement unit 14. The server device 10 also includes a storage unit 11 such as a database.
The storage unit 11 stores the relationship between state information indicating the state of the vehicle 50, including the self-position px(t) of the vehicle 50, the speed v(t) of the vehicle 50, the pressure pa(t) of the air spring that suppresses sway between the bogie and the passenger car of the vehicle 50, the motor voltage V(t), and the torque output Tro(t) of the motor 3, and the torque command value Tri(t) in the vehicle state indicated by that state information. The stored information is information transmitted from the vehicle control device 1 of the vehicle 50 and recorded.
The learning unit 12 generates a control model that estimates, as a probability distribution, information on the state of the vehicle 50 at the next time t+1, based on initial data indicating the state of the vehicle 50 at time t and the control input of the vehicle 50 indicating the motor torque command for stopping at the target position in that state.
The policy evaluation unit 13 evaluates the policy using the evaluation function Jπ(θ).
The policy improvement unit 14 searches for a parameter θ that makes the evaluation function Jπ(θ) smaller. The policy is updated when the policy improvement unit 14 updates the value of the parameter θ.
FIG. 4 is a diagram showing an overview of the processing of the server device.
The server device 10 has functions such as PILCO (Probabilistic Inference for Learning Control), a model-based reinforcement learning method, and performs the following processing.
(1) Model learning
The server device 10 generates a control model that estimates, as a probability distribution, information on the state of the vehicle 50 at the next time t+1, based on initial data indicating the state of the vehicle 50 at time t and the control input of the vehicle 50 indicating the motor torque command for stopping at the target position in that state.
(2) Evaluation and improvement of the policy, and trial
The server device 10 performs an optimization calculation using an evaluation function of an evaluation value regarding the state of the vehicle 50, in which the evaluation value worsens the farther the stop position resulting from braking control is from the target position. The server device 10 sets, in the policy function, the policy parameter that most improves the evaluation value, and inputs the state information into the policy function to generate a motor torque command for the state of the vehicle 50 at the next time. The server device 10 instructs the vehicle control device 1 to carry out a trial of the vehicle 50 with the generated motor torque command.
(3) Updating the model using trial results
The server device 10 updates the control model using the relationship between the state information of the vehicle 50 and the control input resulting from the trial by the vehicle control device 1, together with the relationship between the state information of the vehicle 50 and the control input used to generate the control model.
FIG. 5 is a diagram showing the processing flow of the server device.
(Generation of control model)
First, the learning unit 12 generates a control model using a machine learning technique (step S101). The server device 10 stores in advance, in the storage unit 11 or the like, a large number of records that associate the relationship between the value of each state indicated by the state information during control of the vehicle 50 and the torque command value Tri(t), which is the control amount output as the control input on the vehicle 50 side in that state, with information on the state of the vehicle 50 at the next time when the vehicle 50 is driven based on that relationship. These torque command values Tri(t) are those obtained when the vehicle 50 was controlled, by its driver or the like, so that regeneration failure did not occur when stopping at the target position. The storage unit 11 stores, in association with one another, the relationship among such state information x(t), the torque command value Tri(t) as the control input, and the state information x(t+1) obtained when the vehicle 50 is controlled using that torque command value Tri(t), together with flag information indicating whether regeneration failure occurred in that relationship (initial data). The learning unit 12 learns the relationship between the state information x(t) and x(t+1) and the torque command value Tri(t) as the control input, the flag information indicating whether regeneration failure occurred in that relationship, and the information on the state of the vehicle 50 at the next time, using a technique such as Gaussian process regression, and generates the control model. The control model is a learning model that estimates, as a probability distribution, information on the state of the vehicle 50 at the next time t+1, based on the state information of the vehicle 50 at time t and the torque command value Tri(t) of the vehicle 50 suitable for stopping at the target position without regeneration failure in each state indicated by that state information. Equation (1) shows the control model. The control model is an example of a dynamics model.
[Equation (1)]
In equation (1), x(t) is the state information at time t and u(t) is the torque command value Tri(t), the control input at time t; they are given by equations (2) and (3), respectively. ω denotes noise. In equation (1), x(t+1) is the state information at time t+1. N(0, Σω) in equation (1) denotes a Gaussian distribution with mean 0 and covariance matrix Σω. The noise ω is drawn stochastically from this Gaussian distribution. As equation (1) shows, the control model makes it possible to estimate the distribution of the state information at the next time t+1 from the state information x(t) and the control input u(t) at the current time t.
[Equation (2)]
[Equation (3)]
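Equations (1) to (3) can be written out from the definitions given in the surrounding text. The following is a sketch under assumptions rather than the published formulas: the function symbol f for the learned transition is a label introduced here, and the state vector is assumed to consist of exactly the five quantities listed for the state information, in that order.

```latex
\begin{align}
x^{(t+1)} &= f\left(x^{(t)}, u^{(t)}\right) + \omega, \qquad \omega \sim \mathcal{N}(0, \Sigma_{\omega}) \tag{1} \\
x^{(t)} &= \begin{bmatrix} p_x^{(t)} & v^{(t)} & p_a^{(t)} & V^{(t)} & T_{ro}^{(t)} \end{bmatrix}^{\top} \tag{2} \\
u^{(t)} &= T_{ri}^{(t)} \tag{3}
\end{align}
```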
The learning unit 12 may learn the control model using the relationship between the value of each state indicated by state information acquired in the past under conditions in which regeneration failure is unlikely to occur and the torque command value Tri(t), which is the control amount output as the control input on the vehicle side in that state. Alternatively, the learning unit 12 may learn the control model using the same relationship acquired in the past under conditions in which regeneration failure is likely to occur. Conditions in which regeneration failure is unlikely to occur are those in which the voltage of the power system is lower than a predetermined threshold above which regeneration failure tends to occur. Conditions in which regeneration failure is likely to occur are those in which the voltage of the power system is higher than that threshold. By learning the control model from state information obtained under both kinds of conditions, a control model can be generated that outputs a torque command value Tri(t) for accurate control whether or not regeneration failure is likely to occur.
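The probabilistic one-step prediction of the control model can be illustrated with a small Gaussian process regression. This is a minimal sketch, not the learning unit's implementation: it assumes NumPy, a squared-exponential kernel with fixed hyperparameters, and a single predicted state component; the names `rbf_kernel` and `gp_predict` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_predict(z_train, y_train, z_query, noise_var=1e-2):
    """Predictive mean and variance of one next-state component.

    z_train: (n, d) rows of concatenated [x(t), u(t)] (assumed input layout).
    y_train: (n,) observed values of the component at time t+1.
    z_query: (m, d) inputs at which to predict.
    """
    k = rbf_kernel(z_train, z_train) + noise_var * np.eye(len(z_train))
    k_s = rbf_kernel(z_query, z_train)
    k_ss = rbf_kernel(z_query, z_query)
    mean = k_s @ np.linalg.solve(k, y_train)
    cov = k_ss - k_s @ np.linalg.solve(k, k_s.T)
    # Add the noise variance so the result describes a noisy observation.
    return mean, np.diag(cov) + noise_var
```

The returned variance grows for inputs far from the recorded data, which is the property that lets a PILCO-style method account for model uncertainty when evaluating a policy.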
(Evaluation of the policy)
The policy evaluation unit 13 determines a policy parameter θ that reduces the value of the evaluation function Jπ(θ) (step S102). In this process, the policy evaluation unit 13 sets the initial value of the parameter θ arbitrarily. The evaluation function Jπ(θ) is shown in equation (4).
[Equation (4)]
In equation (4), c(x(t)) is given by equation (5) and denotes the evaluation value of the state information x(t) at time t. H denotes an arbitrarily set time after the reference time t. E denotes the expected value of the evaluation value c(x(t)). In equation (5), σc² denotes the variance for the evaluation value c.
[Equation (5)]
The evaluation value c(x(t)) approaches 1 the closer the position xt of the vehicle 50 on the track at time t is to xtarget, the target position on the track, and approaches 0 the farther away it is. In equation (4),
[Equation (6)]
is a Gaussian distribution with mean μ(t) and covariance matrix Σ(t).
The policy evaluation unit 13 samples the initial state information x(0) according to the normal distribution N(μ(0), Σ(0)). The policy evaluation unit 13 acquires the state information x(t) at time t and calculates the expected value E of the evaluation value c(x(t)); in the same way, it calculates the evaluation function Jπ(θ) of equation (4) by integrating the expected values E of the evaluation values c(x(t)) up to the preset time H.
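The policy evaluation step can be illustrated for a scalar position. The sketch below rests on assumptions that are not taken from equations (4) and (5): it uses the saturating cost common in PILCO, which is 0 at the target and approaches 1 far from it so that a smaller J is better, it treats time as discrete so that the integration becomes a sum, and it takes the predicted position at each step to be Gaussian. The function names are hypothetical, and the sign convention of the source's own equation (5) may differ.

```python
import math
from typing import Sequence


def expected_saturating_cost(mu: float, var: float, x_target: float, sigma_c: float) -> float:
    """E[1 - exp(-(x - x_target)^2 / (2 sigma_c^2))] for x ~ N(mu, var)."""
    total_var = sigma_c**2 + var
    scale = math.sqrt(sigma_c**2 / total_var)
    return 1.0 - scale * math.exp(-0.5 * (mu - x_target) ** 2 / total_var)


def evaluate_policy(
    means: Sequence[float],
    variances: Sequence[float],
    x_target: float,
    sigma_c: float,
) -> float:
    """Sum of expected costs over the predicted positions up to the horizon H."""
    return sum(
        expected_saturating_cost(m, v, x_target, sigma_c)
        for m, v in zip(means, variances)
    )
```

Because the expectation has a closed form for a Gaussian state, no sampling is needed at each time step once the predicted mean and variance are available.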
(Improvement of the policy)
As an example, the policy improvement unit 14 has the function of an RBF (Radial Basis Function) controller and performs the following processing. An RBF controller is a nonlinear controller with the structure of a neural network whose intermediate layer uses Gaussian functions. The policy improvement unit 14 searches for and updates the policy parameter θ that minimizes the evaluation function Jπ(θ) calculated by the policy evaluation unit 13 (step S103). In this process, the policy improvement unit 14 calculates the policy gradient from the evaluation function Jπ(θ) and, based on that gradient, performs an optimization calculation whose search target is the policy parameter θ that constitutes the policy. The policy gradient of the evaluation function Jπ(θ) can be calculated by equation (7). The policy improvement unit 14 searches for the policy parameter θ in the direction in which the value of this policy gradient is smallest, using a gradient method, for example a technique such as backpropagation.
[Equation (7)]
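Step S103 can be pictured as gradient descent on Jπ(θ). The following sketch is an illustration under assumptions, not the method of equation (7): it swaps the analytic policy gradient for central finite differences, treats the evaluation function as an arbitrary Python callable `j`, and uses a fixed learning rate and step count; all names are hypothetical.

```python
from typing import Callable, List, Sequence


def finite_difference_gradient(
    j: Callable[[Sequence[float]], float],
    theta: Sequence[float],
    eps: float = 1e-6,
) -> List[float]:
    """Central-difference approximation of dJ/dtheta."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((j(up) - j(down)) / (2.0 * eps))
    return grad


def improve_policy(
    j: Callable[[Sequence[float]], float],
    theta: Sequence[float],
    learning_rate: float = 0.1,
    n_steps: int = 200,
) -> List[float]:
    """Move theta against the gradient so that J(theta) decreases."""
    theta = list(theta)
    for _ in range(n_steps):
        grad = finite_difference_gradient(j, theta)
        theta = [p - learning_rate * g for p, g in zip(theta, grad)]
    return theta
```

In practice an analytic gradient is usually preferred because finite differences need two evaluations of J per parameter.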
The policy improvement unit 14 may determine, from the partial derivatives of the expected value Ex(t) of the evaluation value c(x(t)) at each time with respect to the mean and the covariance matrix of the state distribution p(x(t)) = N(μ(t), Σ(t)), a state x(t) for which the evaluation value c(x(t)) becomes small, and may calculate, using an optimization technique, the policy parameter θ that constitutes a policy function π(x(t), θ) yielding that state x(t).
Equation (8) shows the partial derivative with respect to the mean μ(t) of the state distribution p(x(t)) = N(μ(t), Σ(t)). Equation (9) shows the partial derivative with respect to the covariance matrix Σ(t) of the state distribution p(x(t)) = N(μ(t), Σ(t)).
[Equation (8)]
[Equation (9)]
Here, equation (10) holds in equations (8) and (9). In equation (9), I denotes the identity matrix.
[Equation (10)]
T^-1 is the matrix whose diagonal components are given by
[Equation (11)]
(Trial)
The policy improvement unit 14 inputs the state information x(t) into the policy function π(x(t), θ) that uses the optimized policy parameter θ, and calculates the control input u(t+1) as shown in equation (12) (step S104).
[Equation (12)]
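A policy function of the RBF type can be evaluated as a weighted sum of Gaussian basis functions of the state. The sketch below is hypothetical: it assumes NumPy and assumes that the policy parameter θ consists of basis centers, weights, and per-dimension length scales, none of which are specified in the text.

```python
import numpy as np


def rbf_policy(x, centers, weights, length_scales):
    """Return a scalar control input (torque command) for state x.

    x:             (d,) state vector.
    centers:       (k, d) centers of the Gaussian basis functions.
    weights:       (k,) output weight of each basis function.
    length_scales: (d,) width of the basis functions per state dimension.
    """
    diff = (x[None, :] - centers) / length_scales
    phi = np.exp(-0.5 * np.sum(diff**2, axis=1))
    return float(weights @ phi)
```

A state that coincides with one center and lies far from all others yields approximately that center's weight, which makes the parameters easy to interpret as local torque commands.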
The policy improvement unit 14 outputs the torque command value Tri(t+1) indicated by the calculated control input u(t+1) to the vehicle control device 1 (step S105). The vehicle control device 1 outputs that torque command value Tri(t+1) to the inverter 2 and thereby carries out a trial of controlling the motor 3. The vehicle 50 operates as a result, and the state information x(t+1) at the next time can be observed. The vehicle control device 1 transmits to the server device 10 the relationship among the state information x(t) and x(t+1) observed at that time, the control input u(t), and the information on whether regeneration failure occurred, and the server device 10 records this information in association in the storage unit 11. To determine whether regeneration failure occurred, the vehicle control device 1 may calculate the difference between the torque command value Tri(t) and the torque output Tro(t), and judge that regeneration failure occurred when this difference is equal to or greater than a predetermined threshold and that it did not occur when the difference is less than the threshold. The recorded information is used to update the control model.
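The regeneration failure judgment described above reduces to a threshold test. In this minimal sketch the function name is hypothetical, and taking the absolute value of the difference is an assumption; the text speaks only of "the difference" between the torque command value and the torque output.

```python
def regeneration_failure(tri: float, tro: float, threshold: float) -> bool:
    """Flag regeneration failure when the commanded and actual torque diverge.

    tri: torque command value Tri(t).
    tro: torque output Tro(t) of the motor.
    threshold: predetermined threshold on the difference.
    """
    # Assumption: the magnitude of the difference is compared with the threshold.
    return abs(tri - tro) >= threshold
```

The resulting flag would be stored together with x(t), u(t), and x(t+1) for each recorded transition.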
(Updating the control model)
The learning unit 12 updates the control model by repeated learning with a technique such as Gaussian process regression, using the state information x(t) and control input u(t) used for the initial control model together with the state information x(t) and control input u(t) newly recorded after the optimization by the policy improvement unit 14 (step S106). The state information x(t) and control input u(t) newly recorded after the optimization are likewise values corresponding to conditions in which regeneration failure is unlikely to occur or conditions in which it is likely to occur, and the learning unit 12 may update the control model by repeatedly learning the state information x(t) and control input u(t) under each of these conditions with a technique such as Gaussian process regression. The server device 10 uses the updated control model to calculate the torque command value Tri(t), which is the control input u(t) corresponding to the state information x(t), and outputs it to the vehicle control device 1. The server device 10 determines whether to end the processing (step S107) and repeats steps S102 to S106 until it is instructed to end.
 上述した、方策評価部13、方策改善部14の処理の繰り返しや、学習部12の制御モデルの更新の繰り返しが行われることにより、制御モデルの最適化を図ることができる。このような処理によれば、回生失効を発生させずに目標の停止位置で停止することのできる車両50の各位置における適切な制御入力(ランカーブ)を自動で算出することができる。また回生ブレーキを用いてそのような適切な制御入力を用いて車両50の制動制御ができるようになるため、機械ブレーキの使用が減少し、機械ブレーキの単位期間における消耗を低下させることで、機械ブレーキのメンテナンスに係るコストを削減することができる。またさらに上述の処理によれば、回生失効を発生させずに目標の停止位置で停止することのできる車両50の各位置における適切な制御入力(ランカーブ)を特定する手法を、少ない回数の車両50の試行運転で獲得することができる。 By repeating the processes of the policy evaluation unit 13 and policy improvement unit 14 and updating the control model of the learning unit 12 as described above, it is possible to optimize the control model. According to such processing, it is possible to automatically calculate appropriate control inputs (run curves) at each position of the vehicle 50 that can stop the vehicle 50 at the target stop position without causing regeneration failure. In addition, since regenerative braking can be used to control the braking of the vehicle 50 using such appropriate control inputs, the use of mechanical brakes is reduced, and the wear and tear of the mechanical brakes per unit period is reduced. The cost related to brake maintenance can be reduced. Further, according to the above-described process, a method for specifying an appropriate control input (run curve) at each position of the vehicle 50 that allows the vehicle 50 to stop at a target stop position without causing regeneration failure is applied to the vehicle 50 a small number of times. It can be earned through a trial run.
Note that the initial data used to generate the control model may include the relationship between the state information of the vehicle 50 and the control input when the vehicle is brought to a stop with a random acceleration added to a constant deceleration. Using such varied initial data makes it possible to train the control model with a small amount of initial data.
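A minimal sketch of how such initial data could be generated in simulation is shown below. The point-mass model, the numerical values, the use of the commanded acceleration as a stand-in for the control input, and the name `simulate_initial_stop` are assumptions for illustration only.

```python
import numpy as np

def simulate_initial_stop(v0=15.0, base_decel=0.8, noise_std=0.2, dt=0.1, seed=0):
    """Roll out one stop: constant deceleration plus a random acceleration.

    Returns the recorded (position, velocity) states and the commanded
    acceleration at each step, used here in place of the control input.
    """
    rng = np.random.default_rng(seed)
    pos, vel = 0.0, v0
    states, controls = [], []
    while vel > 0.0:
        # Constant deceleration with a random acceleration added each step
        accel = -base_decel + rng.normal(0.0, noise_std)
        states.append((pos, vel))
        controls.append(accel)
        vel_next = max(vel + accel * dt, 0.0)
        pos += 0.5 * (vel + vel_next) * dt  # trapezoidal position update
        vel = vel_next
    return np.array(states), np.array(controls)
```

Running the function with different seeds yields stopping trajectories that visit different state and input combinations, which is the kind of variety the paragraph above relies on.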
FIG. 6 is a hardware configuration diagram of the server device. As an example, as shown in this figure, the server device 10 may include hardware such as a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, a storage unit 104 such as an HDD or SSD, and a communication module 105.
In the server device 10 described above, the steps of each process described above are stored in the form of a program on a computer-readable recording medium, and the processes are carried out when a computer reads and executes this program. Here, the computer-readable recording medium refers to a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, a semiconductor memory, or the like. The computer program may also be distributed to a computer via a communication line, and the computer receiving the distribution may execute the program.
The above program may implement only some of the functions described above. It may also be a so-called difference file (difference program), which implements the functions described above in combination with a program already recorded in the computer system.
<Additional notes>
The above embodiment can be understood, for example, as follows.
(1) According to the first aspect, the information processing device (server device 10)
generates a control model that estimates, as a probability distribution, information on the state of the vehicle 50 at the next time based on information indicating the state of the vehicle 50 at a certain time and a control input of the vehicle 50 indicating a motor torque command for stopping the vehicle 50 at a target position in that state, the control model being generated using the relationship between the state information and the control input indicated by braking control of the vehicle 50 performed in the past and the state of the vehicle 50 at the next time obtained from that relationship,
determines an evaluation function of an evaluation value regarding the state of the vehicle, the evaluation value worsening as the distance to the target position increases, and inputs the policy parameter that most improves the value of the evaluation function into a policy function to generate a motor torque command for the vehicle 50 in the state of the vehicle at the next time, and
updates the control model using the relationship between the control input and the state information of the vehicle 50 resulting from a trial based on the generated motor torque command, together with the relationship between the control input and the state information of the vehicle 50 used to generate the control model.
This processing makes it possible to automatically calculate an appropriate control input (run curve) at each position of the vehicle 50 that allows the vehicle 50 to stop at the target stop position without causing regeneration failure. In addition, because braking of the vehicle 50 can be controlled with regenerative braking using such appropriate control inputs, use of the mechanical brake decreases, which reduces wear of the mechanical brake per unit period and therefore the cost of mechanical brake maintenance. Furthermore, with this processing, a method for identifying an appropriate control input (run curve) at each position of the vehicle 50 that allows the vehicle 50 to stop at the target stop position without causing regeneration failure can be acquired with a small number of trial runs of the vehicle 50.
(2) According to the second aspect, in the information processing device (server device 10) according to the first aspect, the evaluation function is a function indicating the integral of the expected values of the evaluation value regarding the state of the vehicle 50 at a plurality of times from a reference time up to a set timing after the reference time.
(3) According to the third aspect, in the information processing device (server device 10) according to the first or second aspect, the information on the state of the vehicle 50 includes at least information on the position, speed, air spring pressure, motor voltage, and motor torque output of the vehicle 50.
(4) According to the fourth aspect, the information processing device (server device 10) according to the second aspect
samples an arbitrary initial state of the vehicle 50 according to the policy function and a normal distribution, calculates the evaluation function by integrating the expected values according to that state, and
searches for the policy parameter that minimizes the evaluation function.
(5) According to the fifth aspect, the information processing device (server device 10) according to any one of the first to fourth aspects generates the control model using the state information acquired under conditions where regeneration failure is unlikely to occur and the state information acquired under conditions where regeneration failure is likely to occur.
This processing makes it possible to generate a control model that can output control inputs for accurate control both under conditions where regeneration failure is unlikely to occur and under conditions where it is likely to occur.
(6) According to the sixth aspect, the information processing device (server device 10) according to any one of the first to fourth aspects generates the control model using the relationship between the control input and the state information of the vehicle 50 driven with a random acceleration added to a constant acceleration.
With this processing, using varied initial data makes it possible to train the control model with a small amount of initial data.
(7) According to the seventh aspect, the information processing method includes:
generating a control model that estimates, as a probability distribution, information on the state of the vehicle 50 at the next time based on information indicating the state of the vehicle 50 at a certain time and a control input of the vehicle 50 indicating a motor torque command for stopping the vehicle 50 at a target position in that state, the control model being generated using the relationship between the state information and the control input indicated by braking control of the vehicle 50 performed in the past and the state of the vehicle 50 at the next time obtained from that relationship;
determining an evaluation function of an evaluation value regarding the state of the vehicle, the evaluation value worsening as the distance to the target position increases, and inputting the policy parameter that most improves the evaluation value into a policy function to generate a motor torque command for the vehicle 50 in the state of the vehicle at the next time; and
updating the control model using the relationship between the control input and the state information of the vehicle 50 resulting from a trial based on the generated motor torque command, together with the relationship between the control input and the state information of the vehicle 50 used to generate the control model.
(8) According to the eighth aspect, in the information processing method according to the seventh aspect, the evaluation function is a function indicating the integral of the expected values of the evaluation value regarding the state of the vehicle 50 at a plurality of times from a reference time up to a set timing after the reference time.
(9) According to the ninth aspect, in the information processing method according to the seventh or eighth aspect, the information on the state of the vehicle 50 includes at least information on the position, speed, air spring pressure, motor voltage, and motor torque output of the vehicle 50.
(10) According to the tenth aspect, the information processing method according to the eighth aspect includes sampling an arbitrary initial state of the vehicle 50 according to the policy function and a normal distribution, calculating the evaluation function by integrating the expected values according to that state, and searching for the policy parameter that minimizes the evaluation function.
(11) According to the eleventh aspect, the information processing method according to any one of the seventh to tenth aspects includes generating the control model using the state information acquired under conditions where regeneration failure is unlikely to occur and the state information acquired under conditions where regeneration failure is likely to occur.
(12) According to the twelfth aspect, the information processing method according to any one of the seventh to tenth aspects includes generating the control model using the relationship between the control input and the state information of the vehicle 50 driven with a random acceleration added to a constant acceleration.
(13) According to the thirteenth aspect, the program causes a computer of an information processing device to function as:
means for generating a control model that estimates, as a probability distribution, information on the state of the vehicle 50 at the next time based on information indicating the state of the vehicle 50 at a certain time and a control input of the vehicle 50 indicating a motor torque command for stopping the vehicle 50 at a target position in that state, the control model being generated using the relationship between the state information and the control input indicated by braking control of the vehicle 50 performed in the past and the state of the vehicle 50 at the next time obtained from that relationship;
means for determining an evaluation function of an evaluation value regarding the state of the vehicle, the evaluation value worsening as the distance to the target position increases, and inputting the policy parameter that most improves the value of the evaluation function into a policy function to generate a motor torque command for the vehicle 50 in the state of the vehicle at the next time; and
means for updating the control model using the relationship between the control input and the state information of the vehicle 50 resulting from a trial based on the generated motor torque command, together with the relationship between the control input and the state information of the vehicle 50 used to generate the control model.
According to the present disclosure, a motor command value for braking a vehicle using the regenerative brake of the motor can be calculated automatically.
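For reference, the evaluation function described in aspects (2) and (4) can be written compactly as follows. Only x(t) is notation from the description; the symbols θ (policy parameter), c (evaluation value, taken here as smaller is better), t_0 (reference time), and T (number of steps up to the set timing) are assumed for this sketch, and the integral over a plurality of times is written as a discrete sum.

```latex
J(\theta) = \sum_{t = t_0}^{t_0 + T} \mathbb{E}\left[ c\bigl(x(t)\bigr) \right],
\qquad
\theta^{*} = \arg\min_{\theta} J(\theta)
```

Here the expectation is taken over the state distribution predicted by the control model when the policy with parameter θ is applied, and θ* is the policy parameter that the search in aspect (4) looks for.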
1 ... Vehicle control device
2 ... Inverter
3 ... Motor
11 ... Storage unit
12 ... Learning unit
13 ... Policy evaluation unit
14 ... Policy improvement unit

Claims (13)

  1.  An information processing device that:
     generates a control model that estimates, as a probability distribution, information on the state of a vehicle at the next time based on information indicating the state of the vehicle at a certain time and a control input of the vehicle indicating a motor torque command for stopping the vehicle at a target position in that state, the control model being generated using the relationship between the state information and the control input indicated by braking control of the vehicle performed in the past and the state of the vehicle at the next time obtained from that relationship;
     determines an evaluation function of an evaluation value regarding the state of the vehicle, the evaluation value worsening as the distance to the target position increases, and inputs the policy parameter that most improves the evaluation value into a policy function to generate a motor torque command for the vehicle in the state of the vehicle at the next time; and
     updates the control model using the relationship between the control input and the state information of the vehicle resulting from a trial based on the generated motor torque command, together with the relationship between the control input and the state information of the vehicle used to generate the control model.
  2.  The information processing device according to claim 1, wherein the evaluation function is a function indicating the integral of the expected values of the evaluation value regarding the state of the vehicle at a plurality of times from a reference time up to a set timing after the reference time.
  3.  The information processing device according to claim 1 or claim 2, wherein the information on the state of the vehicle includes at least information on the position, speed, air spring pressure, motor voltage, and motor torque output of the vehicle.
  4.  The information processing device according to claim 2, which samples an arbitrary initial state of the vehicle according to the policy function and a normal distribution, calculates the evaluation function by integrating the expected values according to that state, and searches for the policy parameter that minimizes the evaluation function.
  5.  The information processing device according to claim 4, which generates the control model using the state information acquired under conditions where regeneration failure is unlikely to occur and the state information acquired under conditions where regeneration failure is likely to occur.
  6.  The information processing device according to claim 4, which generates the control model using the relationship between the control input and the state information of the vehicle driven with a random acceleration added to a constant acceleration.
  7.  An information processing method comprising:
     generating a control model that estimates, as a probability distribution, information on the state of a vehicle at the next time based on information indicating the state of the vehicle at a certain time and a control input of the vehicle indicating a motor torque command for stopping the vehicle at a target position in that state, the control model being generated using the relationship between the state information and the control input indicated by braking control of the vehicle performed in the past and the state of the vehicle at the next time obtained from that relationship;
     determining an evaluation function of an evaluation value regarding the state of the vehicle, the evaluation value worsening as the distance to the target position increases, and inputting the policy parameter that most improves the evaluation value into a policy function to generate a motor torque command for the vehicle in the state of the vehicle at the next time; and
     updating the control model using the relationship between the control input and the state information of the vehicle resulting from a trial based on the generated motor torque command, together with the relationship between the control input and the state information of the vehicle used to generate the control model.
  8.  The information processing method according to claim 7, wherein the evaluation function is a function indicating the integral of the expected values of the evaluation value regarding the state of the vehicle at a plurality of times from a reference time up to a set timing after the reference time.
  9.  The information processing method according to claim 7 or claim 8, wherein the information on the state of the vehicle includes at least information on the position, speed, air spring pressure, motor voltage, and motor torque output of the vehicle.
  10.  The information processing method according to claim 8, comprising sampling an arbitrary initial state of the vehicle according to the policy function and a normal distribution, calculating the evaluation function by integrating the expected values according to that state, and searching for the policy parameter that minimizes the evaluation function.
  11.  The information processing method according to claim 10, comprising generating the control model using the state information acquired under conditions where regeneration failure is unlikely to occur and the state information acquired under conditions where regeneration failure is likely to occur.
  12.  The information processing method according to claim 10, comprising generating the control model using the relationship between the control input and the state information of the vehicle driven with a random acceleration added to a constant acceleration.
  13.  A program that causes a computer of an information processing device to function as:
     means for generating a control model that estimates, as a probability distribution, information on the state of a vehicle at the next time based on information indicating the state of the vehicle at a certain time and a control input of the vehicle indicating a motor torque command for stopping the vehicle at a target position in that state, the control model being generated using the relationship between the state information and the control input indicated by braking control of the vehicle performed in the past and the state of the vehicle at the next time obtained from that relationship;
     means for determining an evaluation function of an evaluation value regarding the state of the vehicle, the evaluation value worsening as the distance to the target position increases, and inputting the policy parameter that most improves the evaluation value into a policy function to generate a motor torque command for the vehicle in the state of the vehicle at the next time; and
     means for updating the control model using the relationship between the control input and the state information of the vehicle resulting from a trial based on the generated motor torque command, together with the relationship between the control input and the state information of the vehicle used to generate the control model.
PCT/JP2023/031965 2022-09-07 2023-08-31 Information processing device, information processing method, and program WO2024053566A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-142287 2022-09-07
JP2022142287A JP2024037423A (en) 2022-09-07 2022-09-07 Information processing device, information processing method, program

Publications (1)

Publication Number Publication Date
WO2024053566A1 true WO2024053566A1 (en) 2024-03-14

Family

ID=90191082

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/031965 WO2024053566A1 (en) 2022-09-07 2023-08-31 Information processing device, information processing method, and program

Country Status (2)

Country Link
JP (1) JP2024037423A (en)
WO (1) WO2024053566A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021043146A (en) * 2019-09-13 2021-03-18 株式会社日立製作所 Obstacle detection system and obstacle detection method
WO2021084574A1 (en) * 2019-10-28 2021-05-06 日産自動車株式会社 Control method for electric motor vehicle and control device for electric motor vehicle

Also Published As

Publication number Publication date
JP2024037423A (en) 2024-03-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23863104

Country of ref document: EP

Kind code of ref document: A1