WO2023159841A1 - Multi-objective optimization control method and system for cooperative ramp merging of connected vehicles on highway - Google Patents

Multi-objective optimization control method and system for cooperative ramp merging of connected vehicles on highway

Info

Publication number
WO2023159841A1
Authority
WO
WIPO (PCT)
Prior art keywords
vehicle
ramp
vehicles
data
auxiliary
Prior art date
Application number
PCT/CN2022/102755
Other languages
English (en)
French (fr)
Inventor
董瀚萱
丁璠
张海龙
谭华春
叶林辉
戴昀琦
Original Assignee
东南大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东南大学 filed Critical 东南大学
Priority to US18/112,541 priority Critical patent/US20230267829A1/en
Publication of WO2023159841A1 publication Critical patent/WO2023159841A1/zh

Classifications

    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/01Detecting movement of traffic to be counted or controlled
    • G08G1/0104Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125Traffic data processing
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/01Detecting movement of traffic to be counted or controlled
    • G08G1/0104Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0137Measuring and analyzing of parameters relative to traffic conditions for specific applications
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/16Anti-collision systems
    • G08G1/166Anti-collision systems for active traffic, e.g. moving vehicles, pedestrians, bikes

Definitions

  • The invention belongs to the technical field of intelligent transportation and vehicle-road coordination, and specifically relates to a multi-objective optimization control method and system for cooperative ramp merging of connected vehicles on expressways.
  • A cooperative ramp merging strategy is proposed, which better controls merging by adjusting vehicle trajectories so as to ensure traffic efficiency and safety.
  • Common merging strategies can be divided into heuristic methods (rule-based or fuzzy methods) and optimization-based methods.
  • Heuristic algorithms usually require domain-specific expertise for the fine-grained design of driving rules, lack adaptability to unseen situations, and can rarely achieve optimal control.
  • The inventors have previously proposed a battery-life-oriented on-ramp reinforcement learning method to preliminarily address the on-ramp merging problem, but that method mainly targets battery health and is an implementation for the specific application scenario of new energy vehicles.
  • Such a method cannot be applied to mixed merging scenarios containing both conventional fuel vehicles and new energy vehicles; moreover, the earlier paper does not cover how the merging vehicle selects a gap on the main road, a key link in the practical application of ramp merging.
  • On the basis of the specific case presented in that paper, this patent proposes a multi-objective optimization control method and system for cooperative ramp merging of connected vehicles on expressways, establishing a comprehensive, complete framework and a more advanced technical system for the on-ramp merging problem.
  • The present invention proposes a multi-objective optimization control method and system for cooperative ramp merging of connected vehicles on expressways; by coordinating the trajectories of vehicles in the ramp control area, ramp merging is completed while the road as a whole operates efficiently, safely and in an energy-saving manner.
  • The present invention provides a multi-objective optimization control method for cooperative ramp merging of connected vehicles on expressways, including the following steps:
  • Step 1: collect the state data of vehicles in the expressway control area, and analyze and process the state data;
  • the control area includes the intersection of the expressway main road and the ramp, the merging area, part of the main road, part of the ramp and part of the acceleration lane;
  • the extent of the control area is the communication range of the roadside unit;
  • the roadside unit is set at the intersection of the main road and the ramp of the expressway;
  • the merging area is a pre-selected area, including part of the acceleration lane and the section of the main road parallel to that part of the acceleration lane;
  • Step 2: according to the state data of vehicles in the control area, construct a candidate set of merging vehicles, auxiliary vehicles and leading vehicles;
  • Step 3: input the candidate sets into the artificial-intelligence-based ramp merging multi-objective control model, and further determine the auxiliary vehicle and the leading vehicle through the optimal-value strategy;
  • Step 4: control and adjust the accelerations of the auxiliary vehicle and the leading vehicle according to the selected auxiliary vehicle, leading vehicle and merging vehicle, to ensure that the merging vehicle merges safely from the acceleration lane into the main road within the selected merging area;
  • Step 5: collect the state data of the merging vehicle and the auxiliary vehicle after the acceleration adjustment, and return to Step 4 to perform the acceleration adjustment at the next moment.
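  • Read as pseudocode, Steps 1 to 5 form a closed sensing-decision-actuation loop. The Python sketch below illustrates one way such a loop could be organized; the object names and method signatures (control_area, model, build_candidate_sets, etc.) are illustrative assumptions, not part of the patent.

```python
# Illustrative sketch of the five-step closed control loop; all interfaces are hypothetical.
def cooperative_merge_loop(control_area, model):
    states = control_area.collect_states()                        # Step 1: RSU/OBU state data
    candidates = build_candidate_sets(states)                     # Step 2: merging / auxiliary / leading candidates
    merging, auxiliary, leading = model.select_vehicles(candidates)   # Step 3: optimal-value strategy

    while not control_area.merge_completed(merging):
        # Step 4: the model outputs accelerations for the controlled vehicles
        accelerations = model.act(states, goal=control_area.merge_goal())
        control_area.apply_accelerations(accelerations)           # e.g. {vehicle_id: acceleration}
        # Step 5: re-collect the states of the adjusted vehicles, then repeat Step 4
        states = control_area.collect_states()
```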
  • The vehicle state data include the position, speed, power-battery state and corresponding time information of the vehicles in the control area.
  • The state data are analyzed and processed, including but not limited to data analysis, feature extraction and information fusion.
  • The optimal-value strategy for selecting the merging vehicle, leading vehicle and auxiliary vehicle is as follows:
  • according to the position information of all vehicles and the front-rear relationship between the mainline vehicles and the merging vehicle, the z main-road vehicles behind and the z main-road vehicles in front of the merging vehicle are preliminarily selected as candidates for the auxiliary vehicle and the leading vehicle, where z is a positive integer less than or equal to 5;
  • the state information of all vehicles in the control area over the previous t time steps includes speed, position and acceleration.
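  • The selection logic described here and in the embodiment (merging vehicle = ramp vehicle closest to the exit line, up to z candidates ahead and behind, adjacent pairs forming the candidate set AL, traversal of AL using the model's value function Q_π) can be sketched as follows; the vehicle data layout and the q_value interface are illustrative assumptions.

```python
# Sketch of the optimal-value selection strategy (hypothetical data layout: each vehicle
# is a dict with 'id' and 'position'; q_value stands in for the model's value function Q_pi).

def select_merging_vehicle(ramp_vehicles, exit_line_pos):
    """Merging vehicle = ramp vehicle whose front bumper is closest to the ramp exit line."""
    return min(ramp_vehicles, key=lambda v: abs(exit_line_pos - v["position"]))

def build_candidate_pairs(main_vehicles, merging, z=3):          # z <= 5 per the patent
    """Take up to z main-road vehicles ahead and behind the merging vehicle,
    then pair each two adjacent ones as a (leading, auxiliary) candidate."""
    ahead = sorted([v for v in main_vehicles if v["position"] >= merging["position"]],
                   key=lambda v: v["position"])[:z]
    behind = sorted([v for v in main_vehicles if v["position"] < merging["position"]],
                    key=lambda v: v["position"], reverse=True)[:z]
    ordered = list(reversed(behind)) + ahead                      # rear-most ... front-most
    return [(ordered[i + 1], ordered[i]) for i in range(len(ordered) - 1)]  # (leader, auxiliary)

def select_pair(candidate_pairs, state, goal, q_value):
    """Traverse the candidate set AL and keep the pair with the highest value Q_pi."""
    return max(candidate_pairs, key=lambda pair: q_value(state, pair, goal))
```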
  • For the artificial-intelligence-based ramp merging multi-objective control model, the objective function and constraints at the moment of successful merging are constructed as follows:
  • x_l, v_l and a_l are the position, velocity and acceleration of the leading vehicle;
  • x_f, v_f and a_f are the position, velocity and acceleration of the auxiliary vehicle;
  • x_m, v_m and a_m are the position, velocity and acceleration of the merging vehicle;
  • τ is a constant time interval,
  • L_1 is the vehicle length,
  • s_0 is the standstill gap,
  • d_min and d_max are the start and end points of the merging area, and the length of the merging area is d_max − d_min;
  • from top to bottom, the formulas state that the merging vehicle is behind the leading vehicle, the merging vehicle is in front of the auxiliary vehicle, the speed of the merging vehicle matches that of the leading vehicle, the speed of the merging vehicle matches that of the auxiliary vehicle, and the merging vehicle merges safely from the acceleration lane into the main road within the selected merging area.
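  • The five constraint formulas themselves appear only as embedded images in this publication; one plausible instantiation that matches the verbal description above, assuming a constant-time-gap spacing policy, is the following (the exact form in the patent may differ), with $t_m^*$ the safe merging time:

$$x_l(t_m^*) - x_m(t_m^*) \ge s_0 + L_1 + \tau\, v_m(t_m^*)$$
$$x_m(t_m^*) - x_f(t_m^*) \ge s_0 + L_1 + \tau\, v_f(t_m^*)$$
$$v_m(t_m^*) = v_l(t_m^*), \qquad v_m(t_m^*) = v_f(t_m^*)$$
$$d_{\min} \le x_m(t_m^*) \le d_{\max}$$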
  • In step 4 of the optimal-value strategy for selecting the merging vehicle, leading vehicle and auxiliary vehicle, the artificial-intelligence-based ramp merging multi-objective control model is solved with the reinforcement learning actor-critic algorithm; the specific process is as follows:
  • Reward construction: the reward r(s, a, g*) of each time step includes the long-term merging reward R_m(t) and introduces at least two short-term rewards according to the requirements of safe, efficient and comfortable driving; the mandatory long-term merging reward R_m(t) is expressed as follows:
  • Data exploration expansion and goal-space optimization: a virtual-goal construction algorithm based on multi-experience replay is further proposed; virtual goals are introduced, expanding data exploration while optimizing the goal space;
  • at each time step, based on the data chains stored in the intelligent optimization module and on the actor-critic framework, the merging control strategy is trained through a deep neural network parameterized by θ_A; the policy directly outputs the actions that control the accelerations of the merging vehicle and the auxiliary vehicle from the state and goal inputs;
  • the goal of policy optimization is to find the optimal behavior policy a that maximizes the expected return of the entire trip, and the optimal control strategy is finally output by a forward pass of the trained network: a = π(s, g | θ_A).
  • The short-term rewards in the actor-critic reward construction include, but are not limited to: the energy-saving reward R_e(t), the comfort reward R_p(t), the traffic-efficiency reward R_s(t) and the battery-state reward R_b(t);
  • F_c is the weighted Chebyshev norm and represents the maximum deviation between each objective and its super-ideal optimal value;
  • the reward function is constructed as:
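  • The scalarization and reward formulas are embedded as images in this publication; the sketch below shows a common form of an augmented weighted Chebyshev scalarization consistent with the quantities named above (super-ideal values, weights λ_i, stabilizing coefficient ρ = 0.001). Treating the per-step reward as the long-term merging reward minus this deviation term is an illustrative assumption.

```python
# Augmented weighted Chebyshev scalarization of n_r short-term rewards (illustrative form).

def chebyshev_scalarize(rewards, super_ideal, weights, rho=0.001):
    """rewards, super_ideal, weights: equal-length lists over the n_r short-term objectives.
    F_c = max_i weights[i]*(super_ideal[i]-rewards[i]) + rho*sum_i (super_ideal[i]-rewards[i])."""
    deviations = [w * (ri_star - ri) for w, ri_star, ri in zip(weights, super_ideal, rewards)]
    slack = sum(ri_star - ri for ri_star, ri in zip(super_ideal, rewards))
    return max(deviations) + rho * slack

def step_reward(r_merge, short_term_rewards, super_ideal, weights):
    """Per-step reward: long-term merging reward R_m(t) minus the Chebyshev deviation term.
    The exact combination used in the patent is not reproduced here; this form is illustrative."""
    return r_merge - chebyshev_scalarize(short_term_rewards, super_ideal, weights)
```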
  • The steps of the multi-experience-replay virtual-goal construction algorithm proposed in the actor-critic reward construction are as follows:
  • Multi-experience virtual-goal construction and goal-space optimization at time step t: based on the goal g* and the data chain (s_t‖g*, a, r, s_{t+1}‖g*) at time step t, construct the parameterized fully connected neural network H and obtain the l virtual goals for the current state;
  • Data exploration optimization based on the optimized goal space: construct l virtual-goal data chains from the l acquired virtual goals, (s_t‖g_1, a, r, s_{t+1}‖g_1), (s_t‖g_2, a, r, s_{t+1}‖g_2), ..., (s_t‖g_l, a, r, s_{t+1}‖g_l), and store them in the intelligent optimization module;
  • Full-time-step data exploration optimization: repeat the multi-experience virtual-goal construction, the goal-space optimization and the data exploration optimization for every time step, completing data exploration and goal-space optimization over all time steps;
  • Validity selection of virtual goals based on the artificial intelligence model: train the artificial intelligence model on the optimized data set, and select the optimal virtual-goal data chain at each time step according to the training results;
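  • This mechanism is closely related to hindsight experience replay: each stored transition is relabeled with additional virtual goals so that the sparse long-term merging reward still yields useful learning signal. A minimal sketch follows, in which the goal-proposal step standing in for the network H and the reward relabeling are assumptions for illustration.

```python
# Minimal sketch of multi-experience virtual-goal relabeling (hindsight-style), illustrative only.

def relabel_with_virtual_goals(transition, propose_goals, reward_fn, replay_buffer, l=4):
    """transition = (s_t, goal, action, reward, s_next).
    propose_goals(s_t, s_next, goal, l) stands in for the fully connected network H;
    reward_fn recomputes the reward for each virtual goal."""
    s_t, goal, action, reward, s_next = transition
    replay_buffer.append((s_t, goal, action, reward, s_next))         # original data chain
    for g_virtual in propose_goals(s_t, s_next, goal, l):             # l virtual goals
        r_virtual = reward_fn(s_t, action, g_virtual)                 # relabelled reward
        replay_buffer.append((s_t, g_virtual, action, r_virtual, s_next))

def sample_achieved_goals(s_t, s_next, goal, l):
    """A common stand-in for H: treat the achieved next state as the virtual goal."""
    return [s_next for _ in range(l)]
```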
  • The present invention further provides a system for the multi-objective optimization control method for cooperative ramp merging of connected vehicles on expressways, comprising an information collection module, a data transmission module, a traffic control module and an intelligent optimization module;
  • the information collection module is used to collect the state data of vehicles in the control area, analyze and process the state data, and select the merging vehicle, auxiliary vehicle and leading vehicle;
  • the information collection module includes on-board units and a roadside unit;
  • the roadside unit is arranged at the intersection of the main road and the ramp of the expressway;
  • the roadside unit is used to collect the position, speed and corresponding time information of vehicles in the control area, and also to collect the time at which the merging vehicle is determined and the time at which its front bumper reaches the ramp exit line;
  • the on-board unit is used to collect the power-battery state of vehicles in the control area and the corresponding time information;
  • the data transmission module uses mobile communication technology as the main communication mode, assisted by one or both of the WiFi/BT and DSRC wireless communication modes, to transmit data between the information collection module and the traffic control module and between the traffic control module and the intelligent optimization module;
  • the traffic control module is used to obtain the real-time optimal behavior policy a, goal g and reward r from the vehicle state data provided by the information collection module, send the behavior policy to the on-board units to control the vehicles in real time, and at the same time send the optimal behavior policy a, goal g and reward r to the intelligent optimization module;
  • the intelligent optimization module is used to store the data passed in by the traffic control module, optimize the ramp merging multi-objective control model based on the candidate vehicle set AL selected in step 3 and the optimization algorithm proposed in step 4, and transmit the optimized model back to the traffic control module.
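  • To make the division of labour between the four modules concrete, the records below model the data exchanged between them; the field names are illustrative assumptions, not taken from the patent.

```python
# Illustrative data records exchanged between the four modules (field names are assumptions).
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class VehicleState:          # collected by the information collection module (RSU + OBU)
    position: float
    speed: float
    battery_soc: float
    timestamp: float

@dataclass
class ControlDecision:       # produced by the traffic control module
    accelerations: Dict[str, float]   # vehicle id -> commanded acceleration
    goal: Tuple                       # merging goal g
    reward: float                     # reward r fed back to the intelligent optimization module
```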
  • Compared with the prior art, the present invention adopts the above technical scheme and has the following technical effects:
  • the present invention promotes the merging process by controlling the vehicles in the two lanes of the merging area, avoiding the queuing problem caused by traditional ramp control that relies only on flow-rate metering and therefore delays merging vehicles; while guaranteeing mainline capacity, it actively promotes the merging of ramp vehicles and greatly improves traffic efficiency in the ramp area, and the per-vehicle control method also ensures safe driving between vehicles.
  • the present invention uses reinforcement learning for per-vehicle-controlled ramp merging, which distinguishes it from other per-vehicle-controlled ramp merging methods.
  • the reinforcement learning method does not need to construct a complex model and can continuously explore and optimize the control strategy from historical data; owing to the diversity of exploration, the algorithm also has a degree of robustness and adaptability.
  • the present invention introduces a multi-experience virtual-goal construction method, adding goal-space optimization and data exploration steps, which greatly improves data exploration efficiency and effectively resolves the sparsity of long-term rewards in multi-objective ramp control and the coupling between long- and short-term rewards, ensuring safe and effective multi-objective ramp merging.
  • compared with the paper published by the inventors, the improvements of the present invention mainly include:
  • unlike the paper, which addresses only battery-health-oriented merging of new energy vehicles, a merging method applicable to mixed ramp scenarios with various vehicle types (including but not limited to conventional fuel vehicles and new energy vehicles) is proposed, and the present invention provides the key technical links required for application to real expressway merging scenes.
  • the present invention further refines the vehicle constraints at the moment of safe and successful merging and enriches the selection algorithm for the leading vehicle, auxiliary vehicle and merging vehicle in the control area; compared with the simple selection pattern in the paper, the selection scheme of the present invention obtains the optimal vehicle selection, further improves the optimality of the algorithm, and makes vehicle operation more energy-saving, efficient and stable.
  • the present invention further optimizes the relative importance of long-term and short-term rewards, reserves an interface for introducing goals for different practical situations, and determines the coefficient relationship between short-term rewards, ensuring that the invention can adapt to a variety of different objectives and further enlarging its scope of application.
  • the present invention further optimizes the data exploration algorithm proposed in the paper by introducing the multi-experience virtual-goal construction method; unlike the paper, which simply selects the next state as the goal, this further improves the efficiency of data utilization, improves the direction and correctness of data exploration, and largely avoids dangerous situations such as collisions.
  • Fig. 1 is a schematic diagram of the expressway merging scene proposed by the present invention;
  • Fig. 2 is an architecture diagram of the multi-objective optimization control method for cooperative ramp merging of connected vehicles on expressways of the present invention.
  • The method proposed in the present invention is based on the following assumptions: 1) connected vehicles have the necessary information transmission and instruction execution capabilities, i.e. the on-board unit can exchange information and the vehicle fully executes the control instructions; 2) the ramp control area includes the main-road/ramp intersection, the merging area and parts of the mainline and the ramp, and the extent of the control area is the communication range of the roadside unit; 3) the length of the merging area is fixed, i.e. d_m = d_max − d_min; 4) delays in information transmission, data processing and computation, and instruction execution are ignored, i.e. each module is assumed to run fast enough to support system operation; 5) the lateral motion of vehicles and the influence of temperature on the vehicle are ignored.
  • The multi-objective optimization control system for cooperative ramp merging of connected vehicles on expressways proposed by the present invention includes an information collection module, a data transmission module, a traffic control module and an intelligent optimization module, wherein:
  • the information collection module collects, in real time through the on-board units and the roadside unit, state information such as the speed, position and power-battery state of the vehicles in the control area, together with the corresponding time information and the start time of ramp-vehicle merging (the moment at which the merging vehicle is selected), and analyzes and processes the data; the analysis and processing steps include but are not limited to data analysis, feature extraction and information fusion;
  • the information collection cooperation between the on-board units and the roadside unit included in the information collection module is as follows:
  • the roadside unit is responsible for collecting traffic state information, such as the speed and position of vehicles in the area and the start time of merging of the merging vehicle;
  • the on-board unit is responsible for collecting vehicle-related parameter information, such as engine state information, battery state information and gear information;
  • the powertrain types of the connected vehicles covered by the present invention include fuel vehicles, battery electric vehicles, hydrogen-powered vehicles and hybrid vehicles;
  • the data transmission module uses the fifth-generation mobile communication technology (5G) as the main communication mode, assisted by one or more wireless communication methods such as WiFi/BT and DSRC, to transmit data between the on-board units, the roadside unit and the individual modules;
  • the traffic control module obtains the real-time optimal behavior policy a, goal g and feedback r from the traffic state information provided by the information collection module and sends the policy to the on-board units to control the vehicles in real time; at the same time it packs the traffic state information and sends it to the intelligent optimization module;
  • the intelligent optimization module includes training and data sub-modules: the data sub-module stores the data passed in by the traffic control module and uses the artificial-intelligence data exploration method to obtain and store additional data pairs; the training sub-module, using the data provided by the data sub-module and the reward function, optimizes the artificial-intelligence-based ramp merging multi-objective control model and transfers the optimized model to the traffic control module.
  • The control area includes the intersection of the main road and the ramp, the merging area, and some sections of the mainline and the ramp; it is worth noting that the extent of the control area is the communication range of the roadside unit, see Fig. 1 for details.
  • As shown in Fig. 2, the multi-objective optimization control method for cooperative ramp merging of connected vehicles on expressways proposed by the present invention includes the following steps:
  • S01: the information collection module obtains the vehicle state data of the control area;
  • S02: the information collection module performs data analysis, information fusion and other processing on the collected data through the intelligent data analysis model;
  • S03: a candidate set AL of merging vehicles, auxiliary vehicles and leading vehicles is constructed according to the state data of the vehicles in the control area;
  • S04: the candidate sets are input into the artificial-intelligence-based ramp merging multi-objective control model, and the selection of the auxiliary vehicle and the leading vehicle is further determined through the optimal-value strategy;
  • S05: the transmission module transmits the vehicles selected in step S04 and the related collected vehicle data to the traffic control module;
  • S06: the traffic control module confirms the vehicles to which instructions are sent and makes the implementation decision based on the artificial-intelligence-based ramp merging multi-objective control model, whose framework is the reinforcement learning actor-critic (Actor-Critic) algorithm;
  • S07: the transmission module transmits the decision information to the on-board units of the controlled vehicles, achieving the goal of safe, efficient and energy-saving ramp merging; the controlled vehicles are the merging vehicle and the auxiliary vehicle;
  • S08: the controlled-vehicle state-information data pairs under the control strategy are collected and stored in the data sub-module, forming closed-loop control.
  • In steps S03-S04, the merging vehicle, auxiliary vehicle and leading vehicle are selected according to the optimal-value strategy algorithm, as follows:
  • Selection of the merging vehicle: the vehicle on the ramp whose front bumper is closest to the ramp exit line is set as the merging vehicle, and the state information of all vehicles in the control area over the previous t time steps, such as speed, position and acceleration, is obtained;
  • the artificial-intelligence-based ramp merging multi-objective control model adjusts the accelerations of the merging vehicle and the auxiliary vehicle with respect to the selected leading vehicle and solves the optimal control under multiple objectives: while the merging vehicle successfully merges into the mainline, the energy consumption of the merging vehicle is optimized and the efficient, safe passage of the road is guaranteed; specifically, this comprises constructing the mathematical model of the problem and solving the optimization with reinforcement learning.
  • x_l, v_l and a_l are the position, velocity and acceleration of the leading vehicle;
  • x_f, v_f and a_f are the position, velocity and acceleration of the auxiliary vehicle;
  • x_m, v_m and a_m are the position, velocity and acceleration of the merging vehicle;
  • τ is a constant time interval, L_1 is the vehicle length, s_0 is the standstill gap, d_min and d_max are the start and end points of the merging area, and the length of the merging area is d_max − d_min;
  • from top to bottom, the formulas state that the merging vehicle is behind the leading vehicle, the merging vehicle is in front of the auxiliary vehicle, the speed of the merging vehicle matches that of the leading vehicle, the speed of the merging vehicle matches that of the auxiliary vehicle, and the merging vehicle merges safely from the acceleration lane into the main road within the selected merging area;
  • The artificial-intelligence-based ramp merging multi-objective control model is solved using the reinforcement learning actor-critic (Actor-Critic) algorithm; the specific process is as follows:
  • Reward construction: the reward r(s, a, g*) of each time step, in addition to the long-term merging reward R_m(t), may introduce various short-term rewards according to driving requirements such as safety, efficiency and comfort.
  • Other goals may include, but are not limited to, energy saving, smoothness, comfort and efficiency; their construction may refer to the following.
  • Driving energy-consumption reward R_e(t): considering battery efficiency and energy consumption, the decline of the vehicle state of charge (SOC) is modeled with the ampere-time integration method, where V_oc is the open-circuit voltage, R_int is the internal resistance, P_b(t) is the battery power at time t, and Q_c is the battery capacity.
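  • The SOC equation itself is embedded as an image in this publication; for an internal-resistance battery model with the variables listed above, a standard ampere-hour-counting form (an assumption about the exact expression used in the patent) is:

$$I_b(t) = \frac{V_{oc} - \sqrt{V_{oc}^2 - 4\,R_{int}\,P_b(t)}}{2\,R_{int}}, \qquad \frac{d\,SOC(t)}{dt} = -\frac{I_b(t)}{Q_c}$$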
  • The energy-saving reward therefore covers the SOC of the merging vehicle and the auxiliary vehicle: R_e(t) = ΔSOC_m(t) + ΔSOC_f(t).
  • Battery-life reward R_b(t): N is the number of battery cycles and E_0(0) is the nominal battery capacity energy; the battery-life reward covers the SOH of the merging vehicle and the auxiliary vehicle: R_b(t) = ΔSOH_m(t) + ΔSOH_f(t).
  • F_c is the weighted Chebyshev norm and represents the maximum deviation between each objective and its super-ideal optimal value;
  • the reward function is constructed as:
  • Full-time-step data exploration optimization: repeat the virtual-goal construction and the data exploration optimization for every time step, completing data exploration and goal-space optimization over all time steps;
  • Validity selection of virtual goals based on the artificial intelligence model: train the artificial intelligence model on the optimized data set, and select the optimal virtual-goal data chain at each time step according to the training results;
  • Verification of the virtual-goal fully connected neural network H: according to the optimal virtual-goal data chain at each time step, the parameters of H are verified, continuously improving the accuracy of virtual-goal generation and the training speed of the algorithm.
  • At each time step, based on the data chains stored in the intelligent optimization module and on the actor-critic framework, the merging control strategy is trained through a deep neural network parameterized by θ_A; the policy directly outputs the actions controlling the accelerations of the merging vehicle and the auxiliary vehicle from the state and goal inputs.
  • The goal of policy optimization is to find the optimal behavior policy a that maximizes the expected return of the entire trip.
  • In the Bellman optimality equation and the standard temporal-difference update, Q_π is the value function, γ is the discount factor, ζ is the learning rate, s, a, g are the state, action and goal of the current time step, and s′, a′, g′ are those of the next time step.
  • The critic network, parameterized by θ_C, estimates the Q value according to the temporal-difference update rule; its loss function is J_C(θ_C) = (Q_π(s, a, g | θ_C) − (r + γ Q_π(s′, g′, π(s′, g′ | θ_A) | θ_C)))², where θ_C is the critic network parameter, and the critic parameters are updated by stochastic gradient descent minimizing this loss; the optimal control strategy for the driving state is finally output by a forward pass of the trained network: a = π(s, g | θ_A).
  • In this way the optimal control strategy is obtained, so that, through the modules described above, the multi-objective optimization control method and system for cooperative ramp merging of connected vehicles in expressway scenarios achieves optimal control, realizing efficient, safe and energy-saving driving in the ramp area.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Traffic Control Systems (AREA)

Abstract

A multi-objective optimization control method and system for cooperative ramp merging of connected vehicles on highways. The method comprises: Step 1, collecting state data of vehicles in the expressway control area, and analyzing and processing the state data; Step 2, constructing a candidate set of merging vehicles, auxiliary vehicles and leading vehicles according to the state data of vehicles in the control area; Step 3, inputting the candidate sets into an artificial-intelligence-based ramp merging multi-objective control model, and further determining the auxiliary vehicle and the leading vehicle through an optimal-value strategy; Step 4, performing acceleration control and adjustment on the auxiliary vehicle and the leading vehicle according to the selected auxiliary vehicle, leading vehicle and merging vehicle, to ensure that the merging vehicle merges safely from the acceleration lane into the main road within the selected merging area; Step 5, collecting the state data of the merging vehicle and the auxiliary vehicle after the acceleration adjustment, and returning to Step 4 to perform the acceleration adjustment at the next moment.

Description

高速公路网联车协同匝道汇入多目标优化控制方法和系统 技术领域
本发明属于智能交通车路协同技术领域,具体为高速公路网联车协同匝道汇入多目标优化控制方法和系统。
背景技术
随着国民经济的飞速增长及城市化进程的不断推动,区域贸易增多促使城市间出行迅猛增长,为高速公路的高效、安全运行带来了挑战。其中,入口匝道区域作为影响高速公路交通效率的瓶颈,是交通管理中尤为关注的问题。常用匝道控制方法是通过调整宏观交通流状态变量(如密度或流量)控制入口匝道流入的速率。然而,匝道计量策略的目的是优先考虑干线的效率,并试图防止拥塞。由于无法控制单个车辆的移动来促进汇入过程,该方法可能在主线上车辆较多的情况下无法成功汇入,导致拥堵和安全问题。
考虑到智能网联车优异的动态控制能力,匝道协同汇入策略被提出,通过调节车辆的轨迹来更好地控制汇入,以保证交通效率和安全性。常见的汇入策略可分为启发式方法(基于规则的方法或模糊方法)和最优方法。然而,启发式算法通常需要领域特定的专业知识来进行某些驾驶规则的精细设计,缺乏对其他未知情况适应性且很难达到控制最优。优化算法例如PMP、DP等算法由于缺乏自学习能力,难以在入口匝道合并复杂环境下实现全局优化,缺乏对问题的适应性;基于强化学习的优化算法虽然能提高系统对于未知环境的适应性,但现有的策略过于注重奖励塑造和模型构建,无法从根本上解决协同匝道汇入问题。从本质上讲,协同匝道汇入需要同时满足多目标的长期和短期反馈。也就是说,短期动作执行时需要考虑到对长期目标的影响,但是对于汇入问题来说,入口匝道汇入评估的奖励是稀疏的和长期的。在这种情况下,传统的强化学习方法在处理稀疏奖励问题时很难避免局部最优和不稳定性,影响匝道区域车辆的通行效率和安全。
针对以上问题,发明人已提出了相关论文面向电池寿命的入口匝道强化学习方法,初步解决入口匝道汇入问题,但该方法主要面向电池健康问题,是针对新能源汽车特定应用场景下实施案例,但无法应用于现实传统燃油车与新能源汽车混合匝道汇入场景;同时,文中未涉及匝道汇入车辆选择主路汇入间隙选择,在实际匝道汇入实际应用缺乏关键环节。针对上述问题,本专利以论文所提出具体案例基础上,提出高速公路网联车协同匝道汇入多目标优化控制方法和系统,是针对入口匝道车辆汇入问题建立全面、完整的框架与更先进技术方法体系。
发明内容
为解决上述技术问题,本发明提出了高速公路网联车协同匝道汇入多目标优化控制方法和系统,通过协匝道控制区域内的车辆行驶轨迹,在完成匝道汇入的同时保证道路整体交通高效、安全、节能运行。
本发明提供高速公路网联车协同匝道汇入多目标优化控制方法,包括如下步骤:
步骤1,采集高速公路控制区域内车辆的状态数据,并对状态数据进行分析和处 理;所述控制区域包括高速公路的主路和匝道交叉点、汇入区域、主路部分路段、匝道部分路段以及加速车道部分路段,控制区域的范围即路侧单元的通信范围,所述路侧单元设置于高速公路的主路和匝道交叉点处,所述汇入区域为预先选定的区域,包括加速车道的部分路段以及与加速车道的部分路段平行的主路路段;
步骤2,根据控制区域内车辆的状态数据构建匝道汇入车辆、辅助车辆和引导车辆的可选方案集合;
步骤3,将可选方案集合分别输入基于人工智能的匝道汇入多目标控制模型中,通过最优价值策略进一步确定辅助车辆和引导车辆的选择;
步骤4,根据选择的辅助车、引导车和汇入车辆,对辅助车和引导车进行加速度进行控制调节,确保匝道汇入车辆在选定的汇入区域内从加速车道安全汇入到主路中;
步骤5,采集进行加速度调节后的匝道汇入车辆和辅助车辆的状态数据,并返回步骤4,进行下一时刻的加速度调节。
作为本发明控制方法进一步改进,所述步骤1中,车辆的状态数据包括控制区域内车辆的位置、速度、动力电池状态以及相应的时刻信息。
作为本发明控制方法进一步改进,所述步骤1中,对状态数据进行分析和处理,包括但不限于数据分析、特征提取和信息融合。
作为本发明控制方法进一步改进,所述步骤2-3中,所述对于汇入车辆、引导车辆和辅助车辆的选择的最优价值策略如下:
4.1汇入车辆的选择:将匝道上前保险杆和匝道出口线距离最近的车辆设为汇入车辆,并获取高速公路控制区域所有车辆前t个时间步长的状态信息;
4.2根据所有车辆的位置信息,依据主线车辆与汇入车辆的前后关系,初步选择汇入车辆后方和前方的各z辆主路车辆作为辅助车和引导车的备选车辆,其中z为正整数且小于等于5;
4.3在所有备选车辆中,选取邻近的两辆车作为一组引导车和辅助车,构建该汇入车辆的引导车和辅助车的备选集合AL;
4.4采用遍历法将备选集合AL中的组合分别代入基于人工智能的匝道汇入多目标控制模型中,依据模型的值函数Q π确定最终选取的汇入车辆、辅助车辆和引导车辆。
作为本发明控制方法进一步改进,所述控制区域所有车辆前t个时间步长的状态信息包括速度、位置及加速度。
作为本发明控制方法进一步改进,所述对于汇入车辆、引导车辆和辅助车辆的选择的最优价值策略的步骤4中,基于人工智能的匝道汇入多目标控制模型,其成功汇入时刻的目标函数及限制条件的构建如下:
6.1设定安全汇入时刻为
Figure PCTCN2022102755-appb-000001
构建安全汇入时刻引导车辆、辅助车辆和匝道汇入车辆需满足的位置和速度关系:
Figure PCTCN2022102755-appb-000002
Figure PCTCN2022102755-appb-000003
Figure PCTCN2022102755-appb-000004
Figure PCTCN2022102755-appb-000005
Figure PCTCN2022102755-appb-000006
式中,x l,v l和a l引导车辆的位置、速度和加速度;x f,v f和a f表示辅助车辆的位置、速度和加速度;x m,v m和a m表示匝道汇入车辆的位置、速度和加速度;τ为恒定的时间间隔,L 1为车辆的长度,s 0为停顿间隙,d min和d max分别为汇入区域的起点和终点,汇入区域的长度为d max-d min;公式从上到下依次表示匝道汇入车辆在引导车辆后面、匝道汇入车辆在辅助车辆前面、匝道汇入车辆和引导车辆速度一致、匝道汇入车辆和辅助车辆速度一致以及匝道汇入车辆在选定的汇入区域内从加速车道安全汇入到主路中;
6.2在满足6.1的条件下,进一步构建包含但不限于驾驶舒适性、车辆能耗、通行效率等目标的目标函数C如下:
Figure PCTCN2022102755-appb-000007
式中,
Figure PCTCN2022102755-appb-000008
表示不同目标的代价函数,c n表示参数。
作为本发明控制方法进一步改进,所述对于汇入车辆、引导车辆和辅助车辆的选择的最优价值策略的步骤4中,基于人工智能的匝道汇入多目标控制模型,采用强化学习参与者-评价者算法进行求解,具体过程如下:
7.1状态空间
Figure PCTCN2022102755-appb-000009
及行为空间
Figure PCTCN2022102755-appb-000010
建立:根据引导车辆、辅助车辆和匝道汇入车辆的状态数据选择六维状态信息s={x l,x m,x f,v l,v m,v f}表示环境中最相关的影响因素,
Figure PCTCN2022102755-appb-000011
依据控制对象选择控制行为策略
Figure PCTCN2022102755-appb-000012
7.2最优目标建立:根据安全汇入时刻
Figure PCTCN2022102755-appb-000013
时的车辆限制条件关系,构建匝道汇入最优目标集合
Figure PCTCN2022102755-appb-000014
其中,
Figure PCTCN2022102755-appb-000015
为目标空间的集合,
Figure PCTCN2022102755-appb-000016
表示满足公式(1)中汇入车辆在引导车辆后面,
Figure PCTCN2022102755-appb-000017
表示满足公式(2)中匝道汇入车辆在辅助车辆前面,
Figure PCTCN2022102755-appb-000018
表示满足公式(3)中匝道汇入车辆和引导车辆速度一致,
Figure PCTCN2022102755-appb-000019
表示满足公式(4)中匝道汇入车辆和辅助车辆速度一致,
Figure PCTCN2022102755-appb-000020
表示满足公式(5)中匝道汇入车辆在选定的汇入区域内从加速车道安全汇入到主路中;
7.3目标空间构建:依据7.2最优目标空间集合
Figure PCTCN2022102755-appb-000021
所包含的分类,建立目标空间集合
Figure PCTCN2022102755-appb-000022
满足g={g 1,g 2,g 3,g 4,g 5},g 1表示匝道汇入车辆与引导车辆的位置关系,g 2表示匝道汇入车辆和辅助车辆的位置关系,g 3表示匝道汇入车辆和引导车 辆速度关系,g 4表示匝道汇入车辆和辅助车辆速度关系,g 5表示匝道汇入车辆的位置与合并区域的关系;
7.4奖励构建:奖励函数为
Figure PCTCN2022102755-appb-000023
每个时间步长的奖励r(s,a,g *)在包括长期目标汇入奖励R m(t)的情况下,根据安全、高效及舒适行驶要求引入至少两种短期目标奖励,其中必须包含的长期目标汇入奖励R m(t)表示如下:
Figure PCTCN2022102755-appb-000024
7.5数据链的获取:根据7.1至7.3获取的第t个时间步长的状态、目标、策略和奖励数据得到数据链s t||g *,a,r,s t+1||g *并将数据存储入智能优化模块,其中S||g *表示状态s和目标g *的连接;
7.6数据探索拓展及目标空间优化:进一步提出基于多经验重放的虚拟目标构建算法,进行虚拟目标的引入,在对目标空间优化的同时实现数据探索的扩充;
7.7在每个时间步骤中,根据智能优化模块存储的数据链,基于参与者-评价者算法框架,通过以θ A为参数的深度神经网络来训练汇入控制策略,该策略直接输出动作来控制匝道汇入车辆和辅助车辆的加速度值与状态和目标输入,策略优化的目标是找到最优的行为策略a,使整个行程的回报期望最大化,最终,最优控制策略通过经过训练的网络的前向传递输出:a=π(s,g|θ A)。
作为本发明控制方法进一步改进,所述强化学习参与者-评价者算法奖励构建中短期目标奖励包括不限于:节能奖励R e(t)舒适性奖励R p(t)、通行高效奖励R s(t)和电池状态奖励R b(t);
所述强化学习参与者-评价者算法的基于耦合切比雪夫的多目标奖励优化方法具体步骤如下:
确定优化奖励项:假设汇入成功后引入多种实时短期目标的数量为n r个,则汇入问题的优化奖励项为n r个;
确定各奖励的朝理想最优值:构建各个目标的超理想最优值
Figure PCTCN2022102755-appb-000025
其中
Figure PCTCN2022102755-appb-000026
为理想值,根据经验数据选择,
Figure PCTCN2022102755-appb-000027
为一常数,表示超理想最优值比理想值好的程度;
构建多目标问题的广义加权切比雪夫最优化模型:设λ i为短期目标的切比雪夫权重,则得多目标问题的转化为广义加权切比雪夫但目标问题,如下式:
Figure PCTCN2022102755-appb-000028
Figure PCTCN2022102755-appb-000029
其中,F c为加权的切比雪夫范数,
Figure PCTCN2022102755-appb-000030
代表各个目标与超理想最优值之间的最大偏差;
Figure PCTCN2022102755-appb-000031
为保证算法稳定的项,通常ρ=0.001;λ i的计算公式参考如下:
Figure PCTCN2022102755-appb-000032
奖励函数构建为:
Figure PCTCN2022102755-appb-000033
作为本发明控制方法进一步改进,所述强化学习参与者-评价者算法奖励构建中所提出多经验重放的虚拟目标构建算法步骤如下:
t时间步长下的多经验虚拟目标构建及目标空间优化:依据t时间步长下的目标g *和数据链(s t||g *,a,r,s t+1||g *),构建以
Figure PCTCN2022102755-appb-000034
为参数的全连接神经网络
Figure PCTCN2022102755-appb-000035
获取当前状态下的l个虚拟目标为:
Figure PCTCN2022102755-appb-000036
基于优化目标空间的数据探索优化:依据获取的l个虚拟目标构建l个虚拟目标数据链:(s t||g 1,a,r,s t+1||g 1),(s t||g 2,a,r,s t+1||g 2),...,(s t||g l,a,r,s t+1||g l);
并将虚拟目标数据链存储入智能优化模块;
全时间步长数据探索优化:对每个时间步长重复步长下的多经验虚拟目标构建及目标空间优化和基于优化目标空间的数据探索优化,完成所有时间步长下的数据探索及目标空间优化;
基于人工智能模型的虚拟目标有效性选择:根据优化后的数据集合进行人工智能模型的训练,并根据训练结果选取每个时间步长下的最优虚拟目标数据链;
虚拟目标全连接神经网络H的校核:根据各个时长下最优虚拟目标数据链对虚拟 目标全连接神经网络H的参数
Figure PCTCN2022102755-appb-000037
进行校验,不断提升虚拟目标生成的准确性,以保证算法性能和训练速度。
本发明提供高速公路网联车协同匝道汇入多目标优化控制方法的系统,包括信息采集模块、数据传输模块、交通控制模块以及智能优化模块;
所述信息采集模块用于采集控制区域内车辆的状态数据,并对状态数据进行分析和处理,选定匝道汇入车辆、辅助车辆和引导车辆;
所述信息采集模块包括车载单元和路侧单元,所述路侧单元设置于高速公路的主路和匝道交叉点处,所述路侧单元用于采集控制区域内车辆的位置、速度及相应的时刻信息,还用于采集匝道汇入车辆确定的时间及前保险杆到达匝道出口线的时间,所述车载单元用于采集控制区域内车辆的动力电池状态及相应的时刻信息;
所述数据传输模块用于以移动通信技术为主体信息传输通信方式,辅助WiFi/BT、DSRC无线通信方式中的一种或两种实现数据在信息采集模块与交通控制模块、交通控制模块与智能优化模块之间的传输;
所述交通控制模块用于根据信息采集模块提供的车辆状态数据获取实时最优的行为策略a、目标g及奖励r,并将行为策略发送至车载单元,实现车辆实时控制,同时将最优的行为策略a、目标g及奖励r发送至智能优化模块;
所述智能优化模块用于存储所述交通控制模块传入的数据,并基于步骤3所选择的备选车辆集合AL和步骤4所提出的优化算法对匝道汇入多目标控制模型进行优化,并将优化后的模型传输至所述交通控制模块。
有益效果:
本发明采用以上技术方案与现有技术相比,具有以下技术效果:
1、本发明通过控制匝道汇入区域两个车道的车辆,促进了匝道汇入的过程,避免传统匝道控制手段仅使用流率控制导致匝道车辆汇入延迟引发的排队问题,在保证主线通行能力的前提下,同时积极促进了匝道车辆的汇入实现,极大的提高了匝道区域的交通通行效率;且单车控制的方法也保证了车辆之间的安全行驶。
2、本发明使用强化学习进行单车控制匝道汇入,区别于其他单车控制的匝道汇入方法。强化学习的方法无需进行复杂模型的构建,可以从历史数据不断探索并优化控制策略,同时由于探索的多样性,该算法还具有一定的鲁棒性和适应性。
3、本发明引入了多经验虚拟目标构建方法,引入目标空间优化和数据探索步骤,极大的提高了数据的探索效率,有效解决了多目标匝道控制中长期奖励稀疏的问题和长短奖励之间的耦合关系问题,保证了多目标匝道汇入的安全有效。
4、对比发明人所发表的论文,本发明的提升主要有:
1)区别于论文单纯针对新能源汽车面向电池健康问题的汇入方法,提出了适用于多种车辆(包括不限于传统燃油车、新能源汽车等)的混合匝道场景的汇入方法,且本发明提供了应用于现实高速公路汇入场景的关键技术环节。
2)本发明进一步优化了安全成功汇入时刻的车辆限制条件,同时,丰富了控制区域引导车辆、辅助车辆和汇入车辆的选择算法,与论文中简单的选择模式相比,本发明的选择方案可以获取最优的车辆选择方案,进一步提高了算法的最优解,使车辆的运行更为节能,高效、平稳。
3)本发明进一步优化了长期奖励和短期奖励的重要级,并为本发明针对不同实际情况的目标引入预留了接口,且本发明确定了短期奖励之间的系数关系,保证本发明可以适应多种不同目标的情况,进一步提升了本发明的适用范围。
4)本发明对文章提出的数据探索算法进一步优化,引入多经验虚拟目标构建方法,区别于论文简单的选取下一状态作为目标,进一步提高了数据的利用效率,提高数据探索的方向和正确性,极大的避免碰撞等危险情况的发生。
附图说明
图1是本发明提出的高速公路汇入场景示意图;
图2是本发明高速公路网联车协同匝道汇入多目标优化控制方法架构图。
具体实施方式
下面结合附图与具体实施方式对本发明作进一步详细描述:
本发明提出的方法基于假设如下:1)网联车辆需具备必要的信息传输、指令实现的能力,即通过车载单元具备信息交互的能力,且车辆完全执行控制指令;2)匝道控制区域包含主路和匝道交叉点、匝道汇入区域及主线和匝道部分路段,且控制区域的范围控制范围为路侧单元的通信范围;3)汇入区域的长度固定,即为d m=d max-d min;4)忽略信息传输、数据处理与计算、指令执行存在的延误,即假定各模块的运行的速度足够支撑系统运行;5)忽略车辆的横向运动及温度对车辆的影响。
本发明提出的高速公路网联车协同匝道汇入多目标优化控制系统,包括信息采集模块、数据传输模块、交通控制模块和智能优化模块,其中:
1)信息采集模块,通过车载单元和路侧单元实时采集控制区域内车辆的速度、位置、动力电池状态等状态信息及对应的时刻信息和匝道车辆汇入的开始时刻信息(汇入车辆的选定时刻),并对数据进行分析及处理,数据分析与处理步骤但不限于数据分析、特征提取、信息融合等;
信息采集模块所包含的车载单元和路侧单元信息采集配合如下:
①路侧单元负责交通状态信息的采集,例如区域内车辆的速度、位置以及匝道汇入车辆汇入的开始时刻;
②车载单元负责车辆相关参数信息的采集,例如发动机状态信息、电池状态信息、档位信息。本发明所设计网联汽车动力组成包括:燃油车、纯电动汽车、氢能源汽车及混合动力汽车。
2)数据传输模块,以第五代移动通信技术(5G)为主体信息传输通信方式,辅助WiFi/BT、DSRC等无线通信方式中的一种或多种实现数据在车载单元、路测单元和各个模块之间的传输;
3)交通控制模块,根据信息采集模块提供的交通状态信息获取实时最优的行为策略a、目标g及反馈r,并将策略发送至车载单元,实现车辆实时控制;同时,将交通状态信息集合打包发送至智能优化模块;
4)智能优化模块,包括训练和数据子模块。数据模块将交通控制模块传入的数据 进行存储,并应用人工智能数据探索方法获取更多的数据对进行存储;训练模块根据数据模块所提供的数据利用奖励函数对基于人工智能的匝道汇入多目标控制模型进行并将优化后的模型传输至交通控制模块。
控制区域包含主路和匝道交叉点,匝道汇入区域及主线和匝道部分路段,值得说明的是,控制区域的范围控制范围为路侧单元的通信范围,详见图1。
如图2所示,本发明提出的高速公路网联车协同匝道汇入多目标优化控制方法,包括以下步骤:
S01信息采集模块获取控制区域的车辆状态数据;
S02信息采集模块通过智能数据分析模型对采集数据进行数据分析、信息融合等处理;
S03根据控制区域内车辆的状态数据构建匝道汇入车辆、辅助车辆和引导车辆的可选方案集合AL;
S04将可选方案集合分别输入基于人工智能的匝道汇入多目标控制模型中,通过最优价值策略进一步确定辅助车辆和引导车辆的选择;
S05传输模块将S04步骤选定车辆及车辆相关采集数据传输至控制模块;
S06交通控制模块确认指令发送的车辆,基于人工智能的匝道汇入多目标控制模型进行实施决策,所述基于人工智能的匝道汇入多目标控制模型的框架为强化学习参与者-评价者(Actor-Critic)算法;
S07传输模块将决策信息传输至被控车辆的车载单元,实现车辆安全、高效、节能匝道汇入目标;所述被控车辆为匝道汇入车辆和辅助车辆;
S08采集控制策略下的被控车辆状态信息数据对,并将被控车辆状态信息数据对存入数据子模块,形成闭环控制。
步骤S03-S04,所述的汇入车辆、辅助车辆和引导车辆的选择方法依照最优价值策略算法构架,如下:
1)汇入车辆的选择:将匝道上前保险杆和匝道出口线距离最近的车辆设为汇入车辆,并获取控制区域所有车辆前t个时间步长的状态信息,例如速度、位置及加速度;
2)根据所有车辆的位置信息,依据主线车辆与汇入车辆的前后关系,初步选择汇入车辆后方和前方的各z辆主路车辆作为辅助车和引导车的备选车辆,其中z为正整数且小于等于5;
3)在所有备选车辆中,选取邻近的两辆车作为一组引导车和辅助车,构建该汇入车辆的引导车和辅助车的备选集合AL。
4)采用遍历法将备选集合AL中的组合分别代入基于人工智能的匝道汇入多目标控制模型中,依据模型的值函数Q π确定最终选取的汇入车辆、辅助车辆和引导车辆。
5)基于人工智能的匝道汇入多目标控制模型根据选定的引导车辆对汇入车辆、辅助车辆的加速度控制调节,求解多目标下的最优控制,在实现汇入车辆成功汇入主线的同时,保证汇入车辆的能耗优化和道路的高效、安全通行。具体包括问题数学模型构建及基于强化学习的优化求解。
问题数学模型构建如下:
1)设定安全汇入时刻为
Figure PCTCN2022102755-appb-000038
,构建安全汇入时刻引导车辆、辅助车辆和匝道汇入车 辆需满足的位置和速度关系:
Figure PCTCN2022102755-appb-000039
Figure PCTCN2022102755-appb-000040
Figure PCTCN2022102755-appb-000041
Figure PCTCN2022102755-appb-000042
Figure PCTCN2022102755-appb-000043
式中,x l,v l和a l引导车辆的位置、速度和加速度;x f,v f和a f表示辅助车辆的位置、速度和加速度;x m,v m和a m表示匝道汇入车辆的位置、速度和加速度;τ为恒定的时间间隔,L1为车辆的长度,s 0为停顿间隙,d min和d max分别为汇入区域的起点和终点,汇入区域的长度为d max-d min;公式从上到下依次表示匝道汇入车辆在引导车辆后面、匝道汇入车辆在辅助车辆前面、匝道汇入车辆和引导车辆速度一致、匝道汇入车辆和辅助车辆速度一致以及匝道汇入车辆在选定的汇入区域内从加速车道安全汇入到主路中;
2)在满足1)的条件下,进一步构建包含但不限于驾驶舒适性、车辆能耗、通行效率等目标的目标函数C如下:
Figure PCTCN2022102755-appb-000044
式中,
Figure PCTCN2022102755-appb-000045
表示不同目标的代价函数,c n表示参数。
作为本发明方法的优选方案,所述基于人工智能的匝道汇入多目标控制模型采用强化学习参与者-评价者算法Actor-Critic进行求解,具体过程如下:
1)状态空间
Figure PCTCN2022102755-appb-000046
及行为空间
Figure PCTCN2022102755-appb-000047
建立:根据引导车辆、辅助车辆和匝道汇入车辆的状态数据选择六维状态信息s={x l,x m,x f,v l,v m,v f}表示环境中最相关的影响因素,
Figure PCTCN2022102755-appb-000048
依据控制对象选择控制行为策略
Figure PCTCN2022102755-appb-000049
2)最优目标建立:根据安全汇入时刻
Figure PCTCN2022102755-appb-000050
时的车辆限制条件关系,构建匝道汇入最优目标集合
Figure PCTCN2022102755-appb-000051
其中,
Figure PCTCN2022102755-appb-000052
为目标空间的集合,
Figure PCTCN2022102755-appb-000053
表示满足公式(1)中汇入车辆在引导车辆后面,
Figure PCTCN2022102755-appb-000054
表示满足公式(2)中匝道汇入车辆在辅助车辆前面,
Figure PCTCN2022102755-appb-000055
表示满足公式(3)中匝道汇入车辆和引导车辆速度一致,
Figure PCTCN2022102755-appb-000056
表示满足公式(4)中匝道汇入车辆和辅助车辆速度一致,
Figure PCTCN2022102755-appb-000057
表示满足公式(5)中匝道汇入车辆在选定的汇入区域内从加速车道安全汇入到主路中。
3)目标空间构建:依据2)最优目标空间集合
Figure PCTCN2022102755-appb-000058
所包含的分类,建立目标空间集合
Figure PCTCN2022102755-appb-000059
满足g={g 1,g 2,g 3,g 4,g 5},g 1表示匝道汇入车辆与引导车辆的位置关系,g 2表示匝道汇入车辆和辅助车辆的位置关系,g 3表示匝道汇入车辆和引导车 辆速度关系,g 4表示匝道汇入车辆和辅助车辆速度关系,g 5表示匝道汇入车辆的位置与合并区域的关系。
4)奖励构建:奖励函数为
Figure PCTCN2022102755-appb-000060
每个时间步长的奖励r(s,a,g *)在包括长期目标汇入奖励R m(t)的情况下,可根据安全、高效及舒适等行驶要求引入多种短期目标奖励。
①必须包含的长期目标汇入奖励R m(t)表示如下:
Figure PCTCN2022102755-appb-000061
②其他目标可以包括不限于节能、平稳、舒适及高效等,构建可参考:
a.行驶能耗奖励R e(t)构建:
考虑电池效率与能耗,用安培-时间积分法构建车辆电荷状态(SOC)的下降关系模型:
Figure PCTCN2022102755-appb-000062
其中,V oc为开路电压,R int是电阻,P b(t)是t时刻的电池功率,Q c为电池的容量。
因此,节能奖励包括匝道汇入车辆和辅助车辆的SOC情况,如下式:
R e(t)=ΔSOC m(t)+ΔSOC f(t)      (9)
b.电池寿命奖励R b(t)构建:根据
Figure PCTCN2022102755-appb-000063
其中,N为电池循环次数,E 0(0)为标准电池容量能量。
因此,电池寿命奖励为匝道汇入车辆和辅助车辆的SOH情况,如下式:
R b(t)=ΔSOH m(t)+ΔSOH f(t)     (11)
c.平稳性奖励构建R s(t):在不考虑车辆横向移动的情况下,平稳性奖励可以看作实际加速度的变化情况,最大加速度a max=3m/s 2,具体模型如下:
Figure PCTCN2022102755-appb-000064
d.舒适性奖励构建R p(t):为了减少合并车辆的颠簸,提高乘客的舒适度,j max表 示可承受的最大颠簸值,单位取m/s 3,则模型如下:
Figure PCTCN2022102755-appb-000065
基于耦合切比雪夫的多目标奖励优化方法,具体步骤如下:
a.确定优化奖励项:假设汇入成功后引入多种实时短期目标的数量为n r个,则汇入问题的优化奖励项为n r个;
b.确定各奖励的朝理想最优值:构建各个目标的超理想最优值
Figure PCTCN2022102755-appb-000066
其中
Figure PCTCN2022102755-appb-000067
为理想值,根据经验数据选择,
Figure PCTCN2022102755-appb-000068
为一常数,表示超理想最优值比理想值好的程度;
c.构建多目标问题的广义加权切比雪夫最优化模型:设λ i为短期目标的切比雪夫权重,则可得多目标问题的转化为广义加权切比雪夫但目标问题,如下式:
Figure PCTCN2022102755-appb-000069
Figure PCTCN2022102755-appb-000070
其中,F c为加权的切比雪夫范数,
Figure PCTCN2022102755-appb-000071
代表各个目标与超理想最优值之间的最大偏差;
Figure PCTCN2022102755-appb-000072
为保证算法稳定的项,通常ρ=0.001;λ i的计算公式参考如下:
Figure PCTCN2022102755-appb-000073
④奖励函数构建为:
Figure PCTCN2022102755-appb-000074
5)数据链的获取:根据1)至4)获取的第t个时间步长的状态、目标、策略和奖励数据可得到数据链(s t||g *a,r,s t+1||g *)并将数据存储入智能优化模块,其中s||g *表示状态s和目标g *的连接,在没有完成g *(R m(t)取值等于1,安全成功汇入),r不包含短期 目标R e(t),R s(t)等。
6)数据探索拓展及目标空间优化:根据4)和5)可以发现,满足g *的数据是很难获取的,因为g *是一个固定的最终目标,对于实时的奖励的指导意义是十分有限的。因此,进一步提出基于多经验重放的虚拟目标构建算法,进行虚拟目标的引入,在对目标空间优化的同时实现数据探索的扩充。所提出多经验重放的虚拟目标构建算法步骤如下:
①t时间步长下的多经验虚拟目标构建及目标空间优化:依据t时间步长下的目标g *和数据链(s t||g *,a,r,s t+1||g *),构建以
Figure PCTCN2022102755-appb-000075
为参数的全连接神经网络
Figure PCTCN2022102755-appb-000076
获取当前状态下的l个虚拟目标为:
Figure PCTCN2022102755-appb-000077
②基于优化目标空间的数据探索优化:依据获取的l个虚拟目标构建l个虚拟目标数据链:
(s t||g 1,a,r,s t+1||g 1),(s t||g 2,a,r,s t+1||g 2),...,(s t||g l,a,r,s t+1||g l)。
并将虚拟目标数据链存储入智能优化模块;
③全时间步长数据探索优化:对每个时间步长重复①和②,完成所有时间步长下的数据探索及目标空间优化;
④基于人工智能模型的虚拟目标有效性选择:根据优化后的数据集合进行人工智能模型的训练,并根据训练结果选取每个时间步长下的最优虚拟目标数据链;
⑤虚拟目标全连接神经网络H的校核:根据各个时长下最优虚拟目标数据链对虚拟目标全连接神经网络H的参数
Figure PCTCN2022102755-appb-000078
进行校验,不断提升虚拟目标生成的准确性,提升算法训练速度。
7)在每个时间步骤中,根据智能优化模块存储的数据链,基于Actor-Critic框架,通过以θ A为参数的深度神经网络来训练汇入控制策略,该策略直接输出动作来控制匝道汇入车辆和辅助车辆的加速度值与状态和目标输入,策略优化的目标是找到最优的行为策略a,使整个行程的回报期望最大化,具体步骤如下:
①根据Bellman函数,构建最优价值函数表示为:
Figure PCTCN2022102755-appb-000079
式中,Q π为值函数,γ为折损因子,s,a,g分别为当前时间步的状态、行为策略、目标,s′,a′,g′分别为下一个时间步的状态、行为策略、目标;
②根据①,标准时间差更新方程为:
Figure PCTCN2022102755-appb-000080
式中,ζ为学习率;
②利用以θ C为参数的Critic网络根据公式(16)更新规则估计Q值,构建批评网络的损失函数如下:
J CC)=(Q π(s,a,g|θ C)-(r+γQ π(s′,g′π(s′,g′|θ A)|θ C))) 2      (20)
式中,为损失函数,θ C为Critic网络参数;
③利用随机梯度下降算法通过最小化损失函数来更新临界网络参数,如下:
Figure PCTCN2022102755-appb-000081
式中,
Figure PCTCN2022102755-appb-000082
表示梯度;
④该驾驶状态下的最优控制策略通过经过训练的网络的前向传递输出:
a=π(s,g|θ A)   (23)
8)由此可以获得最优控制策略,从而通过各个模块实现面向高速公路场景的网联车辆的协同匝道汇入多目标优化控制方法及系统的最优控制,实现匝道区域高效、安全、节能驾驶。
以上所述,仅是本发明的较佳实施例而已,并非是对本发明作任何其他形式的限制,而依据本发明的技术实质所作的任何修改或等同变化,仍属于本发明所要求保护的范围。

Claims (10)

  1. 高速公路网联车协同匝道汇入多目标优化控制方法,其特征在于,包括如下步骤:
    步骤1,采集高速公路控制区域内车辆的状态数据,并对状态数据进行分析和处理;所述高速公路控制区域包括高速公路的主路、匝道交叉点、汇入区域、主路部分路段、匝道部分路段以及加速车道部分路段,高速公路控制区域的范围即路侧单元的通信范围,所述路侧单元设置于高速公路的主路和匝道交叉点处,所述汇入区域为预先选定的区域,包括加速车道的部分路段以及与加速车道的部分路段平行的路段;
    步骤2,根据控制区域内车辆的状态数据构建匝道汇入车辆、辅助车辆和引导车辆的可选方案集合;
    步骤3,将可选方案集合分别输入基于人工智能的匝道汇入多目标控制模型中,通过最优价值策略进一步确定辅助车辆和引导车辆的选择;
    步骤4,根据选择的辅助车、引导车和汇入车辆,对辅助车和引导车进行加速度进行控制调节,确保匝道汇入车辆在选定的汇入区域内从加速车道安全汇入到主路中;
    步骤5,采集进行加速度调节后的匝道汇入车辆和辅助车辆的状态数据,并返回步骤4,进行下一时刻的加速度调节。
  2. 根据权利要求书1所述高速公路网联车协同匝道汇入多目标优化控制方法,其特征在于:所述步骤1中,车辆的状态数据包括控制区域内车辆的位置、速度、动力电池状态以及相应的时刻信息。
  3. 根据权利要求书1所述高速公路网联车协同匝道汇入多目标优化控制方法,其特征在于:所述步骤1中,对状态数据进行分析和处理,包括数据分析、特征提取和信息融合。
  4. 根据权利要求1所述的高速公路网联车协同匝道汇入多目标优化控制方法,其特征在于,所述步骤2-3中,对于汇入车辆、引导车辆和辅助车辆的选择的最优价值策略如下:
    4.1汇入车辆的选择:将匝道上前保险杆和匝道出口线距离最近的车辆设为汇入车辆,并获取高速公路控制区域所有车辆前t个时间步长的状态信息;
    4.2根据所有车辆的位置信息,依据主线车辆与汇入车辆的前后关系,初步选择汇入车辆后方和前方的各z辆主路车辆作为辅助车和引导车的备选车辆,其中z为正整数且小于等于5;
    4.3在所有备选车辆中,选取邻近的两辆车作为一组引导车和辅助车,构建该汇入车辆的引导车和辅助车的备选集合AL;
    4.4采用遍历法将备选集合AL中的组合分别代入基于人工智能的匝道汇入多目标控制模型中,依据模型的值函数Q π确定最终选取的汇入车辆、辅助车辆和引导车辆。
  5. 根据权利要求4所述的高速公路网联车协同匝道汇入多目标优化控制方法,其特征在于,所述高速公路控制区域所有车辆前t个时间步长的状态信息包括速度、位置及加速度。
  6. 根据权利要求4所述的高速公路网联车协同匝道汇入多目标优化控制方法,其特征在于,所述对于汇入车辆、引导车辆和辅助车辆的选择的最优价值策略的步骤4中,基于人工智能的匝道汇入多目标控制模型,其成功汇入时刻的目标函数及限制条件的构建如下:
    6.1设定安全汇入时刻为
    Figure PCTCN2022102755-appb-100001
    构建安全汇入时刻引导车辆、辅助车辆和匝道汇入车辆需满足的位置和速度关系:
    Figure PCTCN2022102755-appb-100002
    Figure PCTCN2022102755-appb-100003
    Figure PCTCN2022102755-appb-100004
    Figure PCTCN2022102755-appb-100005
    Figure PCTCN2022102755-appb-100006
    式中,x l,v l和a l引导车辆的位置、速度和加速度;x f,v f和a f表示辅助车辆的位置、速度和加速度;x m,v m和a m表示匝道汇入车辆的位置、速度和加速度;τ为恒定的时间间隔,L 1为车辆的长度,s 0为停顿间隙,d min和d max分别为汇入区域的起点和终点,汇入区域的长度为d max-d min;公式从上到下依次表示匝道汇入车辆在引导车辆后面、匝道汇入车辆在辅助车辆前面、匝道汇入车辆和引导车辆速度一致、匝道汇入车辆和辅助车辆速度一致以及匝道汇入车辆在选定的汇入区域内从加速车道安全汇入到主路中;
    6.2在满足6.1的条件下,进一步构建包含但不限于驾驶舒适性、车辆能耗、通行效率等目标的目标函数C如下:
    Figure PCTCN2022102755-appb-100007
    式中,
    Figure PCTCN2022102755-appb-100008
    表示不同目标的代价函数,c n表示参数。
  7. 根据权利要求4所述的高速公路网联车协同匝道汇入多目标优化控制方法,其特征在于,所述对于汇入车辆、引导车辆和辅助车辆的选择的最优价值策略的步骤4中,基于人工智能的匝道汇入多目标控制模型,采用强化学习参与者-评价者算法进行求解,具体过程如下:
    7.1状态空间
    Figure PCTCN2022102755-appb-100009
    及行为空间
    Figure PCTCN2022102755-appb-100010
    建立:根据引导车辆、辅助车辆和匝道汇入车辆的状态数据选择六维状态信息s={x l,x m,x f,v l,v m,v f}表示环境中最相关的影响因素,
    Figure PCTCN2022102755-appb-100011
    依据控制对象选择控制行为策略
    Figure PCTCN2022102755-appb-100012
    7.2最优目标建立:根据安全汇入时刻
    Figure PCTCN2022102755-appb-100013
    时的车辆限制条件关系,构建匝道汇入最优目标集合
    Figure PCTCN2022102755-appb-100014
    其中,
    Figure PCTCN2022102755-appb-100015
    为目标空间的集合,
    Figure PCTCN2022102755-appb-100016
    表示满足公式(1)中汇入车辆在引导车辆后面,
    Figure PCTCN2022102755-appb-100017
    表示满足公式(2)中匝道汇入车辆在辅助车辆前面,
    Figure PCTCN2022102755-appb-100018
    表示满足公式(3)中匝道汇入车辆和引导车辆速度一致,
    Figure PCTCN2022102755-appb-100019
    表示满足公式(4)中匝道汇入车辆和辅助车辆速度一致,
    Figure PCTCN2022102755-appb-100020
    表示满足公式(5)中匝道汇入车辆在选定的汇入区域内从加速车道安全汇入到主路中;
    7.3目标空间构建:依据7.2最优目标空间集合
    Figure PCTCN2022102755-appb-100021
    所包含的分类,建立目标空间集合
    Figure PCTCN2022102755-appb-100022
    满足g={g 1,g 2,g 3,g 4,g 5},g 1表示匝道汇入车辆与引导车辆的位置关 系,g 2表示匝道汇入车辆和辅助车辆的位置关系,g 3表示匝道汇入车辆和引导车辆速度关系,g 4表示匝道汇入车辆和辅助车辆速度关系,g 5表示匝道汇入车辆的位置与合并区域的关系;
    7.4奖励构建:奖励函数为
    Figure PCTCN2022102755-appb-100023
    每个时间步长的奖励r(s,a,g *)在包括长期目标汇入奖励R m(t)的情况下,根据安全、高效及舒适行驶要求引入至少两种短期目标奖励,其中必须包含的长期目标汇入奖励R m(t)表示如下:
    Figure PCTCN2022102755-appb-100024
    7.5数据链的获取:根据7.1至7.3获取的第t个时间步长的状态、目标、策略和奖励数据得到数据链s t||g *,a,r,s t+1||g *并将数据存储入智能优化模块,其中s||g *表示状态s和目标g *的连接;
    7.6数据探索拓展及目标空间优化:进一步提出基于多经验重放的虚拟目标构建算法,进行虚拟目标的引入,在对目标空间优化的同时实现数据探索的扩充;
    7.7在每个时间步骤中,根据智能优化模块存储的数据链,基于参与者-评价者算法框架,通过以θ A为参数的深度神经网络来训练汇入控制策略,该策略直接输出动作来控制匝道汇入车辆和辅助车辆的加速度值与状态和目标输入,策略优化的目标是找到最优的行为策略a,使整个行程的回报期望最大化,最终,最优控制策略通过经过训练的网络的前向传递输出:a=π(s,g|θ A)。
  8. 根据权利要求7所述的高速公路网联车协同匝道汇入多目标优化控制方法,其特征在于,所述强化学习参与者-评价者算法奖励构建中短期目标奖励包括:节能奖励R e(t)舒适性奖励R p(t)、通行高效奖励R s(t)和电池状态奖励R b(t);
    所述强化学习参与者-评价者算法的基于耦合切比雪夫的多目标奖励优化方法具体步骤如下:
    确定优化奖励项:假设汇入成功后引入多种实时短期目标的数量为n r个,则汇入问题的优化奖励项为n r个;
    确定各奖励的朝理想最优值:构建各个目标的超理想最优值
    Figure PCTCN2022102755-appb-100025
    ,其中
    Figure PCTCN2022102755-appb-100026
    为理想值,根据经验数据选择,
    Figure PCTCN2022102755-appb-100027
    为一常数,表示超理想最优值比理想值好的程度;
    构建多目标问题的广义加权切比雪夫最优化模型:设λ i为短期目标的切比雪夫权重,则得多目标问题的转化为广义加权切比雪夫但目标问题,如下式:
    Figure PCTCN2022102755-appb-100028
    满足以下条件
    Figure PCTCN2022102755-appb-100029
    r i为无约束变量1≤i≤n r  (15)
    F c≥0
    其中,F c为加权的切比雪夫范数,
    Figure PCTCN2022102755-appb-100030
    代表各个目标与超理想最优值之间的最大偏差;
    Figure PCTCN2022102755-appb-100031
    为保证算法稳定的项,通常ρ=0.001;λ i的计算公式参考如下:
    Figure PCTCN2022102755-appb-100032
    奖励函数构建为:
    Figure PCTCN2022102755-appb-100033
  9. 根据权利要求7所述的高速公路网联车协同匝道汇入多目标优化控制方法,其特征在于,所述强化学习参与者-评价者算法奖励构建中所提出多经验重放的虚拟目标构建算法步骤如下:
    t时间步长下的多经验虚拟目标构建及目标空间优化:依据t时间步长下的目标g *和数据链(s t||g *,a,r,s t+1||g *),构建以
    Figure PCTCN2022102755-appb-100034
    为参数的全连接神经网络H,
    Figure PCTCN2022102755-appb-100035
    获取当前状态下的l个虚拟目标为:
    Figure PCTCN2022102755-appb-100036
    基于优化目标空间的数据探索优化:依据获取的l个虚拟目标构建l个虚拟目标数据链:(s t||g 1,a,r,s t+1||g 1),(s t||g 2,a,r,s t+1||g 2),...,(s t||g l,a,r,s t+1||g l);
    并将虚拟目标数据链存储入智能优化模块;
    全时间步长数据探索优化:对每个时间步长重复步长下的多经验虚拟目标构建及目标空间优化和基于优化目标空间的数据探索优化,完成所有时间步长下的数据探索及目标空间优化;
    基于人工智能模型的虚拟目标有效性选择:根据优化后的数据集合进行人工智能模型的训练,并根据训练结果选取每个时间步长下的最优虚拟目标数据链;
    虚拟目标全连接神经网络H的校核:根据各个时长下最优虚拟目标数据链对虚拟目标 全连接神经网络H的参数
    Figure PCTCN2022102755-appb-100037
    进行校验,不断提升虚拟目标生成的准确性,以保证算法性能和训练速度。
  10. 基于权利要求1-9任一项所述的高速公路网联车协同匝道汇入多目标优化控制方法的系统,包括信息采集模块、数据传输模块、交通控制模块以及智能优化模块,其特征在于,
    所述信息采集模块用于采集控制区域内车辆的状态数据,并对状态数据进行分析和处理,选定匝道汇入车辆、辅助车辆和引导车辆;
    所述信息采集模块包括车载单元和路侧单元,所述路侧单元设置于高速公路的主路和匝道交叉点处,所述路侧单元用于采集控制区域内车辆的位置、速度及相应的时刻信息,还用于采集匝道汇入车辆确定的时间及前保险杆到达匝道出口线的时间,所述车载单元用于采集控制区域内车辆的动力电池状态及相应的时刻信息;
    所述数据传输模块用于以移动通信技术为主体信息传输通信方式,辅助WiFi/BT、DSRC无线通信方式中的一种或两种实现数据在信息采集模块与交通控制模块、交通控制模块与智能优化模块之间的传输;
    所述交通控制模块用于根据信息采集模块提供的车辆状态数据获取实时最优的行为策略a、目标g及奖励r,并将行为策略发送至车载单元,实现车辆实时控制,同时将最优的行为策略a、目标g及奖励r发送至智能优化模块;
    所述智能优化模块用于存储所述交通控制模块传入的数据,并基于步骤3所选择的备选车辆集合AL和步骤4所提出的优化算法对匝道汇入多目标控制模型进行优化,并将优化后的模型传输至所述交通控制模块。
PCT/CN2022/102755 2022-02-23 2022-06-30 高速公路网联车协同匝道汇入多目标优化控制方法和系统 WO2023159841A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/112,541 US20230267829A1 (en) 2022-02-23 2023-02-22 Multi-objective optimization control method and system for cooperative ramp merging of connected vehicles on highway

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210164445.8 2022-02-23
CN202210164445.8A CN114241778B (zh) 2022-02-23 2022-02-23 高速公路网联车协同匝道汇入多目标优化控制方法和系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/112,541 Continuation US20230267829A1 (en) 2022-02-23 2023-02-22 Multi-objective optimization control method and system for cooperative ramp merging of connected vehicles on highway

Publications (1)

Publication Number Publication Date
WO2023159841A1 true WO2023159841A1 (zh) 2023-08-31

Family

ID=80747768

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/102755 WO2023159841A1 (zh) 2022-02-23 2022-06-30 高速公路网联车协同匝道汇入多目标优化控制方法和系统

Country Status (2)

Country Link
CN (1) CN114241778B (zh)
WO (1) WO2023159841A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117198082A (zh) * 2023-11-06 2023-12-08 北京理工大学前沿技术研究院 基于双层优化的车辆匝道汇入决策方法及系统
CN117238138A (zh) * 2023-09-26 2023-12-15 南京感动科技有限公司 一种高速公路枢纽节点渠化管控策略确定方法和系统
CN117975737A (zh) * 2024-04-02 2024-05-03 北京中交华安科技有限公司 一种面向公路交织区的车辆主动诱导和智能管控方法
CN118134209A (zh) * 2024-05-06 2024-06-04 江苏大块头智驾科技有限公司 一种智慧港矿一体化管控与调度系统及方法

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241778B (zh) * 2022-02-23 2022-05-17 东南大学 高速公路网联车协同匝道汇入多目标优化控制方法和系统
CN114973650B (zh) * 2022-04-13 2023-05-23 东南大学 车辆匝道入口合流控制方法、车辆、电子设备及存储介质
CN115188204B (zh) * 2022-06-29 2023-08-15 东南大学 一种异常天气条件下高速公路车道级可变限速控制方法
CN114863689B (zh) * 2022-07-08 2022-09-30 中汽研(天津)汽车工程研究院有限公司 一种上下匝道行为场景数据采集、识别与提取方法和系统
CN114999160B (zh) * 2022-07-18 2022-10-21 四川省公路规划勘察设计研究院有限公司 一种基于车路协同道路的车辆安全合流控制方法及系统
CN115171388A (zh) * 2022-07-20 2022-10-11 辽宁工程技术大学 一种智能网联车的多交叉口旅行时间协同优化方法
CN115578865B (zh) * 2022-09-28 2023-08-29 东南大学 一种基于人工智能的自动驾驶车辆汇入间隙选择优化方法
CN115909780B (zh) * 2022-11-09 2023-07-21 江苏大学 基于智能网联与rbf神经网络的高速路汇入控制系统与方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104464317A (zh) * 2014-12-03 2015-03-25 武汉理工大学 高速公路入口匝道合流区引导控制系统和方法
CN111091721A (zh) * 2019-12-23 2020-05-01 清华大学 一种面向智慧车列交通系统的匝道合流控制方法及系统
CN112233413A (zh) * 2020-07-20 2021-01-15 北方工业大学 一种面向智能网联车辆的多车道时空轨迹优化方法
CN112977477A (zh) * 2021-02-26 2021-06-18 江苏大学 一种基于神经网络的混合车车协同汇流系统和方法
CN114241778A (zh) * 2022-02-23 2022-03-25 东南大学 高速公路网联车协同匝道汇入多目标优化控制方法和系统

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106157650A (zh) * 2016-07-11 2016-11-23 东南大学 一种基于强化学习可变限速控制的快速道路通行效率改善方法
IL288191B2 (en) * 2016-12-23 2023-10-01 Mobileye Vision Technologies Ltd A navigation system with forced commitment constraints
WO2020014540A1 (en) * 2018-07-13 2020-01-16 Deepdivebio, Inc. Thermocycler reaction control
US10940863B2 (en) * 2018-11-01 2021-03-09 GM Global Technology Operations LLC Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle
US20200160411A1 (en) * 2018-11-16 2020-05-21 Mitsubishi Electric Research Laboratories, Inc. Methods and Systems for Optimal Joint Bidding and Pricing of Load Serving Entity
CN112289044B (zh) * 2020-11-02 2021-09-07 南京信息工程大学 基于深度强化学习的高速公路道路协同控制系统及方法
CN112700642B (zh) * 2020-12-19 2022-09-23 北京工业大学 一种利用智能网联车辆提高交通通行效率的方法
CN113744527B (zh) * 2021-08-31 2022-07-12 北京航空航天大学 一种面向高速公路合流区的智能靶向疏堵方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104464317A (zh) * 2014-12-03 2015-03-25 武汉理工大学 高速公路入口匝道合流区引导控制系统和方法
CN111091721A (zh) * 2019-12-23 2020-05-01 清华大学 一种面向智慧车列交通系统的匝道合流控制方法及系统
CN112233413A (zh) * 2020-07-20 2021-01-15 北方工业大学 一种面向智能网联车辆的多车道时空轨迹优化方法
CN112977477A (zh) * 2021-02-26 2021-06-18 江苏大学 一种基于神经网络的混合车车协同汇流系统和方法
CN114241778A (zh) * 2022-02-23 2022-03-25 东南大学 高速公路网联车协同匝道汇入多目标优化控制方法和系统

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117238138A (zh) * 2023-09-26 2023-12-15 南京感动科技有限公司 一种高速公路枢纽节点渠化管控策略确定方法和系统
CN117238138B (zh) * 2023-09-26 2024-05-03 南京感动科技有限公司 一种高速公路枢纽节点渠化管控策略确定方法和系统
CN117198082A (zh) * 2023-11-06 2023-12-08 北京理工大学前沿技术研究院 基于双层优化的车辆匝道汇入决策方法及系统
CN117198082B (zh) * 2023-11-06 2024-04-05 北京理工大学前沿技术研究院 基于双层优化的车辆匝道汇入决策方法及系统
CN117975737A (zh) * 2024-04-02 2024-05-03 北京中交华安科技有限公司 一种面向公路交织区的车辆主动诱导和智能管控方法
CN117975737B (zh) * 2024-04-02 2024-05-31 北京中交华安科技有限公司 一种面向公路交织区的车辆主动诱导和智能管控方法
CN118134209A (zh) * 2024-05-06 2024-06-04 江苏大块头智驾科技有限公司 一种智慧港矿一体化管控与调度系统及方法

Also Published As

Publication number Publication date
CN114241778A (zh) 2022-03-25
CN114241778B (zh) 2022-05-17

Similar Documents

Publication Publication Date Title
WO2023159841A1 (zh) 高速公路网联车协同匝道汇入多目标优化控制方法和系统
Bi et al. GIS aided sustainable urban road management with a unifying queueing and neural network model
US11205124B1 (en) Method and system for controlling heavy-haul train based on reinforcement learning
Qu et al. Jointly dampening traffic oscillations and improving energy consumption with electric, connected and automated vehicles: A reinforcement learning based approach
Zhang et al. Eco-driving control for connected and automated electric vehicles at signalized intersections with wireless charging
US20230267829A1 (en) Multi-objective optimization control method and system for cooperative ramp merging of connected vehicles on highway
Yan et al. Hierarchical predictive energy management of fuel cell buses with launch control integrating traffic information
CN112339756A (zh) 一种基于强化学习的新能源汽车红绿灯路口能量回收优化速度规划算法
Yang et al. Reinforcement learning-based real-time intelligent energy management for hybrid electric vehicles in a model predictive control framework
Yan et al. Design of a deep inference framework for required power forecasting and predictive control on a hybrid electric mining truck
Mei et al. A deep reinforcement learning approach to energy management control with connected information for hybrid electric vehicles
Tong et al. Speed planning for connected electric buses based on battery capacity loss
Fu et al. Electric vehicle charging scheduling control strategy for the large-scale scenario with non-cooperative game-based multi-agent reinforcement learning
Wu et al. Integrated energy management of hybrid power supply based on short-term speed prediction
Jin et al. Energy-optimal speed control for connected electric buses considering passenger load
Zhang et al. Integrated velocity optimization and energy management strategy for hybrid electric vehicle platoon: A multi-agent reinforcement learning approach
CN112750298B (zh) 一种基于smdp和drl的货车编队动态资源分配方法
Zhang et al. Hierarchical eco-driving control strategy for connected automated fuel cell hybrid vehicles and scenario-/hardware-in-the loop validation
Chen et al. Integrated velocity optimization and energy management for FCHEV: An eco-driving approach based on deep reinforcement learning
Zhang et al. An optimal vehicle speed planning algorithm for regenerative braking at traffic lights intersections based on reinforcement learning
Dong et al. Battery-aware cooperative merging strategy of connected electric vehicles based on reinforcement learning with hindsight experience replay
Chang et al. An energy management strategy of deep reinforcement learning based on multi-agent architecture under self-generating conditions
Li et al. Speed planning for connected and automated vehicles in urban scenarios using deep reinforcement learning
Nie et al. Hierarchical optimization control strategy for intelligent fuel cell hybrid electric vehicles platoon in complex operation conditions
Shi et al. Learning eco-driving strategies from human driving trajectories

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22928111

Country of ref document: EP

Kind code of ref document: A1