CN115482659B

CN115482659B - An autonomous decision-making method for agents based on deep reinforcement learning

Info

Publication number: CN115482659B
Application number: CN202210992167.5A
Authority: CN
Inventors: 刘东升; 刘彦妮; 王黎明; 陈亚辉
Original assignee: Zhejiang Gongshang University
Current assignee: Zhejiang Gongshang University
Priority date: 2022-08-18
Filing date: 2022-08-18
Publication date: 2023-09-19
Anticipated expiration: 2042-08-18
Also published as: JP7518993B2; JP2024028097A; CN115482659A

Abstract

The invention discloses an autonomous decision-making method for agents based on deep reinforcement learning and relates to the technical field of agent autonomous decision-making. Basic data is acquired for an intersection in the prohibited state, yielding the license plate number, right-turn time, and vehicle identification of every comparison object; each comparison object is automatically deleted after being stored for a time T1. All potential objects then undergo initial screening: the license plate number, entry time, and vehicle identification of each actual-entry object are compared with the license plate number, right-turn time, and vehicle identification of the comparison objects to determine all initial suspects and their corresponding interval times. The initial suspects then undergo a secondary review that determines the violating objects, which makes handling by law enforcement easier and addresses a safety hazard that the prior art leaves unhandled.

Description

An autonomous decision-making method for agents based on deep reinforcement learning

Technical field

The invention belongs to the technical field of autonomous decision-making and specifically concerns an autonomous decision-making method for agents based on deep reinforcement learning.

Background art

Patent CN111833597A discloses autonomous decision-making in traffic situations with planning control. It proposes a control device for generating maneuver decisions for an autonomous vehicle in a traffic scenario. The control device includes a first module containing a trained self-learning model; the first module is configured to receive data including information about the surroundings of the autonomous vehicle and to determine, by means of the trained self-learning model and based on the received data, an action to be performed by the autonomous vehicle. The control device also includes a second module configured to receive the determined action, receive data including information about the surroundings of the autonomous vehicle during a finite time horizon, predict the environment state for a first time period of the finite time horizon, determine a trajectory of the autonomous vehicle based on the received action for the finite time horizon and the environment state of the first time period, and send a signal to control the autonomous vehicle during the first time period according to the determined trajectory.

However, that patent still lacks an important link in autonomous decision-making for traffic scenarios. On today's roads, drivers facing a red light for going straight often avoid it by turning right early, making a U-turn, and then turning right again. Under normal circumstances this maneuver is permitted, but drivers frequently make an illegal U-turn after the right turn, which creates safety hazards. A solution to this problem is provided here.

Summary of the invention

The present invention aims to solve at least one of the technical problems in the prior art; to this end, it proposes an autonomous decision-making method for agents based on deep reinforcement learning.

To achieve the above object, an embodiment according to the first aspect of the present invention proposes an autonomous decision-making method for agents based on deep reinforcement learning. The method specifically includes the following steps:

Step 1: Acquire basic data for an intersection in the prohibited state, specifically as follows:

Obtain pictures of all vehicles turning right while the intersection directed by the target object is in the prohibited state, and mark these vehicles as comparison objects;

Simultaneously and automatically obtain the license plate number, right-turn time, and vehicle identification of every comparison object; each comparison object is automatically deleted after being stored for a time T1;

Step 2: Then perform initial screening on all potential objects: compare the license plate number, entry time, and vehicle identification of each actual-entry object with the license plate number, right-turn time, and vehicle identification of the comparison objects to determine all initial suspects and their corresponding interval times;

Step 3: Perform a secondary review of the initial suspects, specifically as follows:

S1: Obtain all initial suspects and their corresponding interval times;

S2: Then obtain the pre-turn road from which the initial suspect turned right into the road directed by the target object, that is, the road the initial suspect was on when it made that right turn; obtain the target lane of the pre-turn road and mark it as the crossing lane;

S3: Obtain the distance from the point where a right turn from the road directed by the target object enters the crossing lane to the U-turn point of the crossing lane, and mark it as the U-turn distance; the U-turn point is a location where vehicles are permitted to make a U-turn;

S4: Then obtain the distance from the U-turn point, along the pre-turn road, to the road directed by the target object, and mark it as the within-regulation distance;

S5: Obtain the first speed limit, that of the crossing lane, and divide the U-turn distance by it to obtain the first U-turn time;

Then obtain the second speed limit, that of the pre-turn road, and divide the within-regulation distance by it to obtain the second U-turn time; add the first and second U-turn times to obtain the short-limit time;

S6: Mark every initial suspect whose interval time exceeds the short-limit time as a speculative object;

S7: Obtain all speculative objects.
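The arithmetic in S5 and the comparison in S6 can be sketched as follows. This is only an illustration under stated assumptions: the names `Suspect`, `short_limit_time`, and `secondary_review` are hypothetical, distances are assumed to be in metres and speed limits in metres per second, and the comparison direction follows S6 exactly as written.

```python
from dataclasses import dataclass


@dataclass
class Suspect:
    plate: str
    interval_s: float  # interval time from the initial screening, in seconds


def short_limit_time(u_turn_distance_m, within_regulation_distance_m,
                     speed_limit_1_mps, speed_limit_2_mps):
    """S5: first U-turn time plus second U-turn time."""
    first_u_turn_time = u_turn_distance_m / speed_limit_1_mps
    second_u_turn_time = within_regulation_distance_m / speed_limit_2_mps
    return first_u_turn_time + second_u_turn_time


def secondary_review(suspects, short_limit_s):
    """S6: keep suspects whose interval time exceeds the short-limit time."""
    return [s for s in suspects if s.interval_s > short_limit_s]


# Illustrative numbers only: 300 m at 15 m/s plus 300 m at 10 m/s.
limit = short_limit_time(300.0, 300.0, 15.0, 10.0)  # 20 s + 30 s
suspects = [Suspect("PLATE-A", 40.0), Suspect("PLATE-B", 90.0)]
flagged = secondary_review(suspects, limit)
```

With these made-up inputs only the second suspect is flagged, because its interval time is the only one above the computed short-limit time.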

Compared with the prior art, the beneficial effects of the present invention are:

The invention acquires basic data for an intersection in the prohibited state, yielding the license plate number, right-turn time, and vehicle identification of every comparison object; each comparison object is automatically deleted after being stored for a time T1. All potential objects then undergo initial screening: the license plate number, entry time, and vehicle identification of each actual-entry object are compared with the license plate number, right-turn time, and vehicle identification of the comparison objects to determine all initial suspects and their corresponding interval times;

The initial suspects then undergo a secondary review that determines the violating objects, which makes handling by law enforcement easier and addresses a safety hazard that the prior art leaves unhandled.

Detailed description

The technical solution of the present invention is described clearly and completely below with reference to the embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

This application provides an autonomous decision-making method for agents based on deep reinforcement learning. As Embodiment 1 of the present invention, the method specifically includes the following steps:

Step 1: Synchronize the state of the target object. When any driving direction is in the prohibited state, start the autonomous decision-making judgment; in all other states, remain silent. Here, "any driving direction" means any direction in which traffic may pass on the roads that the traffic light is responsible for directing;

Here the target object refers to a traffic light, and the state of the target object is the real-time state of the traffic light, namely whether it is showing red, green, or yellow; the prohibited state is the state in which vehicles traveling in the straight-ahead direction of the through lane face a red light and may not pass;
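The trigger condition of Step 1 can be sketched as a simple predicate. The `Light` enum, the function name `decision_active`, and the dictionary keyed by direction are all assumptions introduced for illustration; the text only specifies that the judgment starts when the straight-ahead signal of any directed direction is red.

```python
from enum import Enum


class Light(Enum):
    RED = "red"
    GREEN = "green"
    YELLOW = "yellow"


def decision_active(straight_signals):
    """Return True when the straight-ahead signal of any direction is red.

    straight_signals maps a direction label (hypothetical) to the current
    state of the straight-ahead signal for that direction.
    """
    return any(state is Light.RED for state in straight_signals.values())


active = decision_active({"north": Light.RED, "east": Light.GREEN})
silent = decision_active({"north": Light.GREEN, "east": Light.YELLOW})
```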

Step 2: Acquire basic data for an intersection in the prohibited state, specifically as follows:

Obtain pictures of all vehicles turning right while the intersection directed by the target object is in the prohibited state, mark these vehicles as comparison objects, and simultaneously and automatically obtain the license plate number, right-turn time, and vehicle identification of every comparison object. The right-turn time is the specific point in time at which the comparison object turns right. Each comparison object is automatically deleted after being stored for a time T1; T1 is a preset value, usually fifteen minutes, although other values may be set;

The vehicle identification refers to the hood color and the brand of the vehicle; the brand is obtained through automatic recognition by the camera;
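A minimal sketch of a store that holds comparison objects and drops them after T1 is shown below. The class and field names are hypothetical, and one assumption is made beyond the text: the storage age is measured from the right-turn time, since the text does not say when the storage clock starts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

T1 = timedelta(minutes=15)  # typical value given in the text


@dataclass(frozen=True)
class ComparisonObject:
    plate: str
    right_turn_time: datetime
    hood_color: str
    brand: str


class ComparisonStore:
    def __init__(self, retention=T1):
        self.retention = retention
        self._items = []

    def add(self, obj):
        self._items.append(obj)

    def items(self, now):
        """Purge entries stored for T1 or longer, then return the rest."""
        self._items = [o for o in self._items
                       if now - o.right_turn_time < self.retention]
        return list(self._items)


t0 = datetime(2022, 8, 18, 8, 0)
store = ComparisonStore()
store.add(ComparisonObject("PLATE-A", t0, "white", "BrandX"))
still_there = store.items(t0 + timedelta(minutes=10))
expired = store.items(t0 + timedelta(minutes=15))
```

Purging lazily on read keeps the sketch short; a deployed system might instead run the purge on a timer.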

Step 3: Then perform initial screening on all potential objects, specifically as follows:

S01: Obtain pictures of all vehicles that enter the road directed by the target object by turning right, mark the corresponding vehicles as actual-entry objects through image recognition, and obtain the license plate number, entry time, and vehicle identification of every actual-entry object; the entry time is the point in time at which the actual-entry object enters the road directed by the target object;

S02: Compare every actual-entry object with every comparison object, pair by pair, checking whether the license plate number and vehicle identification match; when they match, mark the matching actual-entry object as an initial suspect;

S03: At the same time, obtain the interval time from the entry time of the initial suspect and the right-turn time of the matching comparison object; the interval time is the length of time between the right-turn time and the entry time;

S04: Obtain all initial suspects and their corresponding interval times;
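Steps S01 to S04 amount to a pairwise match followed by a time difference. The sketch below assumes a hypothetical `Sighting` record shared by comparison objects and actual-entry objects, and it reports the interval time in seconds as an absolute difference, since the text defines it only as the length of time between the two events.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sighting:
    plate: str
    time: datetime      # right-turn time or entry time
    hood_color: str
    brand: str


def initial_screening(entries, comparisons):
    """Return (actual-entry object, interval in seconds) for every match."""
    suspects = []
    for entry in entries:
        for comp in comparisons:
            same_vehicle = (
                (entry.plate, entry.hood_color, entry.brand)
                == (comp.plate, comp.hood_color, comp.brand)
            )
            if same_vehicle:
                interval = abs((entry.time - comp.time).total_seconds())
                suspects.append((entry, interval))
    return suspects


comparisons = [Sighting("PLATE-A", datetime(2022, 8, 18, 8, 0, 0), "white", "BrandX")]
entries = [
    Sighting("PLATE-A", datetime(2022, 8, 18, 8, 1, 0), "white", "BrandX"),
    Sighting("PLATE-B", datetime(2022, 8, 18, 8, 1, 30), "black", "BrandY"),
]
suspects = initial_screening(entries, comparisons)
```

If the same vehicle appears in several comparison objects, this sketch yields one result per match; how such duplicates should be resolved is not specified in the text.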

Step 4: Perform a secondary review of the initial suspects, specifically as follows:

S2: Then obtain the pre-turn road from which the initial suspect turned right into the road directed by the target object, that is, the road the initial suspect was on when it made that right turn; obtain the target lane of the pre-turn road and mark it as the crossing lane. The crossing lane must be one that vehicles can enter by turning right from the road directed by the target object;

S7: Obtain all speculative objects;

Step 5: Transmit all speculative objects and their pictures to the smart terminal port of the corresponding regulatory department for handling. A speculative object here refers to the violation in which a driver facing a red light for going straight does not wait but instead turns right, makes a U-turn, and turns right again, making the U-turn at a location where it is not legal, in order to evade the traffic light;

As Embodiment 2 of the present invention, based on Embodiment 1, Step 4 of this embodiment differs slightly from Embodiment 1, specifically as follows:

Then obtain the second speed limit, that of the pre-turn road, and divide the within-regulation distance by it to obtain the second U-turn time; add the first and second U-turn times to obtain the short-limit time. If the crossing lane has no U-turn position, the short-limit time is set to 0;

"No U-turn position" specifically means:

Traveling straight along the crossing lane, there is no U-turn position anywhere before the second traffic light, the location of the second traffic light included;

S7: Obtain all speculative objects;
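The Embodiment 2 variant of the short-limit time can be sketched by making the U-turn geometry optional. The `UTurnGeometry` record and the function name are hypothetical; distances in metres and speed limits in metres per second are assumed units.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UTurnGeometry:
    u_turn_distance_m: float             # entry of crossing lane to U-turn point
    within_regulation_distance_m: float  # U-turn point back to the directed road


def short_limit_time(geometry: Optional[UTurnGeometry],
                     speed_limit_1_mps: float,
                     speed_limit_2_mps: float) -> float:
    """Return 0 when the crossing lane has no U-turn position (Embodiment 2)."""
    if geometry is None:
        return 0.0
    first = geometry.u_turn_distance_m / speed_limit_1_mps
    second = geometry.within_regulation_distance_m / speed_limit_2_mps
    return first + second


no_u_turn = short_limit_time(None, 15.0, 10.0)
with_u_turn = short_limit_time(UTurnGeometry(300.0, 300.0), 15.0, 10.0)
```

With a short-limit time of 0, every initial suspect with a positive interval time exceeds the limit and is therefore marked under S6 of Embodiment 1.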

As Embodiment 3 of the present invention, based on Embodiment 1, the following steps are additionally performed after Step 5 is completed:

SS1: Obtain all speculative objects and the points in time at which they were marked as speculative objects, giving all speculative objects and their speculation time points;

SS2: Obtain all speculative objects within the recent period, together with the number of times each was marked as a speculative object, and remove those marked fewer than X1 times. X1 is a preset value, typically 2; the recent period is the three months counted back from the current time;

SS3: Mark the remaining speculative objects as inertia objects and the number of times each was marked as a speculative object as its inertia count, giving all inertia objects and their inertia counts;

SS4: Select any inertia object and, over the period from the first to the most recent time it was marked as a speculative object, obtain the time elapsed between each marking and the previous one. Mark these as inter-mark interval values Ji, i = 1...n, meaning that there are n inter-mark interval values and the object was marked as a speculative object n+1 times;

SS5: Then automatically calculate the mean of Ji, denoted P, and calculate the aggregation degree D of Ji according to a formula. The specific calculation formula is:

SS6: Define the doubling value according to the aggregation degree D. When D < X2, the doubling value is 2;

When X2 ≤ D ≤ X3, the doubling value is 1.5;

When D > X3, the doubling value is 1;
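The three-way rule in SS6 maps directly onto a small function. The function name is hypothetical, and because the text gives no values for the thresholds X2 and X3, the numbers in the example call are placeholders chosen only to exercise each branch.

```python
def doubling_value(d, x2, x3):
    """SS6: map the aggregation degree D to a doubling value."""
    if d < x2:
        return 2.0
    if d <= x3:        # covers X2 <= D <= X3
        return 1.5
    return 1.0         # D > X3


# Placeholder thresholds, not taken from the text.
X2, X3 = 1.0, 2.0
values = [doubling_value(d, X2, X3) for d in (0.5, 1.0, 2.0, 3.0)]
```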

SS7: Then apply the same processing of steps SS4 to SS6 to all inertia objects to obtain the doubling value of every inertia object;

SS8: Calculate the measurement value of every inertia object with the formula:

Measurement value = inertia count × doubling value;

SS9: Sort the inertia objects by measurement value from largest to smallest, and mark the inertia objects ranked in the top thirty-five percent as habitual targets;

SS10: Whenever a habitual target is marked as an initial suspect by the processing of Step 3 of Embodiment 1, it is simultaneously marked as a semi-speculative object, and the semi-speculative object is transmitted to the smart terminal port of the corresponding regulatory department for handling. Because the violation is not established for a semi-speculative object as it is for a speculative object, only a minor penalty may be applied; if the object is later marked as a speculative object, it is penalized in the normal way for speculative objects.
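The bookkeeping in SS1 to SS4 and SS8 to SS10 can be sketched as below. All function names are hypothetical. Several details are assumptions not fixed by the text: "three months" is approximated as 90 days, intervals are expressed in days, the top thirty-five percent is rounded down, and the doubling values are taken as inputs because they depend on the aggregation degree D computed in SS5.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

X1 = 2                       # SS2: typical value given in the text
RECENT = timedelta(days=90)  # assumption: three months taken as 90 days
TOP_FRACTION = 0.35          # SS9


def inertia_marks(marks, now):
    """SS1-SS3: recent mark times per plate, keeping plates marked >= X1 times."""
    by_plate = defaultdict(list)
    for plate, marked_at in marks:
        if now - RECENT <= marked_at <= now:
            by_plate[plate].append(marked_at)
    return {p: sorted(ts) for p, ts in by_plate.items() if len(ts) >= X1}


def inter_mark_intervals(times):
    """SS4: J_1..J_n in days for n+1 sorted mark times."""
    return [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]


def habitual_targets(inertia_counts, doubling_values):
    """SS8-SS9: measurement value = inertia count x doubling value; keep top 35%."""
    measures = {p: inertia_counts[p] * doubling_values[p] for p in inertia_counts}
    ranked = sorted(measures, key=measures.get, reverse=True)
    keep = math.floor(len(ranked) * TOP_FRACTION)  # assumption: round down
    return ranked[:keep]


def semi_speculative(initial_suspect_plates, habitual):
    """SS10: initial suspects that are already habitual targets."""
    habitual_set = set(habitual)
    return [p for p in initial_suspect_plates if p in habitual_set]


now = datetime(2023, 6, 1)
marks = [
    ("A", datetime(2023, 5, 1)), ("A", datetime(2023, 5, 4)),
    ("A", datetime(2023, 5, 10)), ("B", datetime(2023, 5, 2)),
    ("C", datetime(2022, 1, 1)), ("C", datetime(2023, 5, 3)),
]
recent = inertia_marks(marks, now)
intervals = inter_mark_intervals(recent["A"])
targets = habitual_targets({"A": 5, "B": 3, "C": 2, "D": 2},
                           {"A": 1.0, "B": 2.0, "C": 1.5, "D": 1.0})
semi = semi_speculative(["B", "Z"], targets)
```

In the made-up data, only plate A has at least two marks inside the recent period, and with four inertia objects the rounded-down top thirty-five percent keeps a single habitual target.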

As Embodiment 4 of the present invention, Embodiments 1 to 3 are combined in a specific implementation.

Some of the data in the above formulas are used as dimensionless numerical values. Each formula was obtained by software simulation over a large amount of collected data so as to approximate the real situation as closely as possible; the preset parameters and preset thresholds in the formulas are set by those skilled in the art according to the actual situation or obtained through simulation over a large amount of data.

The above embodiments only illustrate the technical method of the present invention and do not limit it. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical method of the present invention may be modified or equivalently substituted without departing from its spirit and scope.

Claims

1. An intelligent agent autonomous decision-making method based on deep reinforcement learning, characterized in that the method specifically includes the following steps:

Step 1: Acquire basic data for an intersection in the prohibited state, specifically as follows:

Obtain pictures of all vehicles turning right while the intersection directed by the target object is in the prohibited state, and mark these vehicles as comparison objects;

Simultaneously and automatically obtain the license plate number, right-turn time, and vehicle identification of every comparison object; each comparison object is automatically deleted after being stored for a time T1;

Step 2: Then perform initial screening on all potential objects: compare the license plate number, entry time, and vehicle identification of each actual-entry object with the license plate number, right-turn time, and vehicle identification of the comparison objects to determine all initial suspects and their corresponding interval times;

Step 3: Perform a secondary review of the initial suspects, specifically as follows:

S1: Obtain all initial suspects and their corresponding interval times;

S2: Then obtain the pre-turn road from which the initial suspect turned right into the road directed by the target object, that is, the road the initial suspect was on when it made that right turn; obtain the target lane of the pre-turn road and mark it as the crossing lane;

S3: Obtain the distance from the point where a right turn from the road directed by the target object enters the crossing lane to the U-turn point of the crossing lane, and mark it as the U-turn distance; the U-turn point is a location where vehicles are permitted to make a U-turn;

S4: Then obtain the distance from the U-turn point, along the pre-turn road, to the road directed by the target object, and mark it as the within-regulation distance;

S5: Obtain the first speed limit, that of the crossing lane, and divide the U-turn distance by it to obtain the first U-turn time;

Then obtain the second speed limit, that of the pre-turn road, and divide the within-regulation distance by it to obtain the second U-turn time; add the first and second U-turn times to obtain the short-limit time;

S6: Mark every initial suspect whose interval time exceeds the short-limit time as a speculative object;

S7: Obtain all speculative objects.

2. The autonomous decision-making method for agents based on deep reinforcement learning according to claim 1, characterized in that the following step is performed before Step 1:

Synchronize the state of the target object. When the target object is in the prohibited state, the autonomous decision-making judgment is started; in all other states it remains silent.

3. The autonomous decision-making method for agents based on deep reinforcement learning according to claim 2, characterized in that the target object refers to a traffic light, and the prohibited state is the state in which vehicles traveling in the straight-ahead direction of the through lane face a red light and may not pass.

4. An autonomous decision-making method for an agent based on deep reinforcement learning according to claim 1, characterized in that the right-turn time in step one is the specific time point when the corresponding comparison object turns right;

The vehicle identification refers to the hood color and the brand of the vehicle; and T1 is a preset value.

5. An intelligent agent autonomous decision-making method based on deep reinforcement learning according to claim 1, characterized in that the specific method of initial screening in step two is:

S01: Obtain pictures of all vehicles that enter the road directed by the target object by turning right, and mark these vehicles as actual-entry objects. Obtain the license plate number, entry time, and vehicle identification of every actual-entry object; the entry time is the point in time at which the actual-entry object enters the road directed by the target object;

S02: Compare every actual-entry object with every comparison object, pair by pair, checking whether the license plate number and vehicle identification match; when they match, mark the matching actual-entry object as an initial suspect;

S03: At the same time, obtain the interval time from the entry time of the initial suspect and the right-turn time of the matching comparison object; the interval time is the length of time between the right-turn time and the entry time;

S04: Obtain all initial suspects and their corresponding intervals.

6. The autonomous decision-making method for agents based on deep reinforcement learning according to claim 1, characterized in that the crossing lane in step S2 is one that vehicles can enter by turning right from the road directed by the target object;

In step S5, if the crossing lane has no U-turn position, the short-limit time is set to 0.

7. A method for autonomous decision-making of an agent based on deep reinforcement learning according to claim 1, characterized in that after completing the processing of step three, the following steps are required:

Transmit all speculative objects to the smart terminal port of the corresponding regulatory department.