CN111159832A - Construction method and device of traffic information flow - Google Patents

Construction method and device of traffic information flow

Info

Publication number
CN111159832A
Authority
CN
China
Prior art keywords: agent, intelligent, scene, vehicle, main vehicle
Prior art date
Legal status
Granted
Application number
CN201811222416.2A
Other languages
Chinese (zh)
Other versions
CN111159832B (en)
Inventor
张俊飞
孙庆瑞
毛继明
董芳芳
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811222416.2A priority Critical patent/CN111159832B/en
Publication of CN111159832A publication Critical patent/CN111159832A/en
Application granted granted Critical
Publication of CN111159832B publication Critical patent/CN111159832B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The embodiment of the invention provides a method and a device for constructing a traffic information flow. The method comprises the following steps: loading a regional map of a simulation scene to be constructed; placing a plurality of agents in the regional map; controlling each agent to run in the regional map and collecting each agent's running state; and performing reinforcement learning on the running state of each agent by using traffic rules to obtain the traffic information flow of the simulation scene. By training the traffic information flow of the simulation scene with reinforcement learning, the embodiment can provide sufficiently many and sufficiently complex automatic driving simulation scenes, so that the simulation results of autonomous vehicles are more accurate.

Description

Construction method and device of traffic information flow
Technical Field
The invention relates to the technical field of automatic driving, in particular to a method and a device for constructing a traffic information stream.
Background
Statistically, unmanned vehicles need to travel 200 million miles on a complete road network to demonstrate performance beyond that of human drivers, which is difficult to achieve on real roads. Simulation can accelerate progress toward this goal. In simulation, road-network traffic information is discretized to form discrete scenes, and by Bayesian analysis at least 3 million (3M) scenes are needed to reach the goal. Moreover, the derivation requires that the 3M scenes completely characterize the scene distribution.
To obtain a complete scene distribution, one can drive and collect data where traffic is denser. When the simulation scene is constructed, the collected video of the real scene is played back frame by frame: for example, an obstacle vehicle appears at a certain position at a certain time. The host vehicle is then placed in the scene built from these field samples for simulation.
At present, it is extremely difficult to obtain a complete scene distribution set by field sampling. In addition, scenes obtained by field sampling generally fit only the collection site; in other places the scenes may differ, and the simulation results may be inaccurate.
Disclosure of Invention
The embodiment of the invention provides a method and a device for constructing a traffic information stream, which are used for solving one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a method for constructing a traffic information stream, including:
loading a regional map of a simulation scene to be constructed;
placing a plurality of agents in the area map;
controlling each intelligent agent to operate in the regional map, and collecting the operating state of each intelligent agent;
and carrying out reinforcement learning on the running state of each intelligent agent by using a traffic rule to obtain a traffic information stream of the simulation scene.
In one embodiment, the performing reinforcement learning on the operation state of each of the agents by using traffic rules to obtain the traffic information flow of the simulation scene includes:
scoring the running state of each agent by using traffic rules in a rule base;
rewarding the agent if the agent's score is above a set threshold;
punishment is carried out on the agent if the score of the agent is lower than a set threshold value;
and adjusting the running state of each agent according to the score and reward-punishment result of each agent, so that the score of each agent meets the set threshold and the agent is not punished.
In one embodiment, the method further comprises:
placing a host vehicle in the simulated scene, the host vehicle having an automatic driving system;
controlling the main vehicle to run in the simulation scene according to the running strategy of the main vehicle;
and judging the running state of the main vehicle, and adjusting the running strategy of the main vehicle by using the judgment result.
In one embodiment, the method further comprises:
and saving snapshot information of the driving of the host vehicle in the simulation scene.
In one embodiment, saving snapshot information of the host vehicle traveling in the simulated scene includes:
and if the main vehicle has an accident in the driving process in the simulation scene, storing snapshot information of the accident process.
In a second aspect, an embodiment of the present invention provides a device for constructing a traffic information stream, including:
the map loading module is used for loading a regional map of a simulation scene to be constructed;
an agent placement module for placing a plurality of agents in the map of the area;
the intelligent agent control module is used for controlling each intelligent agent to operate in the regional map and collecting the operation state of each intelligent agent;
and the reinforcement learning module is used for performing reinforcement learning on the running state of each intelligent agent by using traffic rules to obtain the traffic information flow of the simulation scene.
In one embodiment, the reinforcement learning module comprises:
the scoring submodule is used for scoring the running state of each intelligent agent by using the traffic rules in the rule base;
a reward submodule for rewarding the agent if the score of the agent is above a set threshold;
the punishment submodule is used for punishing the intelligent agent if the score of the intelligent agent is lower than a set threshold value;
and the adjusting submodule is used for adjusting the running state of each agent according to the scores and the reward punishment results of each agent, so that the scores of each agent meet the set threshold value and are not punished.
In one embodiment, the apparatus further comprises:
a host vehicle placement module for placing a host vehicle in the simulated scene, the host vehicle having an automatic driving system;
the main vehicle control module is used for controlling the main vehicle to run in the simulation scene according to the running strategy of the main vehicle;
and the main vehicle adjusting module is used for judging the running state of the main vehicle and adjusting the running strategy of the main vehicle by using the judgment result.
In one embodiment, the apparatus further comprises:
and the snapshot module is used for saving snapshot information of the driving of the main vehicle in the simulation scene.
In one embodiment, the snapshot module is further configured to save snapshot information of an accident process if an accident occurs during the driving of the host vehicle in the simulation scene.
In a third aspect, an embodiment of the present invention provides a device for constructing a traffic information stream, where functions of the device may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the structure of the apparatus includes a processor and a memory, the memory is used for storing a program supporting the apparatus to execute the above construction method of the traffic information stream, and the processor is configured to execute the program stored in the memory. The apparatus may also include a communication interface for communicating with other devices or a communication network.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer software instructions for a traffic information stream construction apparatus, which includes a program for executing the above traffic information stream construction method.
One of the above technical solutions has the following advantages or beneficial effects: training the traffic information flow of the simulation scene by reinforcement learning can provide sufficiently many and sufficiently complex automatic driving simulation scenes, so that the simulation results of autonomous vehicles are more accurate. The method is suitable for constructing simulation scenes for massive numbers of unmanned vehicles.
Another of the above technical solutions has the following advantages or beneficial effects: after various agents are trained in the constructed simulation scene until a stable simulation scene is obtained, a host vehicle with an automatic driving system, i.e., an unmanned vehicle, is placed in the scene to run, and the strategy of the unmanned vehicle is then adjusted, so that the simulation results of the unmanned vehicle are more accurate and it adapts to a wider range of scenes.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
Fig. 1 illustrates a flowchart of a construction method of a traffic information stream according to an embodiment of the present invention.
Fig. 2 shows a flowchart of a construction method of a traffic information stream according to an embodiment of the present invention.
Fig. 3 shows a flowchart of a construction method of a traffic information stream according to an embodiment of the present invention.
Fig. 4 is a schematic diagram showing an application example of the construction method of the traffic information stream according to the embodiment of the present invention.
Fig. 5 is a block diagram showing a construction of a traffic information stream constructing apparatus according to an embodiment of the present invention.
Fig. 6 is a block diagram showing a construction of a traffic information stream constructing apparatus according to an embodiment of the present invention.
Fig. 7 is a block diagram showing a construction of a traffic information stream constructing apparatus according to an embodiment of the present invention.
Fig. 8 is a block diagram showing a construction of a traffic information stream constructing apparatus according to an embodiment of the present invention.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
Fig. 1 illustrates a flowchart of a construction method of a traffic information stream according to an embodiment of the present invention. As shown in fig. 1, the method for constructing the traffic information stream may include:
and step S11, loading the area map of the simulation scene to be constructed.
And step S12, placing a plurality of agents in the area map.
And step S13, controlling each intelligent agent to operate in the area map, and collecting the operation state of each intelligent agent.
And step S14, carrying out reinforcement learning on the operation state of each intelligent agent by using traffic rules to obtain the traffic information flow of the simulation scene.
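Steps S11 to S13 can be sketched as a minimal simulation loop that loads a map, places agents, runs them, and collects their running states for the subsequent reinforcement learning. Everything here, including the class names, the toy map format, and the agent dynamics, is an illustrative assumption and not the patent's actual implementation:

```python
import random

class Agent:
    """Toy agent with a position and speed; real agents carry richer state."""
    def __init__(self, x, y, speed=0.0):
        self.x, self.y, self.speed = x, y, speed

    def step(self):
        # Move forward along x by the current speed (toy dynamics).
        self.x += self.speed

def load_region_map(name):
    # Stand-in for loading a regional map from an electronic map system.
    return {"name": name, "width": 100, "height": 100}

def place_agents(region_map, count, seed=0):
    # Place agents at random positions; in practice positions would follow
    # the needs of the simulation scene (e.g. monitoring video).
    rng = random.Random(seed)
    return [Agent(rng.uniform(0, region_map["width"]),
                  rng.uniform(0, region_map["height"]),
                  speed=rng.uniform(0, 2)) for _ in range(count)]

def collect_states(agents):
    # Central-transceiver role: gather every agent's running state.
    return [(a.x, a.y, a.speed) for a in agents]

region = load_region_map("intersection_demo")
agents = place_agents(region, count=10)
for _ in range(5):          # run the scene for a few ticks
    for a in agents:
        a.step()
states = collect_states(agents)
```

The collected `states` list is the raw material that step S14 would feed into reinforcement learning.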
When a certain simulation scene needs to be built, the regional map of that scene can first be loaded. In an automatic driving simulation scenario, for example, a regional map may include fixed facilities of a region, such as buildings, streets, overpasses, bus stops, and subway stations. The desired regional map can be obtained from an actual electronic-map system so as to reproduce the area accurately.
A plurality of agents may then be placed at multiple locations on the regional map. The number and positions of the placed agents can be chosen flexibly according to the actual needs of the simulation scene. For example, obstacle vehicles and pedestrians can be placed with reference to traffic-monitoring video of the area. If the video shows 10 pedestrians at an intersection, 10 agents representing pedestrians can be placed at the corresponding intersection on the map, and attribute information such as speed, direction, and acceleration can be set according to each pedestrian's heading and speed in the video.
In one embodiment, the agent may have the following characteristics:
Autonomy: an Agent can automatically adjust its behavior and state according to changes in the external environment, and has the capacity for self-management and self-regulation.
Reactivity: an Agent has the ability to respond to external stimuli.
Initiative: an Agent has the ability to take action proactively in response to changes in the external environment.
Sociality: an Agent has the ability to collaborate with other Agents or with people, and different Agents can interact according to their respective intentions.
Evolution: an Agent can accumulate or learn experience and knowledge, and modify its own behavior to adapt to new environments.
In one example, agents in an automatic driving simulation scenario may include entities such as obstacle vehicles, pedestrians, and traffic lights. Each agent has its own attribute information, such as size, position (x, y), speed, and acceleration, and each attribute may have an initial value. During training, the attribute information changes as the agent's position, speed, acceleration, and so on change. Through reinforcement learning, a machine learning method often combined with deep learning, each agent continually iterates its operation strategy to maximize its reward.
Reinforcement learning (also called evaluative learning) is an important machine learning method. There are many reinforcement learning algorithms, for example Proximal Policy Optimization (PPO) and Q-learning. Reinforcement learning involves a trial-and-evaluation process: in the simulation scenario, an Agent selects an action that acts on the surrounding environment; the environment receives the action, changes state, and produces a reinforcement signal, such as a reward or a penalty, which is fed back to the Agent. The Agent then selects its next action according to the reinforcement signal and the current state of the environment. The selection principle is generally to increase the probability of receiving positive reinforcement, i.e., reward.
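As a concrete illustration of this trial-and-evaluation loop, here is a minimal tabular Q-learning sketch (one of the methods named above; PPO would follow the same loop with a learned policy). The two-state toy environment and all reward values are invented for illustration only:

```python
import random

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration
q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}  # Q-table, two states/actions

def env_step(state, action):
    # Toy environment: action 1 is "rule-compliant" and rewarded,
    # action 0 is penalized; the state simply alternates.
    reward = 1.0 if action == 1 else -1.0
    return (state + 1) % 2, reward

rng = random.Random(0)
state = 0
for _ in range(500):
    # Epsilon-greedy selection: mostly pick the best-known action,
    # occasionally explore at random.
    if rng.random() < EPSILON:
        action = rng.choice((0, 1))
    else:
        action = max((0, 1), key=lambda a: q[(state, a)])
    next_state, reward = env_step(state, action)
    best_next = max(q[(next_state, a)] for a in (0, 1))
    # Standard Q-learning update toward reward + discounted best future value.
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
    state = next_state
```

After training, the Q-table prefers the rewarded action in both states, mirroring how each agent in the scene would converge toward rule-compliant behavior.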
In one embodiment, as shown in fig. 2, the reinforcement learning process of step S14 may include:
and step S141, scoring the running state of each agent by using the traffic rules in the rule base.
And S142, if the score of the agent is higher than a set threshold value, rewarding the agent.
And S143, if the score of the agent is lower than a set threshold, punishing the agent.
And S144, according to the scores and reward-punishment results, each agent self-adjusts its operation strategy so that its score meets the set threshold and it is not punished. At that point the simulation scene may be considered stable, and the traffic information flow of the simulation scene can be obtained by maintaining the state of each agent.
In the embodiment of the present invention, the score threshold for rewarding, the score threshold for punishment, and the score threshold for determining whether the simulation scene is stable may be the same or different, and are specifically set according to the needs of practical applications. The traffic rules in the rule base can be set according to different countries and regions. In the simulation, an appropriate traffic rule can be selected according to the regional map. In the reinforcement learning process, traffic rules can be used for scoring the states of the intelligent bodies, and further a reward and punishment mechanism is used for reward and punishment so as to stimulate each intelligent body to adjust the self operation mode. For example, if the agent is traveling in compliance with traffic regulations, a reward may be given if the score is above a threshold. As another example, if an agent violates a traffic rule, a penalty may be given if the score is below a threshold. Further, reward and punishment can be performed with reference to interaction between agents. For example, if a certain agent runs for 30 minutes, no collision occurs, and the score is above a threshold, a reward may be given. As another example, if an agent runs for 3 minutes and collides with another agent or a building, the score is below the threshold, and a penalty may be given.
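The threshold-based scoring and reward-punishment mechanism described above might be sketched as follows. The rule-base entries, the score weights, and the use of a single shared threshold are all assumptions for illustration, not the patent's rule base:

```python
REWARD_THRESHOLD = 60
PENALTY_THRESHOLD = 60

RULE_BASE = [
    # (description, predicate on the agent's running state, score contribution)
    ("kept speed limit", lambda s: s["speed"] <= s["speed_limit"], 40),
    ("no collision",     lambda s: not s["collided"],              40),
    ("stopped at red",   lambda s: not s["ran_red_light"],         20),
]

def score_agent(state):
    # Sum the contributions of every satisfied rule.
    return sum(delta for _, ok, delta in RULE_BASE if ok(state))

def reward_or_punish(state):
    # Reward above the threshold, punish below it, otherwise leave as-is.
    score = score_agent(state)
    if score > REWARD_THRESHOLD:
        return score, "reward"
    if score < PENALTY_THRESHOLD:
        return score, "punish"
    return score, "neutral"

good = {"speed": 40, "speed_limit": 50, "collided": False, "ran_red_light": False}
bad  = {"speed": 70, "speed_limit": 50, "collided": True,  "ran_red_light": False}
```

A compliant state like `good` scores 100 and is rewarded; a state with a speeding violation and a collision like `bad` scores 20 and is punished.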
In one embodiment, as shown in fig. 3, after the simulation scene is stabilized, the method further comprises:
and step S31, placing a main vehicle in the simulation scene, wherein the main vehicle is provided with an automatic driving system.
And step S32, controlling the main vehicle to run in the simulation scene according to the running strategy of the main vehicle.
And step S33, judging the running state of the host vehicle, and adjusting the running strategy of the host vehicle by using the judgment result.
A host vehicle with an automatic driving system (i.e., an unmanned vehicle) is placed in the stable simulation scene, and its running state can be judged using the traffic information flow in the scene and the traffic rules in the rule base. If the host vehicle collides with an agent, violates traffic rules, or the like during operation, its running state is judged to be poor and its operation strategy needs adjustment. Specifically, developers can adjust the host vehicle's response to the external environment and its planning strategies, such as speed changes, overtaking, and lane changes, with reference to the collisions with agents and traffic-rule violations observed in the simulation scene.
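A hedged sketch of judging the host vehicle's running state and adjusting its operation strategy (step S33). The run-log fields and the concrete strategy adjustments below are hypothetical:

```python
def judge_host_run(run_log):
    """Return (ok, reasons) for one simulated run of the host vehicle."""
    reasons = []
    if run_log.get("collisions", 0) > 0:
        reasons.append("collided with an agent")
    if run_log.get("rule_violations", 0) > 0:
        reasons.append("violated a traffic rule")
    return (not reasons), reasons

def adjust_strategy(strategy, reasons):
    # Toy adjustment: after a collision, drive more cautiously by
    # lowering the target speed and increasing the following distance.
    if "collided with an agent" in reasons:
        strategy["target_speed"] *= 0.8
        strategy["follow_distance"] += 5.0
    return strategy

strategy = {"target_speed": 50.0, "follow_distance": 10.0}
ok, reasons = judge_host_run({"collisions": 1, "rule_violations": 0})
if not ok:
    strategy = adjust_strategy(strategy, reasons)
```

In a real system the judgment would draw on the rule base from the training phase, and the adjustment would feed back into the planner rather than a flat parameter dictionary.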
In one embodiment, as shown in fig. 3, the method further comprises:
and step S34, saving snapshot information of the driving of the host vehicle in the simulation scene.
In one embodiment, saving snapshot information of the host vehicle traveling in the simulated scene includes: if the host vehicle has an accident while driving in the simulation scene, storing snapshot information of the accident process. For example, if the host vehicle collides with an agent, a video may be recorded for the period from the start to the end of the collision. This snapshot information can be referenced when adjusting the operation strategy of the host vehicle.
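One plausible way to store snapshot information around an accident is a small rolling buffer that, when a collision is detected, dumps the frames leading up to it together with the accident frame. The frame format here is assumed:

```python
from collections import deque

class SnapshotRecorder:
    def __init__(self, pre_frames=3):
        # Rolling window of the most recent frames before the accident.
        self.buffer = deque(maxlen=pre_frames)
        self.snapshots = []

    def record(self, frame):
        if frame.get("collision"):
            # Save the lead-up frames plus the accident frame itself.
            self.snapshots.append(list(self.buffer) + [frame])
        self.buffer.append(frame)

rec = SnapshotRecorder(pre_frames=3)
for t in range(6):
    # Simulated frame stream with a collision at tick 4.
    rec.record({"t": t, "collision": (t == 4)})
```

A production recorder would also capture frames after the accident ends; this sketch keeps only the lead-up window for brevity.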
The embodiment of the invention can construct, by means of reinforcement learning methods such as neural-network-based ones, a deep traffic information flow that is stable day and night, simulating elements such as vehicles, pedestrians, weather, and illumination, and can also simulate extreme conditions of each element.
In an application example, as shown in fig. 4, an embodiment of the present invention may include the following parts: a training-field system, which contains all the agents, each running within it; a traffic rule system, which may include an extensible rule base used to score each subject in the traffic information flow so as to prompt it to adjust its operation mode; and a central transceiver system, which is responsible for collecting the states of all agents.
With these three systems cooperating, a complex and complete scene information flow can be obtained through reinforcement learning methods such as PPO (Proximal Policy Optimization) and Q-learning. After the information flow is constructed, an unmanned vehicle with an automatic driving system is placed into it to test the vehicle's performance.
In an application example, the method for constructing the traffic information stream specifically includes the following stages:
(I) training phase
1. Start the simulation engine, load the regional map for a certain scene in the training-field system, and place a number of agents, for example 100, in the map, including obstacle vehicles, pedestrians, traffic lights, and other mobile or stationary agents, so as to simulate the real world. Each Agent has attribute information such as size, position (x, y), speed, and acceleration. The attribute information generally has an initial value, and some of it changes with the running state during training.
2. And controlling the agents to run in the scene, and collecting the states of all the agents through a central transceiving system.
3. Score the states of the agents in the scene using the rule base, and apply the reward-punishment mechanism to prompt each Agent to adjust its own operation mode. For example, each vehicle in the scene travels on a road and may be in various running states such as accelerating, decelerating, changing lanes, or overtaking. The running state of each vehicle is scored, and rewards and punishments are applied accordingly. For example, a vehicle that follows the traffic rules and runs smoothly gets a high score and can be rewarded; a vehicle that violates the traffic rules gets a lower score and can be punished; and a vehicle involved in a collision can likewise be punished.
4. And each Agent adjusts the running state of the Agent according to the score, reward and punishment and other results of the Agent.
5. Training is finished after the scores and reward-punishment results of all agents meet certain requirements. For example, when each Agent's score is greater than a certain threshold and no Agent is punished, training can be completed.
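The training-completion condition in step 5 (every agent's score above the threshold, and no agent punished in the last round) can be expressed as a simple predicate. The field names and the threshold value are illustrative assumptions:

```python
SCORE_THRESHOLD = 60

def scene_is_stable(round_results):
    """round_results: one {'score': float, 'punished': bool} dict per agent."""
    return all(r["score"] > SCORE_THRESHOLD and not r["punished"]
               for r in round_results)

# One agent still punished: training must continue.
unstable = [{"score": 80, "punished": False}, {"score": 40, "punished": True}]
# All agents above threshold and unpunished: the scene is stable.
stable   = [{"score": 80, "punished": False}, {"score": 75, "punished": False}]
```

The outer training loop would simply repeat steps 2 to 4 until `scene_is_stable` returns true, then freeze the agents' states as the scene's traffic information flow.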
(II) main vehicle operation stage
1. Click the "import host vehicle" button on the simulation-engine page to place the host vehicle at a position in the trained simulation scene. The host vehicle has initial attribute information and an operation strategy. The attribute information may include position, speed, acceleration, and the like; the operation strategy may include the host vehicle's responses to the external environment and its planning.
2. The host vehicle runs in the scene according to its attribute information and operation strategy, and its running state is judged using the rule base. If the run goes badly, for example a collision occurs, the running result can be returned to guide modification of the host vehicle's operation strategy.
Furthermore, snapshots of some scenes may be saved during unmanned-vehicle testing, i.e., during the host vehicle operation phase. For example, if a fault of the unmanned vehicle is found, accident snapshots can be collected through the central transceiver system and played back in the simulation, so as to guide iteration of the unmanned vehicle's strategies.
Training the traffic information flow of the simulation scene by reinforcement learning can provide sufficiently many and sufficiently complex automatic driving simulation scenes, so that the simulation results of autonomous vehicles are more accurate. The method is suitable for constructing simulation scenes for massive numbers of unmanned vehicles. By training the various agents in the constructed simulation scene until a stable scene is obtained, then placing the host vehicle in the scene to run and adjusting the unmanned vehicle's strategy, the simulation results of the unmanned vehicle become more accurate and it adapts to a wider range of scenes.
Fig. 5 is a block diagram showing a construction of a traffic information stream constructing apparatus according to an embodiment of the present invention. As shown in fig. 5, the apparatus may include:
a map loading module 51, configured to load a regional map of a simulation scene to be constructed;
an agent placement module 52 for placing a plurality of agents in the area map;
an agent control module 53, configured to control each agent to operate in the area map, and collect an operation state of each agent;
and the reinforcement learning module 54 is configured to perform reinforcement learning on the operation states of the agents according to traffic rules to obtain a traffic information stream of the simulation scene.
In one embodiment, as shown in fig. 6, the reinforcement learning module 54 includes:
a scoring submodule 541, configured to score the operating state of each agent according to the traffic rules in the rule base;
a reward submodule 542 configured to reward the agent if the score of the agent is above a set threshold;
the punishment submodule 543 is used for punishing the intelligent agent if the score of the intelligent agent is lower than a set threshold;
and the adjusting submodule 544 is configured to enable each agent to self-adjust its operation strategy according to its score and reward-punishment result, so that each agent's score meets the set threshold and the agent is not punished.
In one embodiment, as shown in fig. 7, the apparatus further comprises:
a host vehicle placement module 71 for placing a host vehicle in the simulated scene, the host vehicle having an automatic driving system;
a main vehicle control module 72, configured to control the main vehicle to run in the simulation scene according to its own operation strategy;
and a main vehicle adjusting module 73, configured to determine a running state of the main vehicle, and adjust the operation strategy of the main vehicle by using the determination result.
In one embodiment, the apparatus further comprises:
and the snapshot module 74 is used for saving snapshot information of the driving of the host vehicle in the simulation scene.
In one embodiment, the snapshot module 74 is further configured to save snapshot information of the accident process if an accident occurs while the host vehicle is driving in the simulation scene.
The functions of each module in each apparatus in the embodiments of the present invention may refer to the corresponding description in the above method, and are not described herein again.
Fig. 8 is a block diagram showing a construction of a traffic information stream constructing apparatus according to an embodiment of the present invention. As shown in fig. 8, the apparatus includes: a memory 910 and a processor 920, the memory 910 having stored therein a computer program operable on the processor 920. When executing the computer program, the processor 920 implements the construction method of the traffic information stream in the above embodiments. The number of the memory 910 and the processor 920 may each be one or more.
The device also includes:
and a communication interface 930 for communicating with an external device to perform data interactive transmission.
Memory 910 may include high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
If the memory 910, the processor 920 and the communication interface 930 are implemented independently, the memory 910, the processor 920 and the communication interface 930 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.
Optionally, in an implementation, if the memory 910, the processor 920 and the communication interface 930 are integrated on a chip, the memory 910, the processor 920 and the communication interface 930 may complete communication with each other through an internal interface.
An embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and the computer program is used for implementing the method of any one of the above embodiments when being executed by a processor.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention also includes alternative implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as would be understood by those skilled in the art.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having appropriate combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the above method embodiments may be implemented by program instructions directing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If implemented in the form of a software functional module and sold or used as a separate product, the integrated module may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that can be easily conceived by a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the appended claims.

Claims (12)

1. A method for constructing a traffic information flow, comprising:
loading a regional map of a simulation scene to be constructed;
placing a plurality of agents in the regional map;
controlling each agent to run in the regional map, and collecting the running state of each agent; and
performing reinforcement learning on the running state of each agent using traffic rules, to obtain a traffic information flow of the simulation scene.
2. The method of claim 1, wherein performing reinforcement learning on the running state of each agent using traffic rules to obtain the traffic information flow of the simulation scene comprises:
scoring the running state of each agent using the traffic rules in a rule base;
rewarding an agent if the score of the agent is above a set threshold;
punishing an agent if the score of the agent is below the set threshold; and
each agent self-adjusting its running strategy according to its score and the reward or punishment result, so that the score of the agent meets the set threshold and the agent is no longer punished.
3. The method of claim 1 or 2, further comprising:
placing a host vehicle in the simulation scene, the host vehicle having an automatic driving system;
controlling the host vehicle to travel in the simulation scene according to a running strategy of the host vehicle; and
evaluating the travel state of the host vehicle, and adjusting the running strategy of the host vehicle according to the evaluation result.
4. The method of claim 3, further comprising:
saving snapshot information of the travel of the host vehicle in the simulation scene.
5. The method of claim 4, wherein saving snapshot information of the travel of the host vehicle in the simulation scene comprises:
saving snapshot information of the accident process if an accident occurs while the host vehicle is traveling in the simulation scene.
6. An apparatus for constructing a traffic information flow, comprising:
a map loading module configured to load a regional map of a simulation scene to be constructed;
an agent placement module configured to place a plurality of agents in the regional map;
an agent control module configured to control each agent to run in the regional map and collect the running state of each agent; and
a reinforcement learning module configured to perform reinforcement learning on the running state of each agent using traffic rules, to obtain a traffic information flow of the simulation scene.
7. The apparatus of claim 6, wherein the reinforcement learning module comprises:
a scoring submodule configured to score the running state of each agent using the traffic rules in a rule base;
a reward submodule configured to reward an agent if the score of the agent is above a set threshold;
a punishment submodule configured to punish an agent if the score of the agent is below the set threshold; and
an adjusting submodule configured to cause each agent to self-adjust its running strategy according to its score and the reward or punishment result, so that the score of the agent meets the set threshold and the agent is no longer punished.
8. The apparatus of claim 6 or 7, further comprising:
a host vehicle placement module configured to place a host vehicle in the simulation scene, the host vehicle having an automatic driving system;
a host vehicle control module configured to control the host vehicle to travel in the simulation scene according to a running strategy of the host vehicle; and
a host vehicle adjusting module configured to evaluate the travel state of the host vehicle and adjust the running strategy of the host vehicle according to the evaluation result.
9. The apparatus of claim 8, further comprising:
a snapshot module configured to save snapshot information of the travel of the host vehicle in the simulation scene.
10. The apparatus of claim 9, wherein the snapshot module is further configured to save snapshot information of the accident process if an accident occurs while the host vehicle is traveling in the simulation scene.
11. An apparatus for constructing a traffic information flow, comprising:
one or more processors; and
a storage device configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 5.
12. A computer-readable storage medium storing a computer program that, when executed by a processor, implements the method of any one of claims 1 to 5.
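The scoring and reward-punishment loop of claims 1 and 2 can be sketched as follows. This is an illustrative toy implementation only, not the patented system: the Agent class, the single speed-limit rule standing in for the rule base, the threshold value, and all function names are assumptions chosen for the example.

```python
import random

SCORE_THRESHOLD = 0.8  # the "set threshold" of claim 2 (assumed value)
SPEED_LIMIT = 0.8      # toy traffic rule: normalized speed limit (assumed)

class Agent:
    """A simulated traffic participant with a self-adjustable running strategy."""
    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.strategy = {"speed": random.uniform(0.3, 1.0)}  # toy strategy

    def run_step(self):
        """Produce a running state; here just the currently chosen speed."""
        return {"speed": self.strategy["speed"]}

    def adjust(self, reward):
        """Self-adjust the running strategy from the reward signal (claim 2)."""
        if reward < 0:  # punished: slow down toward the legal range
            self.strategy["speed"] *= 0.9

def score_state(state):
    """Score a running state against the toy traffic rule."""
    return 1.0 if state["speed"] <= SPEED_LIMIT else 0.0

def build_traffic_flow(num_agents=5, steps=50):
    """Place agents, run them, score each state, reward or punish,
    and collect the resulting traffic information flow."""
    agents = [Agent(i) for i in range(num_agents)]
    flow = []  # the collected traffic information flow
    for _ in range(steps):
        for agent in agents:
            state = agent.run_step()
            score = score_state(state)
            reward = 1.0 if score >= SCORE_THRESHOLD else -1.0
            agent.adjust(reward)
            flow.append((agent.agent_id, state, score))
    return flow
```

Because each punishment shrinks the speed by 10%, every agent converges below the limit within a few steps, after which it is no longer punished, which mirrors the stopping condition of claim 2.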
CN201811222416.2A 2018-10-19 2018-10-19 Traffic information stream construction method and device Active CN111159832B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811222416.2A CN111159832B (en) 2018-10-19 2018-10-19 Traffic information stream construction method and device


Publications (2)

Publication Number Publication Date
CN111159832A true CN111159832A (en) 2020-05-15
CN111159832B CN111159832B (en) 2024-04-02

Family

ID=70554460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811222416.2A Active CN111159832B (en) 2018-10-19 2018-10-19 Traffic information stream construction method and device

Country Status (1)

Country Link
CN (1) CN111159832B (en)


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079127A (en) * 2007-07-05 2007-11-28 上海交通大学 Pedestrian behavior emulation method in micro-traffic simulation
CN102505644A (en) * 2011-10-15 2012-06-20 天津市市政工程设计研究院 Method for confirming set positions of speed limit signs and size of speed limit during road construction
CN104881992A (en) * 2015-06-12 2015-09-02 天津大学 Urban public transport policy analysis platform based on multi-agent simulation
CN106529064A (en) * 2016-11-15 2017-03-22 北京航空航天大学 Multi-agent based route selection simulation system in vehicle online environment
CN107480821A (en) * 2017-08-14 2017-12-15 山东师范大学 The multi-Agent cooperation crowd evacuation emulation method and device of instance-based learning
CN107506830A (en) * 2017-06-20 2017-12-22 同济大学 Towards the artificial intelligence training platform of intelligent automobile programmed decision-making module
CN107544516A (en) * 2017-10-11 2018-01-05 苏州大学 Automated driving system and method based on relative entropy depth against intensified learning
CN108182812A (en) * 2018-01-03 2018-06-19 浙江师范大学 Urban road intersection ecology driving behavior optimization method based on intensified learning
US20180196899A1 (en) * 2015-10-28 2018-07-12 Fractal Industries, Inc. System and methods for multi-language abstract model creation for digital environment simulations


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XINLEI PAN: "Virtual to Real Reinforcement Learning for Autonomous Driving", Computer Science *
雷锋网 (Leiphone): "Environments can also be reinforcement-learned, leaving agents disoriented: Jun Wang's team at UCL proposes a new method for environment design", pages 1 - 3, Retrieved from the Internet <URL:https://developer.aliyun.com/article/177897> *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784485A (en) * 2021-01-21 2021-05-11 中国科学院软件研究所 Automatic driving key scene generation method based on reinforcement learning
CN112784485B (en) * 2021-01-21 2021-09-10 中国科学院软件研究所 Automatic driving key scene generation method based on reinforcement learning
CN113110582A (en) * 2021-04-22 2021-07-13 中国科学院重庆绿色智能技术研究院 Unmanned aerial vehicle cluster intelligent system control method
CN113110582B (en) * 2021-04-22 2023-06-02 中国科学院重庆绿色智能技术研究院 Unmanned aerial vehicle cluster intelligent system control method
CN113609784A (en) * 2021-08-18 2021-11-05 清华大学 Traffic limit scene generation method, system, equipment and storage medium
CN113609784B (en) * 2021-08-18 2024-03-22 清华大学 Traffic limit scene generation method, system, equipment and storage medium
CN114247144A (en) * 2021-12-21 2022-03-29 清华大学 Multi-agent confrontation simulation method and device, electronic equipment and storage medium
CN114247144B (en) * 2021-12-21 2023-04-14 清华大学 Multi-agent confrontation simulation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111159832B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
CN111159832A (en) Construction method and device of traffic information flow
CN111123735B (en) Automatic driving simulation operation method and device
CN109709956B (en) Multi-objective optimized following algorithm for controlling speed of automatic driving vehicle
Shalev-Shwartz et al. On a formal model of safe and scalable self-driving cars
CN112133089B (en) Vehicle track prediction method, system and device based on surrounding environment and behavior intention
CN103155015B (en) Moving-object prediction device, virtual-mobile-object prediction device, program module, mobile-object prediction method, and virtual-mobile-object prediction method
CN110686906B (en) Automatic driving test method and device for vehicle
CN114077541A (en) Method and system for validating automatic control software for an autonomous vehicle
CN109835339B (en) Channel change decision method and device
CN101612925B (en) Method for determining driving demand value
CN113212454B (en) Method and device for adjusting running state of vehicle, computer equipment and storage medium
CN109910880B (en) Vehicle behavior planning method and device, storage medium and terminal equipment
CN112912883B (en) Simulation method and related equipment
CN111413973A (en) Lane change decision method and device for vehicle, electronic equipment and storage medium
CN111142402B (en) Simulation scene construction method, device and terminal
JP2020123259A (en) Automatic operation program evaluation system, and automatic operation program evaluation method
CN113895456A (en) Intersection driving method and device for automatic driving vehicle, vehicle and medium
CN114644016A (en) Vehicle automatic driving decision-making method and device, vehicle-mounted terminal and storage medium
CN110874610B (en) Human driving behavior modeling system and method using machine learning
CN116596380A (en) Optimization determination method, platform, equipment and medium for expressway construction organization scheme and management and control scheme
KR102615230B1 (en) Drive ability evaluated method for evaluating driving ability of driver and, human factor evaluation system for performing the method
WO2021166449A1 (en) Vehicle behavior evaluation device, vehicle behavior evaluation method, and vehicle behavior evaluation program
Saito et al. Context-sensitive hazard anticipation based on driver behavior analysis and cause-and-effect chain study
CN115358415A (en) Distributed training method of automatic driving learning model and automatic driving method
CN113276860B (en) Vehicle control method, device, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant