CN117151246B - Agent decision method, control method, electronic device and storage medium - Google Patents
- Publication number
- CN117151246B (application CN202311406886.5A)
- Authority
- CN
- China
- Prior art keywords
- agent
- decision
- information
- traffic
- intelligent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 105
- 238000003860 storage Methods 0.000 title claims abstract description 29
- 230000006399 behavior Effects 0.000 claims abstract description 75
- 238000004088 simulation Methods 0.000 claims abstract description 31
- 230000007613 environmental effect Effects 0.000 claims abstract description 30
- 238000012549 training Methods 0.000 claims abstract description 24
- 230000003068 static effect Effects 0.000 claims description 17
- 230000009471 action Effects 0.000 claims description 6
- 238000011156 evaluation Methods 0.000 claims description 5
- 238000004458 analytical method Methods 0.000 claims description 4
- 238000007621 cluster analysis Methods 0.000 claims description 4
- 238000007781 pre-processing Methods 0.000 claims description 4
- 230000004888 barrier function Effects 0.000 claims description 3
- 238000013136 deep learning model Methods 0.000 claims description 2
- 238000005516 engineering process Methods 0.000 abstract description 7
- 230000001364 causal effect Effects 0.000 abstract description 3
- 239000003795 chemical substances by application Substances 0.000 description 198
- 238000004422 calculation algorithm Methods 0.000 description 25
- 238000004590 computer program Methods 0.000 description 8
- 238000012360 testing method Methods 0.000 description 8
- 230000000694 effects Effects 0.000 description 7
- 230000008569 process Effects 0.000 description 5
- 230000001133 acceleration Effects 0.000 description 4
- 230000002787 reinforcement Effects 0.000 description 4
- 241001465754 Metazoa Species 0.000 description 2
- 238000013475 authorization Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000005259 measurement Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 238000006467 substitution reaction Methods 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 238000004140 cleaning Methods 0.000 description 1
- 239000003086 colorant Substances 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- 238000011157 data evaluation Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 238000003064 k means clustering Methods 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/01—Detecting movement of traffic to be counted or controlled
- G08G1/0104—Measuring and analyzing of parameters relative to traffic conditions
- G08G1/0125—Traffic data processing
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/01—Detecting movement of traffic to be counted or controlled
- G08G1/0104—Measuring and analyzing of parameters relative to traffic conditions
- G08G1/0137—Measuring and analyzing of parameters relative to traffic conditions for specific applications
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/09—Arrangements for giving variable traffic instructions
- G08G1/0962—Arrangements for giving variable traffic instructions having an indicator mounted inside the vehicle, e.g. giving voice messages
- G08G1/0968—Systems involving transmission of navigation instructions to the vehicle
- G08G1/096877—Systems involving transmission of navigation instructions to the vehicle where the input to the navigation device is provided by a suitable I/O arrangement
- G08G1/096888—Systems involving transmission of navigation instructions to the vehicle where the input to the navigation device is provided by a suitable I/O arrangement where input information is obtained using learning systems, e.g. history databases
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Mathematical Physics (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to the field of computers, and in particular to an agent decision method, a control method, an electronic device and a storage medium. It aims to solve the problem that agents controlled by existing simulation technology do not closely match the actual driving behavior of real people. To this end, the method of the invention comprises: acquiring environmental information and a driving intention of an agent to be decided; and inputting the environmental information and the driving intention of the agent to be decided into a trained agent decision model to obtain an agent decision result. The agent decision model is trained based at least on the following steps: constructing an initial agent decision model based on a large language model; and training the initial agent decision model based on traffic environment data to obtain the trained agent decision model. Through this implementation, the agent can be effectively controlled to execute diverse behaviors with strong causal coherence between successive behaviors and high decision flexibility, so that the simulation data better covers real-world driving behaviors.
Description
Technical Field
The invention relates to the technical field of computers, and in particular to an agent decision method, a control method, an electronic device and a storage medium.
Background
At present, in the field of robotics or intelligent driving simulation, other agents are controlled to interact with the simulated subject under test mainly based on rule algorithms or on reinforcement learning. However, agents controlled by either method do not behave like real human drivers: the controlled object's behavior is monotonous, its controllability is poor, and the causal association between successive behaviors is weak, so the real world cannot be fully covered when simulation technology is used for algorithm testing or model training.
Accordingly, there is a need in the art for a new solution to the above-mentioned problems.
Disclosure of Invention
In order to overcome the above-mentioned drawbacks, the present invention provides an agent decision method, a control method, an electronic device and a storage medium, which solve, or at least partially solve, the technical problem that existing simulation technology does not closely match the actual driving behavior of real people when controlling an agent.
In a first aspect, there is provided an agent decision method, the method comprising:
acquiring environmental information and driving intention of an agent to be decided;
inputting the environmental information and the driving intention of the agent to be decided into a trained agent decision model to obtain an agent decision result;
wherein the agent decision model is trained based at least on the following steps:
constructing an initial agent decision model based on a large language model;
training the initial agent decision model based on traffic environment data to obtain the trained agent decision model.
In one technical solution of the agent decision method, training the initial agent decision model based on the traffic environment data includes:
acquiring the traffic environment data;
generating traffic behavior description information based on the traffic environment data;
training the initial agent decision model based on the traffic behavior description information;
wherein the traffic behavior description information comprises movement trajectories and action semantics of at least one of: the agent to be decided, other agents within a preset area around the agent to be decided, and other living bodies.
In one technical solution of the agent decision method, the traffic environment data comprises dynamic traffic environment data and static traffic environment data; acquiring the traffic environment data comprises:
acquiring the dynamic traffic environment data and the static traffic environment data based on a simulation simulator and/or a sensor;
wherein the dynamic traffic environment data comprises driving intention data of the agent to be decided and behavior state data of other agents and other living bodies, and the static traffic environment data comprises at least one of traffic signal data and traffic sign data.
In one technical solution of the agent decision method, generating the traffic behavior description information based on the traffic environment data includes:
preprocessing the traffic environment data;
extracting feature data based on the preprocessed traffic environment data;
performing cluster analysis on the feature data;
and generating the traffic behavior description information based on the cluster analysis result and a preset rule.
In one technical solution of the agent decision method, the environmental information comprises dynamic environment information and static environment information, and the driving intention comprises the position, the target position and the behavior state information of the agent to be decided; acquiring the environmental information and the driving intention of the agent to be decided comprises the following steps:
acquiring the dynamic environment information and the static environment information, and acquiring the position, the target position and the behavior state information of the agent to be decided;
wherein the dynamic environment information comprises behavior state information of other agents and other living bodies, and the static environment information comprises at least one of traffic signal information and traffic sign information.
In one technical solution of the agent decision method, inputting the environmental information and the driving intention of the agent to be decided into the trained agent decision model to obtain the agent decision result includes:
analyzing the environmental information and the driving intention of the agent to be decided to acquire at least one of traffic condition information, obstacle information and drivable area information;
formulating a driving planning strategy based on at least one of the traffic condition information, the obstacle information and the drivable area information, wherein the driving planning strategy comprises at least one of a current driving state, a driving speed and a driving route;
and generating the agent decision result based on the driving planning strategy.
In one technical solution of the agent decision method, the agent decision result comprises a behavior planning result and a path planning result, and generating the agent decision result based on the driving planning strategy comprises:
generating the behavior planning result and/or the path planning result based on the driving planning strategy;
wherein the behavior planning result comprises the current driving state, and the path planning result comprises the driving speed and the driving route.
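As an illustration of how a driving planning strategy splits into the behavior planning result and the path planning result, the following minimal Python sketch uses hypothetical data structures; the patent does not prescribe concrete field names, so everything here is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical structures; field names are illustrative, not from the patent.
@dataclass
class DrivingPlanningStrategy:
    current_state: str                      # e.g. "lane_keep", "lane_change_left"
    speed_mps: float                        # planned driving speed
    route: List[Tuple[float, float]]        # waypoints of the planned driving route

@dataclass
class AgentDecisionResult:
    behavior_plan: str                      # behavior planning result: current driving state
    path_speed: float                       # path planning result: driving speed
    path_route: List[Tuple[float, float]]   # path planning result: driving route

def generate_decision_result(strategy: DrivingPlanningStrategy) -> AgentDecisionResult:
    """Split one driving planning strategy into behavior and path planning results."""
    return AgentDecisionResult(strategy.current_state, strategy.speed_mps, strategy.route)

result = generate_decision_result(
    DrivingPlanningStrategy("lane_keep", 12.0, [(0.0, 0.0), (50.0, 0.0)])
)
```

The split mirrors the claim language: the behavior plan carries the discrete driving state, while the path plan carries the continuous speed and route.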
In a second aspect, the present invention provides an agent control method, the method comprising:
acquiring decision results of other agents within a preset area around an agent to be controlled based on the agent decision method according to any one of the above technical solutions;
acquiring a decision result of the agent to be controlled based on the agent decision method according to any one of the above technical solutions and on the decision results of the other agents;
evaluating the agent to be controlled based on the decision result of the agent to be controlled;
and correcting the decision result of the agent to be controlled based on the evaluation result, and controlling the agent to be controlled based on the corrected decision result.
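The four steps of the control method above can be sketched as one control cycle. Everything below is illustrative: `decide`, `evaluate`, and `correct` are simple one-dimensional stand-ins for the patent's decision model, evaluation, and correction steps.

```python
def decide(agent_state):
    # stand-in for the trained agent decision model: move toward the goal
    x, goal = agent_state["x"], agent_state["goal"]
    step = max(-5.0, min(5.0, goal - x))
    return {"next_x": x + step}

def evaluate(decision, other_decisions, min_gap=2.0):
    # toy safety evaluation: keep a minimum gap to every other agent
    return all(abs(decision["next_x"] - d["next_x"]) >= min_gap
               for d in other_decisions)

def correct(decision, other_decisions, min_gap=2.0):
    # back off until the minimum gap to every other agent is respected
    while not evaluate(decision, other_decisions, min_gap):
        decision["next_x"] -= 0.5
    return decision

def control_step(agent_state, other_states):
    other_decisions = [decide(s) for s in other_states]   # step 1: neighbors' decisions
    decision = decide(agent_state)                        # step 2: own decision
    if not evaluate(decision, other_decisions):           # step 3: evaluate
        decision = correct(decision, other_decisions)     # step 4: correct, then control
    return decision

ego = {"x": 0.0, "goal": 10.0}
others = [{"x": 6.0, "goal": 6.0}]
final = control_step(ego, others)
```

Here the ego agent's planned position (5.0) violates the 2.0 gap to the neighbor at 6.0, so the correction step backs it off before the result is used for control.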
In a third aspect, an electronic device is provided, comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, the program codes being adapted to be loaded and run by the processor to perform the agent decision method according to any one of the above technical solutions or the agent control method according to any one of the above technical solutions.
In a fourth aspect, a computer-readable storage medium is provided, in which a plurality of program codes are stored, the program codes being adapted to be loaded and run by a processor to perform the agent decision method according to any one of the above technical solutions or the agent control method according to any one of the above technical solutions.
The technical solution provided by the invention has one or more of the following beneficial effects:
in the technical solution implementing the invention, the environmental information and the driving intention of the agent to be decided are first acquired and then input into a trained agent decision model to obtain an agent decision result. When the agent decision model is trained, an initial agent decision model is built based on a large language model, and the initial model is then trained on traffic environment data to obtain the trained agent decision model. In this embodiment, the agent decision model obtained by fine-tuning a large language model can use the chain-of-thought capability of the large language model to simulate the thinking process of human driving, and outputs decision results according to the environmental information and the driving intention of the agent. The agent can therefore be effectively controlled to execute diverse behaviors with strong causal coherence between them and high decision flexibility, so that the simulation data better covers real-world driving behaviors.
Drawings
The present disclosure will become more readily understood with reference to the accompanying drawings. As will be readily appreciated by those skilled in the art: the drawings are for illustrative purposes only and are not intended to limit the scope of the present invention. Wherein:
FIG. 1 is a flow chart illustrating the main steps of an agent decision method according to one embodiment of the present invention;
FIG. 2 is a flow chart of the main steps of a training method of an agent decision model according to one embodiment of the invention;
FIG. 3 is a flow chart of the main steps of training an initial agent decision model based on traffic environment data according to one embodiment of the present invention;
FIG. 4 is a flow chart of the main steps of inputting environmental information and driving intent of an agent to be decided into a trained agent decision model to obtain an agent decision result, according to one embodiment of the present invention;
FIG. 5 is a schematic diagram of a simulation scenario in accordance with one embodiment of the present invention;
FIG. 6 is a flow chart illustrating the main steps of an agent control method according to one embodiment of the present invention;
fig. 7 is a schematic view of the main structure of an electronic device according to an embodiment of the present invention.
List of reference numerals:
701: a processor; 702: a storage device.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
In the description of the present invention, a "processor" may include hardware, software, or a combination of both. The processor may be a central processor, a microprocessor, an image processor, a digital signal processor, or any other suitable processor. The processor has data and/or signal processing functions. The processor may be implemented in software, hardware, or a combination of both. The computer readable storage medium includes any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random access memory, and the like. The term "a and/or B" means all possible combinations of a and B, such as a alone, B alone or a and B. The term "at least one A or B" or "at least one of A and B" has a meaning similar to "A and/or B" and may include A alone, B alone or A and B. The singular forms "a", "an" and "the" include plural referents.
Some terms related to the present invention will be explained first.
Agent (Agent): in the field of artificial intelligence, an agent refers to a program or machine that simulates intelligent human behavior and has interactive capabilities: it can perceive environmental information, understand language and semantics, reason and make decisions, and perform various tasks.
Large language model (Large Language Model, LLM): a deep learning model trained on large amounts of text data that can generate natural language text or understand the meaning of language text. Large language models can handle a variety of natural language tasks, such as text classification, question answering, and dialogue.
Simulation simulator: a computer program used to simulate certain real-world scenes or processes, typically for training and testing the performance and behavior of complex systems. A simulation simulator can model a variety of systems, including physical, social, economic, and traffic systems.
As described in the background art, in the field of robotics or intelligent driving simulation technology, other agents are controlled to interact with the simulated subject under test based on rule algorithms or reinforcement learning. However, agents controlled by either method do not behave like real human drivers: the controlled object's behavior is monotonous, its controllability is poor, and the causal association between successive behaviors is weak, so the real world cannot be fully covered when simulation technology is used for algorithm testing or model training.
In order to solve the above problems, the invention provides an agent decision method, a control method, an electronic device and a storage medium.
Referring to fig. 1, fig. 1 is a schematic flow chart of main steps of an agent decision method according to an embodiment of the present invention. As shown in fig. 1, the agent decision method in the embodiment of the invention mainly includes the following steps S101 to S102.
Step S101: acquiring environmental information and driving intention of an agent to be decided;
step S102: and inputting the environmental information and the driving intention of the agent to be decided into a trained agent decision model to obtain an agent decision result.
Referring to fig. 2, fig. 2 is a schematic flow chart of main steps of a training method of an agent decision model according to an embodiment of the present invention. As shown in fig. 2, the agent decision model is trained based on at least the following steps S1021 to S1022:
step S1021: constructing an initial agent decision model based on the large language model;
step S1022: training an initial agent decision model based on traffic environment data to obtain a trained agent decision model.
Based on the method described in steps S101 to S102, the agent decision model obtained by fine-tuning a large language model can use the chain-of-thought capability of the large language model to simulate the thinking process of human driving, and outputs decision results according to the environmental information and the driving intention of the agent. The agent can therefore be effectively controlled to execute diverse behaviors with strong causal coherence between them and high decision flexibility, so that the simulation data better covers real-world driving behaviors.
The above agent decision method is further described below.
In some embodiments, an agent decision model needs to be constructed and trained before the agent decision method is performed, specifically, refer to steps S1021 to S1022 shown in fig. 2.
In some embodiments of step S1021 described above, the initial agent decision model may be constructed based on a large language model such as GPT, Llama, ChatGLM, or Falcon.
Further, in some embodiments of step S1022 described above, the initial agent decision model may be trained based on traffic environment data.
Referring to fig. 3, fig. 3 is a schematic flow chart of main steps for training an initial agent decision model based on traffic environment data according to an embodiment of the present invention. As shown in fig. 3, the following steps S301 to S303 are mainly included:
step S301: acquiring traffic environment data;
wherein the traffic environment data includes dynamic traffic environment data and static traffic environment data.
Further, in some embodiments of step S301, acquiring the traffic environment data includes: acquiring the dynamic traffic environment data and the static traffic environment data based on a simulation simulator and/or sensors.
The simulation simulator is a computer program that simulates a real traffic environment and generates virtual traffic environment data. When model training is performed using simulation technology, the agent can be embedded into the simulation simulator and the traffic environment data acquired from it.
A sensor is a real physical device that perceives the traffic environment and collects real traffic environment data. When model training is performed, the traffic environment data may be acquired from sensors such as cameras, lidars, inertial measurement units (Inertial Measurement Unit, IMU), and GPS positioning systems.
Traffic environment data may thus be obtained from the simulation simulator, from sensors, or from both; in practical applications, a person skilled in the art can obtain traffic environment data according to the specific scenario and requirements, which is not limited herein.
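A minimal sketch of how the two data sources named above (the simulation simulator and physical sensors) might sit behind one common interface; the class and method names are assumptions, not part of the patent.

```python
from abc import ABC, abstractmethod

class TrafficDataSource(ABC):
    """Common interface over simulator-generated and sensor-collected data."""
    @abstractmethod
    def read(self) -> dict:
        """Return one frame of traffic environment data."""

class SimulatorSource(TrafficDataSource):
    def __init__(self, scenario: str):
        self.scenario = scenario

    def read(self) -> dict:
        # a virtual frame produced by the simulation simulator
        return {"origin": "simulator", "scenario": self.scenario,
                "dynamic": {"ego_speed_mps": 10.0}, "static": {"signal": "green"}}

class SensorSource(TrafficDataSource):
    def read(self) -> dict:
        # in a real system this would poll camera / lidar / IMU / GPS drivers
        return {"origin": "sensor",
                "dynamic": {"ego_speed_mps": 9.6}, "static": {"signal": "green"}}

def acquire(sources):
    # step S301: collect frames from the simulator and/or sensors
    return [s.read() for s in sources]

frames = acquire([SimulatorSource("urban_intersection"), SensorSource()])
```

Abstracting the source this way lets the same training pipeline consume either virtual or real traffic environment data, matching the "simulator and/or sensor" wording of the claim.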
In some embodiments of step S301, the dynamic traffic environment data includes driving intention data of the agent to be decided and behavior state data of other agents and other living bodies; the static traffic environment data includes at least one of traffic signal data and traffic sign data.
The agent to be decided may be any of various driving devices, such as cars, trucks, and buses; other agents may be various driving devices, obstacles, and the like; other living bodies may be other traffic participants, such as pedestrians and animals.
Further, the driving intention data of the agent to be decided includes the position of the driving device to be decided, its target position, and its behavior state information, the latter including driving speed, acceleration, braking state, steering angle, and the like.
The behavior state data of other agents includes the driving speed, acceleration, braking state and steering angle of other driving devices within a preset area (e.g., 20 meters) around the agent to be decided, the positions of obstacles, and the like.
The behavior state data of other living bodies includes the behaviors of pedestrians and animals, such as crossing the road, waiting to cross, and walking.
The traffic signal data describes the control rules and states of the various traffic signals on the road, including the type of traffic signal (e.g., traffic lights, arrow lights), its color (e.g., red, green, yellow), and its timing scheme (e.g., the green, yellow, and red durations at each intersection).
The traffic sign data describes the type, position, pattern, and meaning of the various traffic signs on the road, including the type of traffic sign (e.g., prohibition signs, indication signs, warning signs, service facility signs), its position (e.g., at a start point, end point, intersection, junction, or toll station), its pattern and color (e.g., the red circle of a prohibition sign, the arrow of an indication sign, the triangle of a warning sign), and its meaning and effect (e.g., a prohibition sign prohibits certain traffic behaviors, while an indication sign indicates direction and distance).
It should be noted that the above examples of traffic environment data are only illustrative; in practical applications, those skilled in the art can acquire the corresponding traffic environment data according to the specific situation, and all such acquired traffic environment data fall within the protection scope of the present invention, which is not limited herein.
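The data fields described above can be organized into a schema like the following; all class and field names are illustrative assumptions rather than structures prescribed by the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class BehaviorState:
    speed_mps: float            # driving speed
    acceleration: float
    braking: bool               # braking state
    steering_angle_deg: float

@dataclass
class DrivingIntention:
    position: Tuple[float, float]         # where the agent is
    target_position: Tuple[float, float]  # where it wants to go
    state: BehaviorState

@dataclass
class TrafficSignal:
    kind: str        # e.g. "traffic_light", "arrow_light"
    color: str       # e.g. "red", "green", "yellow"
    green_s: float   # timing scheme: green / yellow / red durations
    yellow_s: float
    red_s: float

@dataclass
class TrafficFrame:
    ego_intention: DrivingIntention                              # dynamic data
    other_agent_states: List[BehaviorState] = field(default_factory=list)
    signals: List[TrafficSignal] = field(default_factory=list)   # static data

intent = DrivingIntention((0.0, 0.0), (100.0, 0.0),
                          BehaviorState(12.0, 0.5, False, 0.0))
frame = TrafficFrame(intent,
                     other_agent_states=[BehaviorState(8.0, 0.0, True, -3.0)],
                     signals=[TrafficSignal("traffic_light", "green", 30.0, 3.0, 27.0)])
```

Such a typed frame makes the dynamic/static split of the claims explicit and gives the downstream description-generation step a stable input shape.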
The above is a further explanation of step S301.
Step S302: generating traffic behavior description information based on the traffic environment data;
in some embodiments, step S302 includes the following steps S3021 to S3024:
step S3021: preprocessing traffic environment data;
the original traffic environment data can be subjected to preprocessing operations such as cleaning, denoising and interpolation, and the quality and accuracy of the data are improved.
Step S3022: extracting feature data based on the preprocessed traffic environment data;
and extracting characteristic data required by generating traffic flow description information from the preprocessed traffic environment data.
Step S3023: performing cluster analysis on the characteristic data;
the similar data in the characteristic data are classified into one type, wherein the clustering analysis can use K-means clustering, hierarchical clustering and other algorithms, and the method is not limited herein.
Step S3024: and generating traffic behavior description information based on the clustering analysis result and preset rules.
Specifically, the result of the cluster analysis is combined with preset rules to generate the traffic behavior description information. The preset rules may be formulated according to actual requirements; for example, they may specify the speed limit and lane rules that a driving device must observe during a certain period of time.
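Steps S3021 to S3024 can be sketched end to end as follows. The tiny one-dimensional k-means, the choice of speed as the extracted feature, and the speed-limit rule are all illustrative assumptions, not the patent's actual algorithms.

```python
def preprocess(samples):
    # S3021: cleaning — drop obviously invalid (negative) speed readings
    return [s for s in samples if s >= 0]

def kmeans_1d(values, k=2, iters=20):
    # S3023: group similar feature values (tiny 1-D k-means)
    centers = sorted(values)[::max(1, len(values) // k)][:k]
    groups = [[] for _ in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            i = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            groups[i].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

def describe(centers, speed_limit=15.0):
    # S3024: combine cluster results with a preset rule (here, a speed limit)
    return [f"cluster at {c:.1f} m/s: "
            f"{'cruising' if c <= speed_limit else 'speeding'}"
            for c in sorted(centers)]

# S3022: the extracted feature is simply the speed of each observed agent
speeds = preprocess([3.0, 4.0, -1.0, 20.0, 22.0])
centers, _ = kmeans_1d(speeds, k=2)
descriptions = describe(centers)
```

On this toy input the pipeline drops the invalid reading, finds a slow and a fast cluster, and labels each against the preset 15 m/s rule, producing short behavior descriptions of the kind a decision model could be fine-tuned on.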
In some embodiments, the traffic behavior description information includes movement trajectories and action semantics of at least one of: the agent to be decided, other agents within a preset area around the agent to be decided, and other living bodies.
The movement trajectory of the agent to be decided may be a path, and its action semantics may be turning left or right, changing lanes, and the like; the movement trajectories and action semantics of other agents and other living bodies may describe their movement around the agent to be decided, and so on.
Further, road conditions, traffic flow, obstacle information, and the like can be obtained from the above traffic behavior description information.
The above is a further explanation of step S302.
Step S303: and training the initial agent decision model based on the traffic behavior description information.
Specifically, the initial agent decision model may be fine-tuned iteratively on the traffic behavior description information, with the traffic behavior description information as the model input and the agent decision result as the model output. During training, the model parameters and the optimization algorithm are adjusted according to the actual situation until a preset number of training iterations is reached or the model converges to a preset error, yielding the trained agent decision model.
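The control flow of this fine-tuning loop (stop on a preset iteration count or on convergence to a preset error) can be sketched with a one-parameter stand-in model; actually fine-tuning a large-language-model-based decision model is far beyond this sketch, so everything below is illustrative.

```python
def train(pairs, max_epochs=500, target_error=1e-4, lr=0.1):
    """Fine-tuning control flow: stop at a preset epoch count or preset error."""
    w = 0.0                       # the entire "model": predicted = w * x
    error = float("inf")
    for epoch in range(1, max_epochs + 1):
        error = sum((w * x - y) ** 2 for x, y in pairs) / len(pairs)
        if error < target_error:  # converged to the preset error
            return w, epoch, error
        grad = sum(2 * (w * x - y) * x for x, y in pairs) / len(pairs)
        w -= lr * grad            # adjust the model parameter
    return w, max_epochs, error   # preset number of training epochs reached

# (input, output) pairs stand in for (traffic behavior description, decision)
w, epochs, err = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```

The two exit conditions mirror the text above: either the mean squared error drops below the preset threshold, or the preset number of training iterations is exhausted.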
Through this implementation, the agent decision model can use the chain-of-thought capability of the large language model to simulate the thinking process of human driving, and offers high decision flexibility.
The foregoing is illustrative of the construction and training of an agent decision model, and further, the agent decision method may be performed based on the agent decision model.
In some embodiments of step S101 described above, the environmental information includes dynamic environmental information and static environmental information, and the driving intention includes the current location of the agent to be decided, the target location, and behavior state information.
Further, obtaining the environmental information and the driving intention of the agent to be decided includes: acquiring the dynamic environment information and the static environment information, and acquiring the current location, target location, and behavior state information of the agent to be decided.
The dynamic environment information comprises behavior state information of other agents and other living bodies; the static environment information includes at least one of traffic signal information and traffic sign information. Any other available environmental information and driving intention of the agent to be decided also fall within the scope of the present invention, which is not limited herein.
In some embodiments, the driving intention of the agent to be decided, the behavior state information of other agents and other living bodies, and the traffic signal and traffic sign information may be obtained based on the simulator and/or the sensor. For convenience and brevity of description, details on acquiring the environmental information and the driving intention of the agent to be decided may refer to the content described in the embodiment of step S301 and are not repeated herein.
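A minimal data layout for the environmental information and driving intention described above might look as follows; the field names and example values are illustrative, not prescribed by the method:

```python
from dataclasses import dataclass, field

@dataclass
class DrivingIntention:
    position: tuple          # current location of the agent to be decided
    target_position: tuple   # target location
    behavior_state: str      # e.g. "cruising", "lane_changing"

@dataclass
class EnvironmentInfo:
    # dynamic: behavior states of other agents and other living bodies
    dynamic: dict = field(default_factory=dict)
    # static: traffic signal information and traffic sign information
    static: dict = field(default_factory=dict)

env = EnvironmentInfo(
    dynamic={"vehicle_12": "decelerating", "pedestrian_3": "crossing"},
    static={"signal": "red", "sign": "speed_limit_60"},
)
intent = DrivingIntention(position=(0.0, 0.0), target_position=(120.0, 4.0),
                          behavior_state="cruising")
```

Either a simulator observation or a fused sensor reading could populate these structures before they are passed to the decision model.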
The above is a further explanation of step S101, and the following further explanation of step S102 is continued.
In some implementations of step S102, referring to fig. 4, fig. 4 is a schematic flow chart of main steps of inputting environmental information and driving intention of an agent to be decided into a trained agent decision model to obtain an agent decision result according to an embodiment of the present invention. As shown in fig. 4, step S102 mainly includes the following steps S401 to S403:
step S401: analyzing the environment information and the driving intention of the intelligent agent to be decided to acquire at least one of traffic condition information, barrier information and drivable area information;
the current driving environment state and the drivable path can be obtained by obtaining traffic condition information, obstacle information and drivable area information.
Step S402: formulating a driving planning strategy based on traffic condition information and/or obstacle information and/or drivable area information;
the driving planning strategy comprises at least one of a current driving state, a driving speed and a driving route.
By making a driving planning strategy, the intelligent agent to be decided can select the optimal driving route and speed under different driving conditions so as to realize safe and efficient driving.
Step S403: and generating an agent decision result based on the driving planning strategy.
In this way, the agent can make an optimal decision according to the current driving environment and task requirements so as to achieve safe and efficient driving. The agent decision result comprises a behavior planning result and a path planning result.
Further, step S403 includes: and generating a behavior planning result and/or a path planning result based on the driving planning strategy.
In some embodiments, the behavior planning result includes a current driving state; the path planning result includes the travel speed and the travel route.
The current driving state comprises the current vehicle speed, the current steering angle, and the behavior to be executed by the agent, such as turning, lane changing, or decelerating; the driving speed is the speed along the preset path, and the driving route is the planned route.
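Steps S401 to S403 can be sketched as a three-stage pipeline. The dictionary keys, thresholds, and decision rules below are all hypothetical simplifications of what a trained decision model would produce:

```python
def analyze(env, intention):
    """S401: derive traffic condition, obstacle and drivable-area information."""
    obstacles = [k for k, v in env["dynamic"].items() if v == "stopped"]
    congested = env["static"].get("signal") == "red"
    drivable = [lane for lane in env["static"].get("lanes", [])
                if lane not in env.get("closed", [])]
    return {"congested": congested, "obstacles": obstacles, "drivable": drivable}

def plan(analysis):
    """S402: formulate a driving planning strategy (state, speed, route)."""
    state = "braking" if analysis["congested"] else "cruising"
    speed = 0.0 if analysis["congested"] else 13.9  # ~50 km/h, illustrative
    route = analysis["drivable"][0] if analysis["drivable"] else None
    return {"state": state, "speed": speed, "route": route}

def decide(strategy):
    """S403: package the strategy as behavior and path planning results."""
    return {"behavior": strategy["state"],
            "path": {"speed": strategy["speed"], "route": strategy["route"]}}

env = {"dynamic": {"car_7": "stopped"},
       "static": {"signal": "green", "lanes": ["lane_1", "lane_2"]}}
result = decide(plan(analyze(env, intention={"target": (100.0, 0.0)})))
print(result)  # {'behavior': 'cruising', 'path': {'speed': 13.9, 'route': 'lane_1'}}
```

The point of the sketch is the data flow — analysis feeds strategy, strategy feeds the decision result with its behavior and path planning parts — not the particular rules.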
Further, in some embodiments, when an intelligent algorithm is tested using simulation technology, referring to fig. 5, fig. 5 is a schematic diagram of a simulation scenario according to an embodiment of the present invention. As shown in fig. 5, after the agent to be decided obtains its environmental information and driving intention from the simulator, the decision result output by the agent decision model can be fed back to the simulator for the next simulation step; the simulator may then be connected to the intelligent algorithm under test for testing, or the decision result may be stored directly as synthesized data output.
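The closed loop between simulator and decision model can be sketched as follows; `ToySimulator`, its one-dimensional world, and the constant-output `decision_model` are all hypothetical stand-ins:

```python
class ToySimulator:
    """Minimal stand-in for the simulator: it exposes observations and
    steps the world with the decision fed back from the decision model."""
    def __init__(self):
        self.position = 0.0
        self.log = []            # synthesized data for downstream tests

    def observe(self):
        return {"position": self.position}

    def step(self, decision):
        self.position += decision["speed"] * decision["dt"]
        self.log.append((self.observe(), decision))

def decision_model(obs):
    # placeholder for the trained agent decision model
    return {"speed": 10.0, "dt": 0.1}

sim = ToySimulator()
for _ in range(5):               # closed loop: observe -> decide -> step
    sim.step(decision_model(sim.observe()))
print(sim.position)  # → 5.0
```

The accumulated `sim.log` plays the role of the stored synthesized data; alternatively, each decision could be forwarded to the algorithm under test instead of being logged.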
In some embodiments, when the intelligent algorithm test is performed in the real environment, the intelligent agent to be decided can perform corresponding operations based on the decision result, which is not limited herein.
The above is a description of the agent decision method.
Further, the invention also provides an agent control method.
In some embodiments, an agent control system may be constructed that comprises the agent to be controlled and a plurality of other agents within a preset area around it.
The intelligent algorithm is mainly used for controlling the agent (i.e., the driving device) to realize decision results such as path planning, autonomous navigation, obstacle detection, adaptive cruise, lane keeping, and automatic parking. For example, an optimal driving route is calculated by the intelligent algorithm, and the driving speed and direction are automatically adjusted according to the traffic conditions around the driving device, so that the driving device can avoid other vehicles and pedestrians in time, improving driving safety and traffic flow.
The other agents in the control system interact with the agent to be controlled in order to test and evaluate the performance and effect of the intelligent algorithm in the agent to be controlled, so that the intelligent algorithm can be corrected and its performance and stability improved.
Specifically, each of the other agents is assigned its own task and is controlled based on a corresponding agent decision model. These tasks include providing different driving behaviors, generating different road environments, providing interference and attacks, collecting simulation data, and the like. The provided driving behaviors include acceleration, deceleration, turning, lane changing, and the like, used to test the control effect of the intelligent algorithm in different scenarios; the generated road environments include urban streets, highways, rural roads, and the like, used to test the adaptability of the intelligent algorithm under different road conditions; the provided interference and attacks include sudden lane changes, running red lights, sudden stops, and the like, used to test the responsiveness and safety of the intelligent algorithm under complex conditions; the collected simulation data include the driving device's position, speed, acceleration, and the like, used for data analysis and for evaluating the performance of the intelligent algorithm.
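One way to organize these per-agent tasks is a simple registry; the agent identifiers, keys, and `assign` helper below are purely illustrative:

```python
# Hypothetical task registry for the other agents in the control system.
OTHER_AGENT_TASKS = {
    "behavior_agent":    {"role": "provide driving behaviors",
                          "actions": ["accelerate", "decelerate", "turn", "change_lane"]},
    "environment_agent": {"role": "generate road environments",
                          "scenes": ["urban_street", "highway", "rural_road"]},
    "adversary_agent":   {"role": "provide interference and attacks",
                          "events": ["sudden_lane_change", "run_red_light", "sudden_stop"]},
    "recorder_agent":    {"role": "collect simulation data",
                          "fields": ["position", "speed", "acceleration"]},
}

def assign(agent_id):
    """Look up the task an auxiliary agent is responsible for."""
    return OTHER_AGENT_TASKS[agent_id]["role"]

print(assign("adversary_agent"))  # → provide interference and attacks
```

Each entry would be bound to its own agent decision model, so that every auxiliary agent acts out its task while the algorithm under test is exercised.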
In some embodiments, referring to fig. 6, fig. 6 is a schematic flow chart of main steps of an agent control method according to an embodiment of the present invention. As shown in fig. 6, the following steps S601 to S604 are mainly included:
step S601: acquiring other agent decision results in a preset area around the agent to be controlled based on the agent decision method in the agent decision method embodiment;
In the agent control system, the other agents within the preset area around the agent to be controlled can generate decision results, including behavior planning results and path planning results, according to the current driving environment and their respective tasks through their corresponding agent decision models.
Step S602: acquiring decision results of the intelligent agent to be controlled based on the intelligent agent decision method and other intelligent agent decision results in the intelligent agent decision method embodiment;
specifically, decision results of a plurality of other agents in a preset area around the to-be-controlled agent can be used as a part of the environmental information, and the decision results of the to-be-controlled agent can be obtained according to the acquired environmental information and the driving intention of the to-be-controlled agent further based on an intelligent algorithm set in the to-be-controlled agent.
The decision result of the to-be-controlled agent includes a behavior planning result and a path planning result of the to-be-controlled agent, for example, avoidance behavior (such as turning, lane changing, decelerating, etc.) generated when the to-be-controlled agent is closer to other agents, a running speed and a running route of the to-be-controlled agent, and the like.
Step S603: evaluating the agent to be controlled based on the decision result of the agent to be controlled;
That is, the performance and effect of the intelligent algorithm in the agent to be controlled are evaluated; in particular, evaluation indicators such as the accuracy, safety, efficiency, and robustness of the intelligent algorithm may be considered.
Accuracy refers to whether the decision result of the agent to be controlled matches the actual situation; safety refers to whether the decision result is safe and whether it causes harm to the environment; efficiency refers to, for example, the time taken to generate the decision result; robustness refers to the resistance of the decision result to external disturbances and noise. In addition, other evaluation indicators, such as reliability, scalability, and maintainability, may be considered according to the specific application scenario and requirements.
Step S604: and correcting the decision result of the agent to be controlled based on the evaluation result, and controlling the agent to be controlled based on the corrected decision result.
By comprehensively evaluating the intelligent algorithm in the agent to be controlled, the decision results made by the intelligent algorithm can be corrected and optimized, improving the performance and stability of the intelligent algorithm; the corrected intelligent algorithm can then be used to make decision results and control the agent to be controlled, achieving more accurate, efficient, and safe driving.
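Steps S603 and S604 — evaluate the decision result, then correct it before issuing control — can be sketched as follows; the two toy metrics, the reference values, and the correction rules are all hypothetical:

```python
def evaluate(decision, reference):
    """S603: score a decision result (toy accuracy and safety indicators)."""
    accuracy = 1.0 if decision["route"] == reference["route"] else 0.0
    safety = 1.0 if decision["speed"] <= reference["speed_limit"] else 0.0
    return {"accuracy": accuracy, "safety": safety}

def correct(decision, scores, reference):
    """S604: clamp an unsafe speed and fall back to the reference route."""
    fixed = dict(decision)
    if scores["safety"] < 1.0:
        fixed["speed"] = reference["speed_limit"]
    if scores["accuracy"] < 1.0:
        fixed["route"] = reference["route"]
    return fixed

raw = {"route": "lane_2", "speed": 20.0}          # decision from the algorithm
ref = {"route": "lane_1", "speed_limit": 16.7}    # evaluation reference
controlled = correct(raw, evaluate(raw, ref), ref)
print(controlled)  # → {'route': 'lane_1', 'speed': 16.7}
```

A real evaluator would also score efficiency and robustness and might retrain the algorithm rather than patch individual decisions, but the evaluate-then-correct-then-control flow is the same.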
Through this implementation, the agent can be effectively controlled to execute various behaviors; these behaviors exhibit strong causality and high decision flexibility, and the resulting simulation data better cover real-world driving behaviors.
It should be noted that, although the foregoing embodiments describe the steps in a specific order, it will be understood by those skilled in the art that, in order to achieve the effects of the present invention, the steps are not necessarily performed in such an order, and may be performed simultaneously (in parallel) or in other orders, and these variations are within the scope of the present invention.
It will be appreciated by those skilled in the art that the present invention may implement all or part of the above-described methods according to the above-described embodiments, or may do so by means of a computer program instructing relevant hardware, where the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
The invention further provides an electronic device. Referring to fig. 7, fig. 7 is a schematic view of the main structure of an electronic device according to an embodiment of the present invention. As shown in fig. 7, the electronic device in the embodiment of the present invention mainly includes a processor 701 and a storage device 702; the storage device 702 may be configured to store a program for executing the agent decision method or the agent control method of the above method embodiments, and the processor 701 may be configured to execute the program in the storage device 702, including, but not limited to, the program for executing the agent decision method or the agent control method of the above method embodiments. For convenience of explanation, only those portions relevant to the embodiments of the present invention are shown; for specific technical details not disclosed, please refer to the method portions of the embodiments of the present invention.
In some possible embodiments of the invention, the electronic device may comprise a plurality of processors 701 and a plurality of storage devices 702. The program for performing the agent decision method or the agent control method of the above method embodiment may be divided into a plurality of sub-programs, each of which may be loaded and executed by a processor 701 to perform different steps of the method. Specifically, each sub-program may be stored in a different storage device 702, and each processor 701 may be configured to execute the programs in one or more storage devices 702, so that the processors 701 jointly implement the agent decision method or the agent control method of the above method embodiment, each processor executing different steps of that method.
The plurality of processors 701 may be processors disposed on the same device, for example, the electronic device may be a high-performance device composed of a plurality of processors, and the plurality of processors 701 may be processors configured on the high-performance device. In addition, the plurality of processors 701 may be processors disposed on different devices, for example, the electronic device may be a server cluster, and the plurality of processors 701 may be processors on different servers in the server cluster.
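The division into sub-programs executed on separate processors can be sketched with an executor; here threads stand in for separate processors, and both sub-programs and their contents are purely illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def sub_acquire(state):
    """Sub-program 1: the information-acquisition step."""
    state["env"] = {"signal": "green"}
    return state

def sub_decide(state):
    """Sub-program 2: the decision step."""
    state["decision"] = "cruise" if state["env"]["signal"] == "green" else "stop"
    return state

# Each submit() hands one sub-program to a worker, mirroring how each
# processor 701 would load and execute one sub-program from storage 702.
with ThreadPoolExecutor(max_workers=2) as pool:
    state = pool.submit(sub_acquire, {}).result()
    state = pool.submit(sub_decide, state).result()
print(state["decision"])  # → cruise
```

On a server cluster the same split would use processes or remote workers instead of threads, with the shared `state` passed over the network.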
Further, the invention also provides a computer readable storage medium. In one embodiment of a computer-readable storage medium according to the present invention, the computer-readable storage medium may be configured to store a program for executing the agent decision method or the agent control method of the above-described method embodiment, which may be loaded and executed by a processor to implement the agent decision method or the agent control method described above. For convenience of explanation, only those portions of the embodiments of the present invention that are relevant to the embodiments of the present invention are shown, and specific technical details are not disclosed, please refer to the method portions of the embodiments of the present invention. The computer readable storage medium may be a storage device including various electronic devices, and optionally, the computer readable storage medium in the embodiments of the present invention is a non-transitory computer readable storage medium.
It should be noted that, the information and data related to the disclosed embodiments of the present invention are information and data authorized by the user or fully authorized by each party.
The data acquisition, acquisition and other actions related in the embodiment of the invention are all executed after the authorization of the user and the object or after the full authorization of all the parties.
Thus far, the technical solution of the present invention has been described in connection with one embodiment shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will fall within the scope of the present invention.
Claims (9)
1. An agent decision method, the method comprising:
acquiring environmental information and driving intention of an agent to be decided;
inputting the environmental information and the driving intention of the intelligent agent to be decided into a trained intelligent agent decision model to obtain an intelligent agent decision result;
wherein the agent decision model is trained based at least on the following steps:
constructing an initial agent decision model based on a large language model; the large language model is a deep learning model with chain-of-thought capability;
acquiring virtual traffic environment data based on a simulator, and acquiring real traffic environment data based on a sensor;
generating traffic behavior description information based on the virtual traffic environment data, the real traffic environment data and preset rules;
performing fine tuning training on the initial agent decision model based on the traffic behavior description information to obtain the trained agent decision model;
wherein the virtual traffic environment data and the real traffic environment data both comprise dynamic traffic environment data and static traffic environment data; the dynamic traffic environment data comprise driving intention data of the agent to be decided and behavior state data of other agents and other living bodies, and the static traffic environment data comprise at least one of traffic signal data and traffic sign data.
2. The agent decision method of claim 1, wherein the performing fine-tuning training on the initial agent decision model based on the traffic behavior description information to obtain the trained agent decision model comprises:
performing fine tuning training on the initial intelligent agent decision model based on the traffic behavior description information until the initial intelligent agent decision model converges to a preset error, so as to obtain the trained intelligent agent decision model;
the traffic behavior description information comprises the movement trajectories and action semantics of at least one of the agent to be decided, other agents within a preset area around the agent to be decided, and other living bodies.
3. The agent decision method of claim 2, wherein the generating traffic behavior description information based on the virtual traffic environment data, the real traffic environment data, and a preset rule comprises:
preprocessing the virtual traffic environment data and the real traffic environment data;
extracting feature data based on the preprocessed traffic environment data;
performing cluster analysis on the characteristic data;
and generating the traffic behavior description information based on the clustering analysis result and the preset rule.
4. The agent decision method of claim 1, wherein the environmental information includes dynamic environmental information and static environmental information, and the driving intention includes a location of an agent to be decided, a target location, and behavior state information; the obtaining the environmental information and the driving intention of the agent to be decided comprises the following steps:
acquiring the dynamic environment information and the static environment information, and acquiring the location, the target location, and the behavior state information of the agent to be decided;
wherein the dynamic environment information comprises behavior state information of other intelligent agents and other living bodies; the static environment information includes at least one of traffic signal information and traffic sign information.
5. The agent decision method of claim 1, wherein the inputting the environmental information and the driving intention of the agent to be decided into the trained agent decision model to obtain an agent decision result comprises:
analyzing the environmental information and the driving intention of the agent to be decided to acquire at least one of traffic condition information, obstacle information, and drivable area information;
formulating a driving planning strategy based on the traffic condition information and/or the obstacle information and/or the drivable area information; the driving planning strategy comprises at least one of a current driving state, a driving speed and a driving route;
and generating the agent decision result based on the driving planning strategy.
6. The agent decision method of claim 5, wherein the agent decision results include a behavior planning result and a path planning result; the generating the agent decision result based on the driving planning strategy comprises:
generating the behavior planning result and/or the path planning result based on the driving planning strategy;
wherein the behavior planning result comprises the current running state; the path planning result includes the travel speed and the travel route.
7. An agent control method, the method comprising:
acquiring other agent decision results in a preset area around the agent to be controlled based on the agent decision method of any one of claims 1 to 6;
acquiring a decision result of an agent to be controlled based on the agent decision method of any one of claims 1 to 6 and the other agent decision results;
evaluating the intelligent agent to be controlled based on the decision result of the intelligent agent to be controlled;
and correcting the decision result of the intelligent agent to be controlled based on the evaluation result, and controlling the intelligent agent to be controlled based on the corrected decision result.
8. An electronic device comprising a processor and a storage means, the storage means being adapted to store a plurality of program codes, characterized in that the program codes are adapted to be loaded and executed by the processor to perform the agent decision method of any one of claims 1 to 6 or the agent control method of claim 7.
9. A computer readable storage medium, in which a plurality of program codes are stored, characterized in that the program codes are adapted to be loaded and executed by a processor to perform the agent decision method of any one of claims 1 to 6 or the agent control method of claim 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311406886.5A CN117151246B (en) | 2023-10-27 | 2023-10-27 | Agent decision method, control method, electronic device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117151246A CN117151246A (en) | 2023-12-01 |
CN117151246B true CN117151246B (en) | 2024-02-20 |
Family
ID=88906402
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117909470B (en) * | 2024-01-15 | 2024-07-16 | 北京华如科技股份有限公司 | Rapid simulation design constructing method and device based on large language model |
CN118332421B (en) * | 2024-06-12 | 2024-09-03 | 航科广软(广州)数字科技有限公司 | Intelligent decision-making method, system, equipment and storage medium for ecological environment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111832652A (en) * | 2020-07-14 | 2020-10-27 | 北京罗克维尔斯科技有限公司 | Training method and device of decision model |
KR20220102395A (en) * | 2021-01-13 | 2022-07-20 | 부경대학교 산학협력단 | System and Method for Improving of Advanced Deep Reinforcement Learning Based Traffic in Non signalalized Intersections for the Multiple Self driving Vehicles |
CN115848413A (en) * | 2022-12-26 | 2023-03-28 | 广州文远知行科技有限公司 | Method, device, equipment and medium for determining control decision of automatic driving vehicle |
CN115951587A (en) * | 2023-03-10 | 2023-04-11 | 苏州浪潮智能科技有限公司 | Automatic driving control method, device, equipment, medium and automatic driving vehicle |
CN116661452A (en) * | 2023-05-30 | 2023-08-29 | 上海大学 | Unmanned ship environment perception decision-making method and system based on brain-like memory |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||