CN116279580A - Decision information generation method and device, electronic equipment and storage medium - Google Patents

Decision information generation method and device, electronic equipment and storage medium

Info

Publication number
CN116279580A
CN116279580A CN202310249192.9A
Authority
CN
China
Prior art keywords
information
target
driving
real
motion state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310249192.9A
Other languages
Chinese (zh)
Inventor
王渤谦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Qianshi Technology Co Ltd
Original Assignee
Beijing Jingdong Qianshi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Qianshi Technology Co Ltd filed Critical Beijing Jingdong Qianshi Technology Co Ltd
Priority to CN202310249192.9A priority Critical patent/CN116279580A/en
Publication of CN116279580A publication Critical patent/CN116279580A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9537Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • B60W60/0015Planning or execution of driving tasks specially adapted for safety
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001Details of the control system
    • B60W2050/0002Automatic control, details of type of controller or control system architecture
    • B60W2050/0016State machine analysis
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001Details of the control system
    • B60W2050/0019Control system elements or transfer functions
    • B60W2050/0028Mathematical models, e.g. for simulation
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0062Adapting control system settings
    • B60W2050/0075Automatic parameter input, automatic initialising or calibrating means
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2552/00Input parameters relating to infrastructure
    • B60W2552/05Type of road, e.g. motorways, local streets, paved or unpaved roads
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/80Spatial relation or speed relative to objects
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Automation & Control Theory (AREA)
  • Human Computer Interaction (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)

Abstract

The disclosure provides a decision information generation method, a decision information generation device, electronic equipment, and a storage medium, and relates to the technical fields of artificial intelligence and automatic driving. The decision information generation method includes the following steps: acquiring, in response to receiving a driving service request of a target vehicle, real-time motion state information of the target vehicle, lane type information of a target area, and real-time relative motion state information of a traffic participation object and the target vehicle; processing the real-time motion state information and the lane type information by using a driving scene analysis model to obtain target driving scene information; determining a target driving behavior decision model according to the target driving scene information; and processing the real-time motion state information and the real-time relative motion state information by using the target driving behavior decision model to obtain driving behavior decision information of the target vehicle.

Description

Decision information generation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the technical fields of artificial intelligence and autonomous driving, and more particularly, to a decision information generation method, apparatus, electronic device, storage medium, and program product.
Background
With the continuous development of autonomous driving technology, autonomous vehicles are widely applied in various fields as intelligent devices that integrate multiple functions such as environment sensing, dynamic decision-making, route planning, behavior control, and command execution. In the related art, driving decisions are generally made based on static map data combined with preset rules, so as to avoid static obstacles in the driving path.
In the process of implementing the disclosed concept, the inventor found that the related art has at least the following problem: in an actual driving environment, there are dynamically changing obstacles in addition to static obstacles, so the driving decisions obtained in the related art have low accuracy, and the automatic driving process carries safety risks.
Disclosure of Invention
In view of this, the present disclosure provides a decision information generation method, apparatus, electronic device, storage medium, and program product.
One aspect of the present disclosure provides a decision information generation method, comprising:
and acquiring real-time motion state information of the target vehicle, lane type information of a target area and real-time relative motion state information of a traffic participation object and the target vehicle in response to receiving a driving service request of the target vehicle. And processing the real-time motion state information and the lane type information by using the driving scene analysis model to obtain target driving scene information. And determining a target driving behavior decision model according to the target driving scene information. And processing the real-time motion state information and the real-time relative motion state information by using the target driving behavior decision model to obtain driving behavior decision information of the target vehicle.
According to an embodiment of the present disclosure, the real-time motion state information includes real-time driving direction information and real-time position information, and the processing of the real-time motion state information and lane type information using the driving scene analysis model to obtain target driving scene information includes:
determining position information of a target task point and position information of a scene switching point from the target area; obtaining distance information between the real-time position and the scene switching point according to the real-time position information and the position information of the scene switching point; querying the lane type information of the target task point from the lane type information of the target area according to the position information of the target task point; and processing the distance information, the real-time driving direction, and the lane type information of the target task point by using the driving scene analysis model to obtain the target driving scene information.
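As an illustrative sketch of the two computable sub-steps here (the distance to the scene switching point, and the lane-type lookup at the target task point), assuming planar coordinates and a position-keyed lane table — the disclosure does not fix either representation:

```python
import math

def distance_to_switch_point(real_time_pos, switch_pos):
    # Euclidean distance between the vehicle's real-time position and the
    # scene switching point (the metric is an assumption for illustration).
    return math.hypot(switch_pos[0] - real_time_pos[0],
                      switch_pos[1] - real_time_pos[1])

def lane_type_at(task_point_pos, area_lane_types):
    # Query the lane type of the target task point from the target area's
    # lane type information, keyed here by position.
    return area_lane_types[task_point_pos]
```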
According to an embodiment of the present disclosure, processing distance information, a real-time driving direction, and lane type information of a target task point by using a driving scene analysis model to obtain target driving scene information includes:
and determining a plurality of candidate scene switching conditions according to the real-time driving direction and the lane type information of the target task point. And determining target driving scene information from the plurality of candidate scene switching conditions according to the distance information.
According to an embodiment of the present disclosure, processing real-time motion state information and real-time relative motion state information by using a target driving behavior decision model to obtain driving behavior decision information of a target vehicle includes:
and determining the target traffic participation object according to the target driving scene information and the real-time relative motion state information. And screening the real-time relative motion state information of the target traffic participation object and the target of the target vehicle from the real-time relative motion state information. And determining attribute information of the target motion state according to the target driving scene information. And screening the real-time motion state information of the target corresponding to the attribute information of the target motion state from the real-time motion state information of the target vehicle. And obtaining driving action decision information by utilizing the target driving action decision model to the target real-time motion state information and the target real-time relative motion state information.
According to an embodiment of the present disclosure, determining a target traffic participation object according to target driving scenario information and real-time relative motion state information includes:
and determining the relative position relation between the traffic participation object and the target vehicle according to the real-time relative motion state information. And determining the risk degree of the traffic participation object on the target vehicle in the target driving scene according to the relative position relation. And screening the target traffic participation objects from the traffic participation objects according to the risk degree.
According to an embodiment of the present disclosure, determining a risk level of a traffic participation object to a target vehicle in a target driving scene according to a relative positional relationship includes:
and obtaining the association relation between the traffic participation object and the driving track of the target vehicle according to the relative position relation. And determining the risk degree according to the association relation of the driving tracks.
According to an embodiment of the present disclosure, a training method of a target driving behavior decision model includes:
acquiring, in a simulated driving scene, motion state information of a test vehicle and relative motion state information of a traffic participation object and the test vehicle; randomly selecting driving action information from an action probability distribution space of the target vehicle; and training a preset model by processing the driving action information, the motion state information of the test vehicle, and the relative motion state information of the traffic participation object and the test vehicle based on a proximal policy optimization (PPO) algorithm, to obtain the target driving behavior decision model.
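Randomly selecting a driving action from an action probability distribution can be sketched as a weighted draw; the discrete action set and its probabilities below are illustrative:

```python
import random

def sample_action(action_probs, rng=random):
    # Draw one driving action according to its probability mass.
    actions = list(action_probs)
    weights = list(action_probs.values())
    return rng.choices(actions, weights=weights, k=1)[0]
```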
According to an embodiment of the present disclosure, the driving action information includes S pieces of driving action information, where S is an integer greater than 1, the preset model includes a policy network and a value network, and training the preset model by processing the driving action information, the motion state information of the test vehicle, and the relative motion state information of the traffic participation object and the test vehicle based on the proximal policy optimization (PPO) algorithm, to obtain the target driving behavior decision model, includes:
for the s-th piece of driving action information, processing the s-th driving action information, the motion state information of the test vehicle, and the relative motion state information of the traffic participation object and the test vehicle by using the policy network to obtain s-th driving result information, where s is an integer greater than or equal to 1 and less than S; processing the s-th driving action information and the s-th driving result information by using the value network to obtain action value information; obtaining a policy advantage value according to the driving state information and the action value information based on an objective function; in a case where the policy advantage value does not meet a predetermined threshold, adjusting model parameters of the preset model, returning to perform the processing operation using the policy network and the processing operation using the value network, and incrementing s; and in a case where the policy advantage value meets the predetermined threshold, determining the s-th driving action information as target decision action information.
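The control flow described here — run the policy network, score the result with the value network, compare the policy advantage value against the threshold, and either stop or adjust the parameters and advance s — can be sketched as follows. This mirrors only the loop in the text; a full PPO implementation would add the clipped surrogate objective and gradient updates, and every interface below is an assumption:

```python
def training_loop(policy_net, value_net, actions, state, rel_state,
                  advantage_fn, threshold, adjust_params):
    """Iterate over the S pieces of driving action information until the
    policy advantage value meets the threshold."""
    for action in actions:                              # s-th driving action info
        result = policy_net(action, state, rel_state)   # s-th driving result info
        action_value = value_net(action, result)        # action value info
        advantage = advantage_fn(state, action_value)   # policy advantage value
        if advantage >= threshold:
            return action                               # target decision action info
        adjust_params()                                 # adjust model params, s += 1
    return None
```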
Another aspect of the present disclosure provides a decision information generating apparatus, comprising an acquisition module, an analysis module, a determination module, and a generation module. The acquisition module is configured to acquire, in response to receiving a driving service request of a target vehicle, real-time motion state information of the target vehicle, lane type information of a target area, and real-time relative motion state information of a traffic participation object and the target vehicle. The analysis module is configured to process the real-time motion state information and the lane type information by using a driving scene analysis model to obtain target driving scene information. The determination module is configured to determine a target driving behavior decision model according to the target driving scene information. The generation module is configured to process the real-time motion state information and the real-time relative motion state information by using the target driving behavior decision model to obtain driving behavior decision information of the target vehicle.
According to an embodiment of the present disclosure, the real-time motion state information includes real-time driving direction information and real-time position information. The analysis module comprises: the device comprises a first determining unit, a first obtaining unit, a first inquiring unit and a second obtaining unit. The first determining unit is used for determining the position information of the target task point and the position information of the scene switching point from the target area. The first obtaining unit is used for obtaining the distance information between the real-time position and the scene switching point according to the real-time position information and the position information of the scene switching point. The first query unit is used for querying the lane type information of the target task point from the lane type information of the target area according to the position information of the target task point. The second obtaining unit is used for processing the distance information, the real-time driving direction and the lane type information of the target task point by using the driving scene analysis model to obtain target driving scene information.
According to an embodiment of the present disclosure, the second obtaining unit includes a first determining subunit and a second determining subunit. The first determining subunit is used for determining a plurality of candidate scene switching conditions according to the real-time driving direction and the lane type information of the target task point. And a second determination subunit configured to determine target driving scenario information from among the plurality of candidate scenario switching conditions according to the distance information.
According to an embodiment of the present disclosure, the generation module includes a second determining unit, a first screening unit, a third determining unit, a second screening unit, and a third obtaining unit. The second determining unit is configured to determine the target traffic participation object according to the target driving scene information and the real-time relative motion state information. The first screening unit is configured to screen the target real-time relative motion state information of the target traffic participation object and the target vehicle from the real-time relative motion state information. The third determining unit is configured to determine attribute information of the target motion state according to the target driving scene information. The second screening unit is configured to screen the target real-time motion state information corresponding to the attribute information of the target motion state from the real-time motion state information of the target vehicle. The third obtaining unit is configured to process the target real-time motion state information and the target real-time relative motion state information by using the target driving behavior decision model to obtain the driving action decision information.
According to an embodiment of the present disclosure, the second determining unit includes a third determining subunit, a fourth determining subunit, and a screening subunit. And the third determination subunit is used for determining the relative position relation between the traffic participation object and the target vehicle according to the real-time relative motion state information. And the fourth determination subunit is used for determining the risk degree of the traffic participation object on the target vehicle in the target driving scene according to the relative position relation. And the screening subunit is used for screening the target traffic participation objects from the traffic participation objects according to the risk degree.
According to an embodiment of the present disclosure, the fourth determining subunit is configured to obtain, according to the relative positional relationship, an association relationship between a traffic participation object and a driving track of a target vehicle; and determining the risk degree according to the association relation of the driving track.
According to an embodiment of the present disclosure, the generation module includes an acquisition unit and a training unit. The acquisition unit is configured to acquire, in a simulated driving scene, motion state information of a test vehicle and relative motion state information of a traffic participation object and the test vehicle. The training unit is configured to train a preset model by processing the driving action information, the motion state information of the test vehicle, and the relative motion state information of the traffic participation object and the test vehicle based on a proximal policy optimization (PPO) algorithm, to obtain the target driving behavior decision model.
According to an embodiment of the present disclosure, the driving action information includes S pieces of driving action information, where S is an integer greater than 1, the preset model includes a policy network and a value network, and the training unit includes a processing subunit, a first obtaining subunit, a second obtaining subunit, and an adjusting subunit. The processing subunit is configured to process, for the s-th piece of driving action information, the s-th driving action information, the motion state information of the test vehicle, and the relative motion state information of the traffic participation object and the test vehicle by using the policy network to obtain s-th driving result information, where s is an integer greater than or equal to 1 and less than S. The first obtaining subunit is configured to process the s-th driving action information and the s-th driving result information by using the value network to obtain action value information. The second obtaining subunit is configured to obtain a policy advantage value according to the driving state information and the action value information based on an objective function. The adjusting subunit is configured to, in a case where the policy advantage value does not meet a predetermined threshold, adjust the model parameters of the preset model, return to perform the processing operation using the policy network and the processing operation using the value network, and increment s; and, in a case where the policy advantage value meets the predetermined threshold, determine the s-th driving action information as the target decision action information.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed, are configured to implement a method as described above.
Another aspect of the present disclosure provides a computer program product comprising computer executable instructions which, when executed, are for implementing a method as described above.
According to embodiments of the present disclosure, the driving scene analysis model is used to determine the target driving scene from the real-time motion state information and the lane type information, and the target driving behavior decision model corresponding to that scene is then used to process the real-time motion state information and the real-time relative motion state information, thereby obtaining the driving action decision information of the target vehicle. This at least partially overcomes the technical problem in the related art that driving decisions based only on static obstacle information have low accuracy, and achieves the technical effect of dynamically deciding the driving action of the target vehicle by combining the vehicle's real-time motion state, the relative motion state information of the traffic participation objects, and the lane type information of the target area, thereby improving decision accuracy.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments thereof with reference to the accompanying drawings in which:
FIG. 1 schematically illustrates an application scenario diagram in which the decision information generation method of the present disclosure may be applied;
FIG. 2 schematically illustrates a flow chart of a decision information generation method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates an exemplary architecture in which decision information generation of embodiments of the present disclosure may be applied;
fig. 4 schematically illustrates a schematic diagram of driving scenario switching according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a training method flow diagram of a target behavior decision model in accordance with an embodiment of the disclosure;
FIG. 6 schematically illustrates a logic flow diagram of a training method for a target behavior decision model in accordance with an embodiment of the present disclosure;
fig. 7 schematically shows a block diagram of a decision information generating apparatus according to an embodiment of the disclosure; and
fig. 8 schematically illustrates a block diagram of an electronic device adapted to implement a method of decision information generation according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression like "at least one of A, B, and C" is used, it should generally be interpreted according to the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B, and C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together).
In embodiments of the present disclosure, the collection, updating, analysis, processing, use, transmission, provision, disclosure, and storage of the data involved (including, but not limited to, user personal information) all comply with relevant legal regulations, are used for legitimate purposes, and do not violate public order and good customs. In particular, necessary measures are taken to protect users' personal information, prevent illegal access to users' personal information, and maintain users' personal information security, network security, and national security.
In embodiments of the present disclosure, the user's authorization or consent is obtained before the user's personal information is obtained or collected.
In the related art, a behavior decision model of an autonomous vehicle generally determines the positions of static obstacles based on map information along the planned path and images acquired by the vehicle; static obstacles may include green belts, traffic signs, traffic lights, roadblocks, and the like. A driving action decision is then made, based on predetermined rules, according to the distance between the vehicle and the static obstacle.
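For contrast with the disclosed method, the rule-based scheme described here essentially reduces to a distance-threshold check; the thresholds and action names below are illustrative, not taken from any particular related-art system:

```python
def rule_based_decision(distance_to_obstacle, brake_dist=15.0, slow_dist=40.0):
    # Decide the driving action purely from the distance to a static
    # obstacle, as the predetermined-rule approach does.
    if distance_to_obstacle <= brake_dist:
        return "brake"
    if distance_to_obstacle <= slow_dist:
        return "decelerate"
    return "keep_speed"
```

Because the rule sees only static distance, a dynamically moving participant is invisible to it — which is the limitation the disclosed scene-aware decision model addresses.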
However, actual driving scenarios are not only complex but also numerous, for example: intersection driving scenes, lane-merging driving scenes, lane-diverging driving scenes, common-lane driving scenes, and the like. When the target vehicle is in different driving scenarios, the traffic participation objects it faces may change dynamically, which has a significant impact on the driving decisions of the target vehicle. Therefore, in a complex driving scene, a decision method that performs driving actions based on preset rules has difficulty obtaining accurate decisions, so the target vehicle faces safety risks, such as collision with other traffic participation objects in the driving scene.
In view of this, embodiments of the present disclosure provide a method for generating decision information. The method includes: in response to receiving a driving service request of a target vehicle, acquiring real-time motion state information of the target vehicle, lane type information of a target area, and real-time relative motion state information between traffic participants and the target vehicle; processing the real-time motion state information and the lane type information using a driving scenario analysis model to obtain target driving scenario information; determining a target driving behavior decision model according to the target driving scenario information; and processing the real-time motion state information and the real-time relative motion state information using the target driving behavior decision model to obtain driving behavior decision information of the target vehicle.
Fig. 1 schematically illustrates an application scenario diagram to which the decision information generation method of the present disclosure may be applied.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include an in-vehicle terminal 101 on a target vehicle, a network 102, and a server 103. The network 102 is the medium used to provide a communication link between the in-vehicle terminal 101 and the server 103. Network 102 may include various connection types, such as wireless communication links and the like.
The user can interact with the server 103 through the network 102 using the in-vehicle terminal 101 on the target vehicle to receive or transmit a message or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients and/or social platform software, etc., may be installed on the in-vehicle terminal 101 on the target vehicle (as examples only).
The in-vehicle terminal 101 on the target vehicle may be various electronic devices having a display screen and supporting web browsing, including but not limited to a tablet computer.
The server 103 may be a server that provides various services, such as a background management server (merely an example) that provides support for a user browsing websites using the in-vehicle terminal 101 on the target vehicle. The background management server may analyze and process received data such as the user's driving service request, and feed back the processing result (e.g., a web page, information, or data acquired or generated according to the user request) to the in-vehicle terminal.
It should be noted that, the method for generating decision information provided by the embodiments of the present disclosure may be generally performed by the server 103. Accordingly, the decision information generating apparatus provided by the embodiments of the present disclosure may be generally provided in the server 103. The decision information generation method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 103 and is capable of communicating with the in-vehicle terminal 101 and/or the server 103. Accordingly, the decision information generating apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 103 and is capable of communicating with the in-vehicle terminal 101 and/or the server 103. Alternatively, the decision information generating method provided by the embodiment of the present disclosure may also be performed by the in-vehicle terminal 101, or may also be performed by another terminal device different from the in-vehicle terminal 101. Accordingly, the decision information generating apparatus provided in the embodiments of the present disclosure may also be provided in the vehicle-mounted terminal 101, or in other terminal devices different from the vehicle-mounted terminal 101.
For example, the target vehicle may acquire its real-time motion state and the real-time relative motion state between traffic participants and the target vehicle through on-board sensors, and the lane information of the target area may be stored in the in-vehicle terminal 101. The decision information generation method provided in the embodiments of the present disclosure may then be executed locally on the in-vehicle terminal 101. Alternatively, the real-time motion state of the target vehicle, the real-time relative motion state between the traffic participants and the target vehicle, and the lane information of the target area may be transmitted to other terminal devices, servers, or server clusters, and the decision information generation method provided in the embodiments of the present disclosure may be executed by the other terminal devices, servers, or server clusters that receive this information.
It should be understood that the number of in-vehicle terminals, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically illustrates a flowchart of a decision information generation method according to an embodiment of the present disclosure.
As shown in fig. 2, the method 200 includes operations S210 to S240.
In response to receiving a driving service request of the target vehicle, real-time motion state information of the target vehicle, lane type information of the target area, and real-time relative motion state information of the traffic participant and the target vehicle are acquired in operation S210.
In operation S220, the real-time motion state information and the lane type information are processed using the driving scene analysis model to obtain target driving scene information.
In operation S230, a target driving behavior decision model is determined according to the target driving scenario information.
In operation S240, the real-time motion state information and the real-time relative motion state information are processed using the target driving behavior decision model to obtain driving behavior decision information of the target vehicle.
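The flow of operations S210 through S240 can be sketched end to end as follows. This is a minimal illustrative sketch only: the function names, scene labels, and the trivial stand-in decision rules are assumptions, not the models actually described in this disclosure.

```python
# Hypothetical sketch of operations S210-S240. All names and the
# stand-in decision logic are illustrative assumptions.

def analyze_scene(motion_state, lane_type):
    """S220: driving scenario analysis from motion state + lane type (stand-in)."""
    if lane_type == "merge":
        return "lane_merge"
    if lane_type == "split":
        return "lane_split"
    return "ordinary_road"

# S230: one decision model per driving scenario (stand-in callables).
MODEL_SET = {
    "ordinary_road": lambda ego, rel: "keep_lane",
    "lane_merge":    lambda ego, rel: "yield" if rel["ttc"] < 3.0 else "merge",
    "lane_split":    lambda ego, rel: "follow_split",
}

def generate_decision(motion_state, lane_type, relative_state):
    scene = analyze_scene(motion_state, lane_type)   # S220
    model = MODEL_SET[scene]                         # S230: model lookup by scene
    return model(motion_state, relative_state)       # S240: decision information
```

The key design point mirrored here is that the scenario analysis and the behavior decision are decoupled: the scene label selects which model runs, so each model only ever sees inputs from its own scenario.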
According to an embodiment of the present disclosure, the real-time motion state information of the target vehicle may include real-time traveling direction information and real-time position information of the target vehicle. The real-time traveling direction information may be heading angle information of the traveling direction of the target vehicle.
According to an embodiment of the present disclosure, the target area may be a map area between a real-time location of the target vehicle and a location of the destination. The lane type information may include lane merging, lane splitting.
According to embodiments of the present disclosure, the traffic participant may be another vehicle, a pedestrian, or the like in the target area other than the target vehicle. The real-time relative motion state information with the target vehicle may include: relative speeds, relative heading angles, relative accelerations, etc. of other vehicles and the target vehicle.
According to an embodiment of the present disclosure, the driving scenario analysis model may be a finite state machine (FSM) driving scenario transition model. The driving scenario may be determined based on state transition trigger conditions.
For example: the state transition triggering condition may be that the real-time motion direction of the target vehicle is that the vehicle faces the intersection, and the distance between the target vehicle and the next intersection is greater than the set threshold D1, and it may be determined that the driving scene is the driving scene of the common road.
In the embodiment of the disclosure, the driving scenarios may include a lane-separation scenario, a lane-merging scenario, an intersection scenario, and an ordinary-road scenario. Different behavior decision models may be trained for the different driving scenarios. When the target vehicle triggers a certain driving scenario, the analysis of real-time driving environment state information can be switched to the driving behavior decision model corresponding to that scenario, so as to obtain the driving action decision information of the target vehicle.
Fig. 3 schematically illustrates an exemplary architecture in which the decision information generation scheme of the embodiments of the present disclosure may be applied.
As shown in fig. 3, a driving scenario analysis model 301, a driving behavior decision model set 302, and a target driving behavior decision model 303 may be included in this embodiment 300. Driving behavior decision models corresponding to different driving scenarios may be included in the set of behavior decision models. For example: a driving behavior decision model M1 (302-1) corresponding to the driving scene M1, a driving behavior decision model M2 (302-2) corresponding to the driving scene M2, and a driving behavior decision model M3 (302-3) corresponding to the driving scene M3.
According to the embodiment of the present disclosure, the driving scene analysis model 301 may obtain the target driving scene according to the real-time running state of the target vehicle and the lane type information of the target area. The target driving behavior decision model 303 is queried from the behavior decision model set 302 according to the target driving scene. For example: the target driving scenario is M1, and the target driving behavior decision model 303 may be the driving behavior decision model M1 (302-1) in the behavior decision model set 302.
The target driving behavior decision model 303 then processes the real-time running state of the target vehicle and the real-time relative motion state information between the traffic participants and the target vehicle to obtain the driving action decision information of the target vehicle.
According to the embodiment of the disclosure, the following technical means are adopted: the driving scenario analysis model processes the real-time motion state information and the lane type information to determine the target driving scenario, and the target driving behavior decision model corresponding to the target driving scenario then processes the real-time motion state information and the real-time relative motion state information to obtain the driving action decision information of the target vehicle. This at least partially overcomes the technical problem in the related art that making driving decisions based on static obstacle information leads to low decision accuracy. It achieves the technical effect of dynamically deciding the driving actions of the target vehicle by combining the dynamic real-time motion state of the target vehicle, the relative motion state information of the traffic participants, and the lane type information of the target area, thereby improving decision accuracy.
According to an embodiment of the present disclosure, processing real-time motion state information and lane type information using a driving scene analysis model to obtain target driving scene information may include the following operations:
position information of a target task point and position information of a scene switching point are determined from the target area. And obtaining the distance information between the real-time position and the scene switching point according to the real-time position information and the position information of the scene switching point. And inquiring the lane type information of the target task point from the lane type information of the target area according to the position information of the target task point. And processing the distance information, the real-time driving direction and the lane type information of the target task point by using the driving scene analysis model to obtain target driving scene information.
According to embodiments of the present disclosure, the target task point may be at any position on the lane travel path within the target area. In order to improve the accuracy of the driving behavior decision of the target vehicle, a plurality of target task points may be provided.
According to an embodiment of the present disclosure, the scene-switching point may be a position point for condition determination in the scene-switching condition. For example: in the scene switching condition in which the driving scene of the ordinary road is switched to the driving scene of the intersection, the scene switching point may be the next intersection closest to the current position of the target vehicle on the driving path. When the movement direction of the target vehicle is toward the next intersection and the distance between the target vehicle and the next intersection is smaller than a certain predetermined threshold value, it may be determined that the driving scene is switched to the intersection driving scene.
According to embodiments of the present disclosure, the real-time motion state information may include real-time driving direction information and real-time position information. The lane type information of the target task point may include: lane merging, lane splitting, etc.
According to an embodiment of the present disclosure, processing distance information, a real-time driving direction, and lane type information of a target task point by using a driving scene analysis model to obtain target driving scene information may include the following operations:
And determining a plurality of candidate scene switching conditions according to the real-time driving direction and the lane type information of the target task point. And determining target driving scene information from the plurality of candidate scene switching conditions according to the distance information.
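The two-step selection above (narrow the candidate switching conditions by heading and lane type, then resolve the scenario by distance) can be sketched as follows. The threshold D1, the 45° heading tolerance, and all function names are assumptions for illustration; only the lane-merge/lane-split branches are shown.

```python
import math

# Illustrative sketch: candidate conditions are filtered by heading and
# the task point's lane type, then the distance to the scene switching
# point decides the scenario. D1 and the tolerance are assumed values.

D1 = 50.0  # assumed switching threshold (metres)

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def heading_towards(pos, heading_deg, point, tol_deg=45.0):
    """True if the vehicle's heading points roughly at `point`."""
    bearing = math.degrees(math.atan2(point[1] - pos[1], point[0] - pos[0]))
    return abs((bearing - heading_deg + 180.0) % 360.0 - 180.0) <= tol_deg

def pick_scene(pos, heading_deg, switch_point, task_lane_type):
    d = distance(pos, switch_point)
    towards = heading_towards(pos, heading_deg, switch_point)
    if towards and d < D1 and task_lane_type == "merge":
        return "lane_merge"
    if towards and d < D1 and task_lane_type == "split":
        return "lane_split"
    return "ordinary_road"
```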
Fig. 4 schematically shows a schematic diagram of driving scenario switching according to an embodiment of the present disclosure.
As shown in fig. 4, in the embodiment 400, when the real-time motion state of the target vehicle and the lane type information of the target area satisfy the scene switching condition A1, the current driving scenario is the ordinary-road scenario.
When the target vehicle is driving in the ordinary-road scenario and the real-time motion state of the target vehicle and the lane type information of the target area satisfy the scene switching condition A2, the ordinary-road scenario is switched to the lane-separation scenario. When they satisfy the scene switching condition A7, the ordinary-road scenario is switched to the lane-merging scenario; when they satisfy the scene switching condition A5, the ordinary-road scenario is switched to the intersection scenario.
When the target vehicle is driving in the lane-separation scenario and the real-time motion state of the target vehicle and the lane type information of the target area satisfy the scene switching condition A3, the lane-separation scenario is switched back to the ordinary-road scenario.
When the target vehicle is driving in the lane-merging scenario and the real-time motion state of the target vehicle and the lane type information of the target area satisfy the scene switching condition A6, the lane-merging scenario is switched back to the ordinary-road scenario.
When the target vehicle is driving in the intersection scenario and the real-time motion state of the target vehicle and the lane type information of the target area satisfy the scene switching condition A4, the intersection scenario is switched back to the ordinary-road scenario.
According to an embodiment of the present disclosure, the scene switching condition may be preset to the driving scene analysis model.
For example: the scene change condition A1 may be: the real-time movement direction of the target vehicle is toward the next intersection, and the distance between the real-time position of the target vehicle and the next intersection is greater than a predetermined threshold D1.
The scene change condition A2 may be: the lane type of the next task point of the target vehicle is lane separation, the real-time movement direction of the target vehicle faces the lane separation point, and the distance between the real-time position of the target vehicle and the lane separation point is smaller than a preset threshold D1.
The scene change condition A3 may be: the lane type of the last task point of the target vehicle is lane separation, the real-time movement direction of the target vehicle is far away from the lane separation point, the distance between the real-time position of the target vehicle and the lane separation point is larger than a preset threshold D2, and the included angle between the real-time movement direction of the target vehicle and the direction of the next section of lane line is smaller than a preset angle.
The scene change condition A4 may be: the real-time motion direction of the target vehicle is away from the intersection, and the distance between the real-time position of the target vehicle and the exit point of the current intersection is greater than a predetermined threshold D2.
The scene change condition A5 may be: the real-time motion direction of the target vehicle is towards the intersection, and the distance between the real-time position of the target vehicle and the junction point of the next intersection is less than a predetermined threshold D1.
The scene change condition A6 may be: the lane type of the last task point of the target vehicle is lane merging, the real-time movement direction of the target vehicle is far away from the lane merging point, the distance between the position of the target vehicle and the lane merging point is larger than a preset threshold D2, and the included angle between the real-time movement direction of the target vehicle and the direction of the next section of lane line is smaller than a preset angle.
The scene change condition A7 may be: the next task point of the target vehicle is lane merging, and the distance between the real-time position of the target vehicle and the lane merging point is smaller than a predetermined threshold D1.
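The transition diagram of fig. 4 can be written down as a minimal finite state machine. The scene labels follow the text, while representing conditions A1–A7 as pre-evaluated labels (rather than as predicates over the motion state) is an assumption made purely to keep the sketch short.

```python
# Minimal FSM sketch of the fig. 4 scene transitions. Conditions A2-A7
# are assumed to be evaluated elsewhere and passed in as fired labels.

TRANSITIONS = {
    ("ordinary_road", "A2"): "lane_split",
    ("ordinary_road", "A7"): "lane_merge",
    ("ordinary_road", "A5"): "intersection",
    ("lane_split",    "A3"): "ordinary_road",
    ("lane_merge",    "A6"): "ordinary_road",
    ("intersection",  "A4"): "ordinary_road",
}

def step(scene, fired_conditions):
    """Return the next scene given the set of satisfied switching conditions."""
    for cond in fired_conditions:
        nxt = TRANSITIONS.get((scene, cond))
        if nxt is not None:
            return nxt
    return scene  # no applicable condition fired: stay in the current scene
```

Note that every non-ordinary scenario can only return to the ordinary-road scenario, matching the structure of conditions A3, A4, and A6.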
According to embodiments of the present disclosure, the driving scenario analysis model can switch the driving scenario of the target vehicle in real time, so that driving behavior decisions are made for different scenarios, improving decision accuracy.
According to an embodiment of the present disclosure, processing real-time motion state information and real-time relative motion state information by using a target driving behavior decision model to obtain driving behavior decision information of a target vehicle may include the following operations:
And determining the target traffic participant according to the target driving scenario information and the real-time relative motion state information. And screening, from the real-time relative motion state information, the target real-time relative motion state information between the target traffic participant and the target vehicle. And determining attribute information of the target motion state according to the target driving scenario information. And screening, from the real-time motion state information of the target vehicle, the target real-time motion state information corresponding to the attribute information of the target motion state. And processing the target real-time motion state information and the target real-time relative motion state information by using the target driving behavior decision model to obtain driving action decision information.
According to embodiments of the present disclosure, due to the different driving scenarios, the traffic participation objects that need to be considered in making the driving action decisions may also be different. For example: upon entering the lane merge scenario, the traffic participant may be a vehicle in a lane adjacent to the target vehicle, as well as a vehicle in a lane in which the target vehicle is currently located. Upon entering an intersection scene, the traffic participant may be a vehicle that may have driving conflict with the target vehicle in the intersection area, for example: vehicles entering the intersection from the opposite lane.
For example: in a lane merge or lane split scenario, the target traffic participants may be the vehicles located within the same lane as the target vehicle and the vehicles located within lanes adjacent to the target vehicle. In this case, the target real-time relative motion state information between the target traffic participant and the target vehicle may include: relative speed, relative heading angle, relative acceleration, relative position, time to collision, and relative distance from the merge or split point. The attributes of the target motion state may include: acceleration of the target vehicle, steering wheel angle, heading angle, relative position to the destination, heading angle error from the forward path point, distance from the current lane centerline, and distance from the merge or split point.
According to the embodiment of the disclosure, the screened target real-time relative motion state information and the target real-time motion state information can be processed by using the driving behavior decision model corresponding to the lane merging scene to obtain the target decision action information.
According to the embodiment of the disclosure, the motion state attributes of the traffic participants and of the target vehicle that need to be observed are screened out according to the driving scenario, yielding the target real-time relative motion state between the traffic participants and the target vehicle and the target real-time motion state. Based on the characteristics of the real-time driving states of the traffic participants and the target vehicle in different driving scenarios, driving action decision information suited to each scenario can thus be obtained in a targeted manner.
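The screening step can be sketched as a per-scenario attribute filter. The attribute names and the lane-merge entries below are illustrative, loosely following the lists given for the lane-merge scenario; they are assumptions, not the disclosure's actual feature set.

```python
# Per-scenario attribute screening sketch. Attribute names are assumed.

SCENE_EGO_ATTRS = {
    "lane_merge": ["acceleration", "steering_wheel_angle", "heading_angle",
                   "dist_to_lane_centerline", "dist_to_merge_point"],
}
SCENE_REL_ATTRS = {
    "lane_merge": ["relative_speed", "relative_heading", "relative_acceleration",
                   "relative_position", "time_to_collision"],
}

def screen_features(scene, ego_state, relative_state):
    """Keep only the attributes relevant to the given driving scenario."""
    ego = {k: ego_state[k]
           for k in SCENE_EGO_ATTRS.get(scene, []) if k in ego_state}
    rel = {k: relative_state[k]
           for k in SCENE_REL_ATTRS.get(scene, []) if k in relative_state}
    return ego, rel
```

The filtered dictionaries are what the scenario's decision model would receive, so irrelevant observations (e.g. attributes only meaningful at intersections) never enter the lane-merge model's input.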
According to an embodiment of the present disclosure, determining a target traffic participation object according to target driving scenario information and real-time relative motion state information may include the following operations:
and determining the relative position relation between the traffic participation object and the target vehicle according to the real-time relative motion state information. And determining the risk degree of the traffic participation object on the target vehicle in the target driving scene according to the relative position relation. And screening the target traffic participation objects from the traffic participation objects according to the risk degree.
According to the embodiment of the disclosure, the intersection scene is a complex scene in the actual driving scene, the number of traffic participation objects to be observed is also large, and dynamic changes of relative positions between the traffic participation objects and the target vehicles are also diversified. Therefore, when making a driving action decision in the face of an intersection scene, it is necessary to consider the relative positional relationship between the traffic participation object and the target vehicle to determine whether or not the traffic participation object is at risk for the travel process of the target vehicle.
According to an embodiment of the present disclosure, the relative positional relationship between the traffic participant and the target vehicle may include: merging into the same lane; movement trajectories expected to intersect; the traffic participant located in an adjacent lane of the target vehicle; the traffic participant located in the current lane of the target vehicle; and so on.
For example: the movement tracks of the traffic participation object and the target vehicle are expected to intersect, the relative speed of the traffic participation object and the target vehicle is high, the condition that the traffic participation object causes a high degree of safety risk to the target vehicle in an intersection scene can be determined, and the risk degree can be determined according to a preset judging rule based on the distance between the relative positions of the traffic participation object and the target vehicle. When the risk level is higher than the preset risk threshold value, the traffic participation object is determined to be a target traffic participation object which needs to be focused during the process of driving the target vehicle in the crossing scene.
According to an embodiment of the present disclosure, determining a risk level of a traffic participation object to a target vehicle in a target driving scene according to a relative positional relationship may include the following operations:
and obtaining the association relation between the traffic participation object and the driving track of the target vehicle according to the relative position relation. And determining the risk degree according to the association relation of the driving tracks.
According to embodiments of the present disclosure, the association relationship of the driving trajectories may include whether an association exists. For example: when the traffic participant and the target vehicle are in opposing lanes, their driving trajectories may collide, so the trajectories are associated. The risk degree may be determined based on how the driving trajectories of the traffic participant and the target vehicle are associated. For example: if their trajectories overlap in opposite directions, the two may collide, and the corresponding risk degree is high. If their trajectories overlap in the same direction, the two may merge into the same lane; although a risk exists, the risk degree is lower than that of opposite-direction overlap.
According to the embodiment of the disclosure, through the relative positional relationship between the traffic participant and the target vehicle, the risk degree that the traffic participant poses to the target vehicle in the current driving scenario can be determined, so that the traffic participants with higher risk degrees are screened out as the objects the driving behavior decision model needs to pay attention to when making decisions. Redundant information is thereby screened out, improving the accuracy of model decisions.
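A minimal sketch of this risk-based screening follows, assuming hypothetical risk scores keyed by the trajectory relationship and an assumed threshold; the disclosure does not give numeric values, so all numbers here are placeholders.

```python
# Risk screening sketch. Scores and threshold are assumed placeholders;
# the ordering (opposing overlap > same-direction overlap) follows the text.

RISK_BY_RELATION = {
    "opposing_overlap": 0.9,        # head-on trajectory conflict: highest risk
    "same_direction_overlap": 0.5,  # may merge into the same lane
    "adjacent_lane": 0.3,
    "no_overlap": 0.0,
}
RISK_THRESHOLD = 0.4  # assumed preset risk threshold

def screen_participants(participants):
    """participants: list of (participant_id, trajectory_relation) tuples.
    Returns the ids whose risk degree exceeds the threshold."""
    return [pid for pid, rel in participants
            if RISK_BY_RELATION.get(rel, 0.0) > RISK_THRESHOLD]
```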
The target driving behavior decision model may be trained based on a reinforcement learning algorithm. A conventional reinforcement learning algorithm such as Q-learning can output an expected driving action given a driving action and a driving environment state as inputs. However, conventional reinforcement learning has the drawback that the policy being updated and the policy used to interact with the environment are the same policy; the data is therefore homogeneous, the algorithm easily falls into a vicious cycle of bad policy, bad data, and worse policy, and the training process is difficult to converge. Thus, embodiments of the present disclosure employ the proximal policy optimization (PPO) algorithm, which addresses the slow-convergence problem by limiting the magnitude of policy updates.
Fig. 5 schematically illustrates a training method of a target driving behavior decision model according to an embodiment of the present disclosure.
As shown in fig. 5, the method 500 may include operations S510-S530.
In operation S510, in a simulated driving scenario, movement state information of a test vehicle, and relative movement state information of a traffic participant and the test vehicle are acquired.
In operation S520, driving motion information is randomly selected from the motion probability distribution space of the target vehicle.
In operation S530, the preset model is trained based on the near-end policy optimization algorithm by processing the driving action information, the motion state information of the test vehicle, and the relative motion state information of the traffic participant and the test vehicle, so as to obtain the target driving behavior decision model.
According to an embodiment of the present disclosure, the simulated driving scenario may be a scenario that is the same as the target driving scenario obtained by simulation in the simulated driving test environment, for model training. In the embodiment of the present disclosure, the actual driving scene is mainly divided into the following four scenes: a common road scene, a lane merging scene, a lane separation scene and an intersection scene. Aiming at each driving scene, a target behavior decision model is obtained according to the training method provided by the embodiment of the disclosure.
According to an embodiment of the disclosure, the driving action information includes S pieces of driving action information, where S is an integer greater than 1, and the preset model includes a policy network and a value network.
Fig. 6 schematically illustrates a logic flow diagram of a training method of a target driving behavior decision model according to an embodiment of the disclosure.
As shown in fig. 6, operations S610 to S650 may be included in the embodiment 600.
In operation S610, for the s-th piece of driving action information, the policy network is used to process the s-th driving action information, the motion state information of the test vehicle, and the relative motion state information between the traffic participant and the test vehicle to obtain s-th driving result information, where s is an integer greater than or equal to 1 and less than S.
In operation S620, the s-th driving action information and the s-th driving result information are processed using the value network to obtain action value information.
In operation S630, a policy advantage value is obtained from the driving state information and the action value information based on the objective function.
In operation S640, it is determined whether the policy advantage value satisfies a preset threshold. If so, operation S650 is performed; if not, the model parameters are adjusted, s is incremented, and operation S610 is performed again.
In operation S650, in the case where it is determined that the policy advantage value satisfies the predetermined threshold, the s-th driving action information is determined to be the target decision action information.
According to an embodiment of the present disclosure, the objective function may be the function in the PPO algorithm used to limit the magnitude of policy updates. The update amplitude of the policy output space can be controlled through a ratio term that measures the difference between the policy to be updated and the sampling policy.
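The disclosure does not write this objective out explicitly; the conventional PPO clipped-surrogate form it appears to reference can be sketched for a single sample as follows, with the clip range eps assumed to be the common default of 0.2.

```python
import math

# Single-sample PPO clipped-surrogate loss (conventional form, assumed here;
# not quoted from the disclosure). The ratio pi_new/pi_old is the term that
# measures the difference between the policy to be updated and the
# sampling policy, and clipping it limits the policy update magnitude.

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    ratio = math.exp(logp_new - logp_old)        # pi_new(a|s) / pi_old(a|s)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # PPO maximizes min(ratio * A, clipped * A); the loss is its negative.
    return -min(ratio * advantage, clipped * advantage)
```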
According to the embodiment of the disclosure, the value network is utilized to process the s-th driving action information and the s-th driving result information, and action value information can be obtained based on the value function of the Q-learning algorithm. The reinforcement learning method based on the Q-learning algorithm is a relatively mature technology and is not described in detail herein.
It should be noted that, when applying the reinforcement learning method of the Q-learning algorithm, the embodiments of the present disclosure may design the reinforcement learning reward function from three aspects: the safety of the target vehicle, driving smoothness, and compliance with traffic regulations.
For example, the reward function may be as shown in equation (1):

r = r_c + r_ttc + r_o + r_goal + r_reach + r_steering + r_lane + r_speed    (1)
r_c is a penalty term for a collision of the test vehicle: r_c = -50 if a collision occurs, otherwise r_c = 0. r_ttc = Σ 0.01 * (now_ttc_i - last_ttc_i) is a reward term for guiding the test vehicle to pull away from other vehicles, where now_ttc_i is the TTC (time to collision) between the test vehicle and the i-th vehicle in the current frame, and last_ttc_i is the TTC between the test vehicle and the i-th vehicle in the previous frame; the test vehicle is rewarded if the time to collision increases and penalized otherwise. r_o is a penalty term for the test vehicle driving off the road: r_o = -50 if the vehicle drives off the road, otherwise r_o = 0. All three terms are designed to ensure the safety of the test vehicle.
r_goal = -0.01 * d_goal is inversely proportional to the distance between the test vehicle and the target point; the greater the distance, the smaller the reward. r_reach is a reward term for the test vehicle reaching the target point: r_reach = 50 if the target point is reached, otherwise r_reach = 0. Both terms are designed to ensure that the test vehicle can reach the target point.
r_steering = -((speed - 60) * steering * k)^2. This term only takes effect when the test vehicle speed exceeds 60 km/h; the larger the steering wheel angle, the smaller the reward. This term is designed to ensure ride comfort.
r_lane = -0.01 * d_lane is inversely proportional to the lateral distance between the test vehicle and the center line of the current lane; the larger the distance, the smaller the reward, which guides the test vehicle to drive along the lane center line as much as possible. r_speed = -0.01 * (speed - speed_limit) only takes effect when the test vehicle speed exceeds the road speed limit; the more the speed limit is exceeded, the smaller the reward. Both terms are designed to encourage compliance with traffic regulations.
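Putting the terms of equation (1) together, a sketch of the composite reward might look as follows. The constants follow the text above; the parameter names (collision flag, TTC lists, distances, steering angle, coefficient k) are hypothetical stand-ins for the simulator's actual signals.

```python
def reward(collided, off_road, reached_goal, now_ttc, last_ttc,
           d_goal, d_lane, speed, speed_limit, steering, k=1.0):
    """Composite reward of equation (1); speeds in km/h, distances in meters."""
    r_c = -50.0 if collided else 0.0                       # collision penalty
    r_ttc = sum(0.01 * (n - l) for n, l in zip(now_ttc, last_ttc))  # TTC change
    r_o = -50.0 if off_road else 0.0                       # off-road penalty
    r_goal = -0.01 * d_goal                                # distance to target point
    r_reach = 50.0 if reached_goal else 0.0                # target reached bonus
    r_steering = -((speed - 60.0) * steering * k) ** 2 if speed > 60.0 else 0.0
    r_lane = -0.01 * d_lane                                # lateral lane-center offset
    r_speed = -0.01 * (speed - speed_limit) if speed > speed_limit else 0.0
    return r_c + r_ttc + r_o + r_goal + r_reach + r_steering + r_lane + r_speed
```

Note that the safety terms (r_c, r_o) dominate the shaping terms by two orders of magnitude, so collisions and off-road events outweigh any single-step comfort or progress reward.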
The embodiment of the disclosure designs the action space of the test vehicle as a continuous action space; that is, the output of the target driving behavior decision model is [steering wheel angle, braking force, accelerator opening]. Accordingly, a continuous-action variant of the PPO algorithm should be selected, in which case the output of the target driving behavior decision model is finally transformed into the range of [-1, 1] via the tanh function.
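A minimal sketch of the tanh squashing step: the unbounded network output is mapped into (-1, 1) and can then be rescaled to a physical actuation range. The rescaling bounds here are illustrative assumptions, not values from the disclosure.

```python
import math

def squash_action(raw, low, high):
    """Map an unbounded network output to [low, high] via tanh."""
    unit = math.tanh(raw)                       # squashed into (-1, 1)
    return low + (unit + 1.0) * 0.5 * (high - low)  # rescale to [low, high]
```

For example, a raw steering output could be squashed to [-1, 1] directly, while a throttle output could be rescaled to a hypothetical [0, 100] percent opening.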
According to the embodiment of the disclosure, the proximal policy optimization algorithm is combined with the traditional Q-learning algorithm to limit the policy update amplitude during reinforcement learning training, so that the model reaches the convergence condition faster and with higher accuracy, and model training efficiency can be improved.
Fig. 7 schematically shows a block diagram of a decision information generating apparatus according to an embodiment of the disclosure.
As shown in fig. 7, the decision information generating apparatus 700 of this embodiment may include: an acquisition module 710, an analysis module 720, a determination module 730, and a generation module 740.
The acquiring module 710 is configured to acquire, in response to receiving a driving service request of a target vehicle, real-time motion state information of the target vehicle, lane type information of a target area, and real-time relative motion state information of a traffic participation object and the target vehicle.
The analysis module 720 is configured to process the real-time motion state information and the lane type information by using a driving scene analysis model, so as to obtain target driving scene information.
The determining module 730 is configured to determine a target driving behavior decision model according to the target driving scene information.
The generating module 740 is configured to process the real-time motion state information and the real-time relative motion state information by using a target driving behavior decision model, so as to obtain driving behavior decision information of the target vehicle.
According to an embodiment of the present disclosure, the real-time motion state information includes real-time driving direction information and real-time position information. The analysis module comprises: the device comprises a first determining unit, a first obtaining unit, a first inquiring unit and a second obtaining unit. The first determining unit is used for determining the position information of the target task point and the position information of the scene switching point from the target area. The first obtaining unit is used for obtaining the distance information between the real-time position and the scene switching point according to the real-time position information and the position information of the scene switching point. The first query unit is used for querying the lane type information of the target task point from the lane type information of the target area according to the position information of the target task point. The second obtaining unit is used for processing the distance information, the real-time driving direction and the lane type information of the target task point by using the driving scene analysis model to obtain target driving scene information.
According to an embodiment of the present disclosure, the second obtaining unit includes a first determining subunit and a second determining subunit. The first determining subunit is used for determining a plurality of candidate scene switching conditions according to the real-time driving direction and the lane type information of the target task point. And a second determination subunit configured to determine target driving scenario information from among the plurality of candidate scenario switching conditions according to the distance information.
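The two-stage selection performed by these subunits might be sketched as below; the scene names, lane types, and the distance threshold are all hypothetical illustrations, not values from the disclosure.

```python
def select_driving_scene(direction, lane_type, distance, switch_threshold=30.0):
    """Pick a target driving scene from candidates, keyed by distance to the switch point."""
    # Stage 1: candidate scene switching conditions from direction and lane type.
    if lane_type == "intersection":
        candidates = {"near": f"{direction}_at_intersection", "far": "cruise"}
    else:
        candidates = {"near": "lane_keep", "far": "cruise"}
    # Stage 2: choose among candidates by distance to the scene switching point.
    return candidates["near"] if distance <= switch_threshold else candidates["far"]
```

The design point is that direction and lane type narrow the options first, so the distance comparison only chooses among scenes that are plausible at the target task point.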
According to an embodiment of the present disclosure, the generating module includes a second determining unit, a first screening unit, a third determining unit, a second screening unit, and a third obtaining unit. The second determining unit is used for determining the target traffic participation object according to the target driving scene information and the real-time relative motion state information. The first screening unit is used for screening the target real-time relative motion state information of the target traffic participation object and the target vehicle from the real-time relative motion state information. The third determining unit is used for determining attribute information of a target motion state according to the target driving scene information. The second screening unit is used for screening the target real-time motion state information corresponding to the attribute information of the target motion state from the real-time motion state information of the target vehicle. The third obtaining unit is used for processing the target real-time motion state information and the target real-time relative motion state information by using the target driving behavior decision model to obtain the driving action decision information.
According to an embodiment of the present disclosure, the second determining unit includes a third determining subunit, a fourth determining subunit, and a screening subunit. And the third determination subunit is used for determining the relative position relation between the traffic participation object and the target vehicle according to the real-time relative motion state information. And the fourth determination subunit is used for determining the risk degree of the traffic participation object on the target vehicle in the target driving scene according to the relative position relation. And the screening subunit is used for screening the target traffic participation objects from the traffic participation objects according to the risk degree.
According to an embodiment of the present disclosure, the fourth determining subunit is configured to obtain an association relationship between the driving track of the traffic participation object and that of the target vehicle according to the relative positional relationship, and to determine the risk degree according to the association relationship of the driving tracks.
According to an embodiment of the disclosure, the generation module comprises an acquisition unit and a training unit. The acquisition unit is used for acquiring the motion state information of the test vehicle and the relative motion state information of the traffic participation object and the test vehicle in the simulated driving scene. The training unit is used for training the preset model by processing the driving action information, the motion state information of the test vehicle, and the relative motion state information of the traffic participation object and the test vehicle based on the proximal policy optimization algorithm, to obtain the target driving behavior decision model.
According to an embodiment of the present disclosure, the driving action information includes S pieces of driving action information, where S is an integer greater than 1, and the preset model includes a policy network and a value network. The training unit includes: a processing subunit, a first obtaining subunit, a second obtaining subunit, and an adjusting subunit. The processing subunit is configured to process, for the s-th driving action information, the s-th driving action information, the motion state information of the test vehicle, and the relative motion state information of the traffic participation object and the test vehicle by using the policy network, to obtain s-th driving result information, where s is an integer greater than or equal to 1 and less than S. The first obtaining subunit is used for processing the s-th driving action information and the s-th driving result information by using the value network to obtain action value information. The second obtaining subunit is used for obtaining the policy advantage value according to the driving state information and the action value information based on the objective function. The adjusting subunit is configured to, when it is determined that the policy advantage value does not meet the predetermined threshold, adjust the model parameters of the preset model, return to the processing operation using the policy network and the processing operation using the value network, and increment s. In a case where the policy advantage value meets the predetermined threshold, the s-th driving action information is determined to be the target decision action information.
Any number of modules, sub-modules, units, sub-units, or at least some of the functionality of any number of the sub-units according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented as split into multiple modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-substrate, a system-on-package, an Application Specific Integrated Circuit (ASIC), or in any other reasonable manner of hardware or firmware that integrates or encapsulates the circuit, or in any one of or a suitable combination of three of software, hardware, and firmware. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be at least partially implemented as computer program modules, which when executed, may perform the corresponding functions.
For example, any of the acquisition module 710, the analysis module 720, the determination module 730, and the generation module 740 may be combined into one module/unit/sub-unit, or any one of them may be split into multiple modules/units/sub-units. Alternatively, at least some of the functionality of one or more of these modules/units/sub-units may be combined with at least some of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. At least one of the acquisition module 710, the analysis module 720, the determination module 730, and the generation module 740 according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or in any other reasonable manner of hardware or firmware that integrates or packages the circuit, or in any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, at least one of the acquisition module 710, the analysis module 720, the determination module 730, and the generation module 740 may be at least partially implemented as computer program modules which, when executed, may perform the corresponding functions.
It should be noted that, in the embodiments of the present disclosure, the decision information generating apparatus corresponds to the decision information generation method; for the specific description of the decision information generating apparatus, reference may be made to the description of the decision information generation method, which is not repeated herein.
Fig. 8 schematically illustrates a block diagram of an electronic device adapted to implement the above-described method according to an embodiment of the present disclosure. The electronic device shown in fig. 8 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 8, an electronic device 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 801 may also include on-board memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the disclosure.
In the RAM 803, various programs and data required for the operation of the electronic device 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or the RAM 803. Note that the program may also be stored in one or more memories other than the ROM 802 and the RAM 803. The processor 801 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 800 may also include an input/output (I/O) interface 805, the input/output (I/O) interface 805 also being connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output portion 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. The drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom is installed into the storage section 808 as needed.
According to embodiments of the present disclosure, the method flow according to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable media 811. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 801. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 802 and/or RAM 803 and/or one or more memories other than ROM 802 and RAM 803 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program, the computer program comprising program code for performing the methods provided by the embodiments of the present disclosure; when the computer program product runs on an electronic device, the program code causes the electronic device to carry out the methods provided by the embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 801. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, downloaded and installed via the communication section 809, and/or installed from the removable medium 811. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to wireless and wired media, or any suitable combination of the foregoing.
According to embodiments of the present disclosure, program code for carrying out the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, "C", or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special-purpose hardware-based systems which perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined in various ways, even if such combinations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be variously combined without departing from the spirit and teachings of the present disclosure. All such combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (12)

1. A decision information generation method, comprising:
in response to receiving a driving service request of a target vehicle, acquiring real-time motion state information of the target vehicle, lane type information of a target area and real-time relative motion state information of a traffic participation object and the target vehicle;
processing the real-time motion state information and the lane type information by using a driving scene analysis model to obtain target driving scene information;
determining a target driving behavior decision model according to the target driving scene information; and
processing the real-time motion state information and the real-time relative motion state information by using the target driving behavior decision model to obtain driving behavior decision information of the target vehicle.
2. The method of claim 1, wherein the real-time motion state information includes real-time driving direction information and real-time position information, the processing the real-time motion state information and the lane type information using a driving scene analysis model to obtain target driving scene information includes:
determining the position information of a target task point and the position information of a scene switching point from the target area;
obtaining distance information between the real-time position and the scene switching point according to the real-time position information and the position information of the scene switching point;
inquiring the lane type information of the target task point from the lane type information of the target area according to the position information of the target task point; and
processing the distance information, the real-time driving direction, and the lane type information of the target task point by using the driving scene analysis model to obtain the target driving scene information.
3. The method of claim 2, wherein the processing the distance information, the real-time driving direction, and lane type information of the target task point using a driving scenario analysis model to obtain the target driving scenario information comprises:
determining a plurality of candidate scene switching conditions according to the real-time driving direction and the lane type information of the target task point; and
determining the target driving scene information from the plurality of candidate scene switching conditions according to the distance information.
4. The method of claim 1, wherein the processing the real-time motion state information and the real-time relative motion state information using a target driving behavior decision model to obtain driving behavior decision information of the target vehicle comprises:
determining a target traffic participation object according to the target driving scene information and the real-time relative motion state information;
screening the real-time relative motion state information of the target traffic participation object and the target vehicle from the real-time relative motion state information;
determining attribute information of a target motion state according to the target driving scene information;
screening target real-time motion state information corresponding to the attribute information of the target motion state from the real-time motion state information of the target vehicle; and
processing the target real-time motion state information and the target real-time relative motion state information by using the target driving behavior decision model to obtain the driving action decision information.
5. The method of claim 4, wherein the determining a target traffic participant based on the target driving scenario information and the real-time relative motion state information comprises:
determining the relative position relation between the traffic participation object and the target vehicle according to the real-time relative motion state information;
determining the risk degree of the traffic participation object to the target vehicle in a target driving scene according to the relative position relation; and
screening the target traffic participation object from the traffic participation objects according to the risk degree.
6. The method of claim 5, wherein the determining the degree of risk of the traffic participant to the target vehicle in the target driving scenario according to the relative positional relationship comprises:
obtaining an association relation between the traffic participation object and the driving track of the target vehicle according to the relative position relation; and
determining the risk degree according to the association relationship of the driving tracks.
7. The method of claim 1, wherein the training method of the target driving behavior decision model comprises:
in a simulated driving scene, acquiring motion state information of a test vehicle and relative motion state information of a traffic participation object and the test vehicle;
Randomly selecting driving action information from an action probability distribution space of a target vehicle; and
training a preset model based on a proximal policy optimization algorithm by processing the driving action information, the motion state information of the test vehicle, and the relative motion state information of the traffic participation object and the test vehicle, to obtain the target driving behavior decision model.
8. The method according to claim 7, wherein the driving action information includes S pieces of driving action information, S is an integer greater than 1, and the preset model includes a policy network and a value network, and wherein the training of the preset model based on the proximal policy optimization algorithm by processing the driving action information, the motion state information of the test vehicle, and the relative motion state information of the traffic participation object and the test vehicle, to obtain the target driving behavior decision model, comprises:
for the s-th driving action information, processing the s-th driving action information, the motion state information of the test vehicle, and the relative motion state information of the traffic participation object and the test vehicle by using the policy network to obtain s-th driving result information, wherein s is an integer greater than or equal to 1 and less than S;
processing the s-th driving action information and the s-th driving result information by using the value network to obtain action value information;
obtaining a policy advantage value according to the driving state information and the action value information based on the objective function;
adjusting model parameters of the preset model, returning to the processing operation using the policy network and the processing operation using the value network, and incrementing s, in a case where the policy advantage value does not meet a predetermined threshold; and
determining the s-th driving action information to be target decision action information in a case where the policy advantage value meets the predetermined threshold.
9. A decision information generating apparatus comprising:
an acquisition module configured to acquire, in response to receiving a driving service request of a target vehicle, real-time motion state information of the target vehicle, lane type information of a target area, and real-time relative motion state information of a traffic participation object and the target vehicle;
an analysis module configured to process the real-time motion state information and the lane type information by using a driving scene analysis model to obtain target driving scene information;
a determination module configured to determine a target driving behavior decision model according to the target driving scene information; and
a generation module configured to process the real-time motion state information and the real-time relative motion state information by using the target driving behavior decision model to obtain driving behavior decision information of the target vehicle.
10. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1 to 8.
11. A computer-readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to implement the method of any one of claims 1 to 8.
12. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 8.
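The iterative training steps recited before claim 9 above pair a policy ("strategy") network with a value network and stop once a policy advantage value clears a preset threshold, in the spirit of an actor-critic loop. The following is a minimal sketch only: the toy networks, the additive advantage function, and all names are hypothetical, since the patent does not disclose architectures or the objective function.

```python
# Sketch of the advantage-thresholded policy/value iteration from the claims.
# The "networks" below are toy linear functions, not real models.

def policy_network(action_info, vehicle_state, relative_state):
    # Stand-in policy (strategy) network: action + vehicle + relative
    # motion state -> driving result information.
    return 0.5 * action_info + vehicle_state + relative_state

def value_network(action_info, result_info):
    # Stand-in value network: scores the (action, result) pair.
    return 0.3 * action_info + 0.7 * result_info

def advantage(driving_state, action_value):
    # Stand-in objective function producing the policy advantage value.
    return action_value - driving_state

def select_decision_action(actions, vehicle_state, relative_state,
                           driving_state, threshold):
    """Iterate s = 1..S over candidate actions; return (s, action) for the
    first action whose advantage meets the threshold, else None."""
    for s, action in enumerate(actions, start=1):
        result = policy_network(action, vehicle_state, relative_state)
        q = value_network(action, result)
        if advantage(driving_state, q) >= threshold:
            return s, action
        # Otherwise: adjust model parameters (omitted here) and increment s.
    return None

print(select_decision_action([0.1, 0.5, 2.0], 1.0, 0.2, 1.0, 1.0))
# -> (3, 2.0)
```

In a real implementation the parameter adjustment on the failing branch would be a gradient step on the policy and value networks; here it is elided to keep the control flow of the claim visible.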
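The four-module apparatus of claim 9 reads as a pipeline: acquisition, scene analysis, decision-model determination, then decision generation. A hypothetical sketch, with toy callables standing in for the driving scene analysis model and the per-scene driving behavior decision models:

```python
# Hypothetical structure for the claim-9 apparatus; interfaces and the
# request dictionary keys are illustrative, not from the patent.

class DecisionInfoGenerator:
    def __init__(self, scene_model, decision_models):
        self.scene_model = scene_model          # driving scene analysis model
        self.decision_models = decision_models  # maps scene -> decision model

    def acquire(self, request):
        # Acquisition module: real-time motion state, lane type, and
        # relative motion state from the driving service request.
        return request["motion"], request["lane_type"], request["relative"]

    def generate(self, request):
        motion, lane_type, relative = self.acquire(request)
        # Analysis module: classify the target driving scene.
        scene = self.scene_model(motion, lane_type)
        # Determination module: pick the scene-specific decision model.
        model = self.decision_models[scene]
        # Generation module: produce the driving behavior decision.
        return model(motion, relative)

# Usage with toy models:
gen = DecisionInfoGenerator(
    scene_model=lambda m, lt: "highway" if lt == "motorway" else "urban",
    decision_models={"highway": lambda m, r: "keep_lane",
                     "urban": lambda m, r: "slow_down"},
)
print(gen.generate({"motion": 30.0, "lane_type": "motorway", "relative": 5.0}))
# -> keep_lane
```

Keying the decision models by scene mirrors the claim's point that a scene-specific model, rather than one monolithic model, generates the final decision information.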
CN202310249192.9A 2023-03-15 2023-03-15 Decision information generation method and device, electronic equipment and storage medium Pending CN116279580A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310249192.9A CN116279580A (en) 2023-03-15 2023-03-15 Decision information generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116279580A 2023-06-23

Family

ID=86833819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310249192.9A Pending CN116279580A (en) 2023-03-15 2023-03-15 Decision information generation method and device, electronic equipment and storage medium

Similar Documents

Publication Title
JP7406215B2 (en) Orientation adjustment actions for autonomous vehicle motion management
CN110796856B (en) Vehicle lane change intention prediction method and training method of lane change intention prediction network
CN110603497B (en) Autonomous vehicle and method of autonomous vehicle operation management control
CN110431037B (en) Autonomous vehicle operation management including application of partially observable Markov decision process model examples
CN112133089B (en) Vehicle track prediction method, system and device based on surrounding environment and behavior intention
US10896122B2 (en) Using divergence to conduct log-based simulations
CN109697875B (en) Method and device for planning driving track
CN110562258B (en) Method for vehicle automatic lane change decision, vehicle-mounted equipment and storage medium
CN110843789B (en) Vehicle lane change intention prediction method based on time sequence convolution network
JP2023510136A (en) Geolocation models for perception, prediction or planning
US20210261123A1 (en) Autonomous Vehicle Operation with Explicit Occlusion Reasoning
WO2021008605A1 (en) Method and device for determining vehicle speed
CN115016474A (en) Control method, road side equipment, cloud control platform and system for cooperative automatic driving of vehicle and road
US11300957B2 (en) Multiple objective explanation and control interface design
CN114475656B (en) Travel track prediction method, apparatus, electronic device and storage medium
CN112249009A (en) Vehicle speed control method, device and system and electronic equipment
CN115062202A (en) Method, device, equipment and storage medium for predicting driving behavior intention and track
CN116686028A (en) Driving assistance method and related equipment
CN116279580A (en) Decision information generation method and device, electronic equipment and storage medium
CN115112138A (en) Trajectory planning information generation method and device, electronic equipment and storage medium
Tang et al. Cooperative connected smart road infrastructure and autonomous vehicles for safe driving
CN114692289A (en) Automatic driving algorithm testing method and related equipment
CN113911139B (en) Vehicle control method and device and electronic equipment
EP4353560A1 (en) Vehicle control method and apparatus
US20230192133A1 (en) Conditional mode anchoring

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination