CN113879323B - Reliable learning type automatic driving decision-making method, system, storage medium and equipment - Google Patents

Reliable learning type automatic driving decision-making method, system, storage medium and equipment Download PDF

Info

Publication number
CN113879323B
CN113879323B (application CN202111246972.5A)
Authority
CN
China
Prior art keywords
decision
learning
interpretable
value
automatic driving
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111246972.5A
Other languages
Chinese (zh)
Other versions
CN113879323A (en)
Inventor
杨殿阁
曹重
周伟韬
邓楠山
焦新宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202111246972.5A priority Critical patent/CN113879323B/en
Publication of CN113879323A publication Critical patent/CN113879323A/en
Application granted granted Critical
Publication of CN113879323B publication Critical patent/CN113879323B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B — PERFORMING OPERATIONS; TRANSPORTING
    • B60 — VEHICLES IN GENERAL
    • B60W — CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 — Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W60/00 — Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 — Planning or execution of driving tasks
    • B60W2050/0001 — Details of the control system
    • B60W2050/0019 — Control system elements or transfer functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Human Computer Interaction (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Traffic Control Systems (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)

Abstract

The invention relates to a reliable learning-based automated driving decision method, system, storage medium and device. The method comprises the following steps: constructing an interpretable decision based on a preset decision problem, the interpretable decision guiding the training of the learning-based decision; training the learning-based decision on the decision problem to obtain a learning-based decision with a high-value decision cost function; and selecting the higher-valued of the learning-based decision and the interpretable decision as the final reliable decision action. The invention guarantees the reliability of the learning-based decision of an autonomous vehicle, thereby ensuring the high reliability of the vehicle, and can be widely applied in the technical field of automated driving.

Description

Reliable learning type automatic driving decision-making method, system, storage medium and equipment
Technical Field
The invention relates to the technical field of autonomous vehicle decision-making, in particular to a learning-based automated driving decision method, system, storage medium and device that are based on reinforcement learning and provide reliable driving performance.
Background
Autonomous decision-making is an important component of an automated driving system, and learning-based decision methods are expected to acquire, through autonomous learning, driving capabilities exceeding those of humans. The problem is that learning-based methods are black boxes whose decision performance is difficult to predict, which conflicts with the high-reliability requirements of autonomous vehicles. Constructing a reliable learning-based automated driving decision method is therefore important for raising the intelligence level of autonomous vehicles.
At present, methods for guaranteeing the reliability of learning-based automated driving fall into three categories: adding safety constraints, guiding decision training, and exploring dangerous scenarios. The main idea of adding safety constraints is to analyze the safety of the trajectory output by the learning-based decision and adjust it in time when a potential danger is found; the problem is that in complex scenarios it remains very difficult to guarantee absolute safety with manually designed rules. Guided decision training and dangerous-scenario exploration both adjust the training direction or add specific data during training to improve the safety of the learning-based decision. They differ in that guided training keeps the learning-based decision from exploring dangerous scenarios and has it learn in safe scenarios as much as possible, so that the resulting driving strategy stays in safer situations, whereas dangerous-scenario exploration has the learning-based decision learn repeatedly in dangerous scenarios to acquire the ability to handle them. However, both methods simply rely on the learning ability of the learning-based decision itself and do not consider the reliability of the final output, so a reliable learning-based automated driving decision method remains difficult to achieve.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a reliable learning-based automated driving decision method, system, storage medium and device that guarantee the reliability of the learning-based decision of an autonomous vehicle, thereby ensuring the high reliability of the vehicle.
To achieve the above object, in one aspect the invention adopts the following technical scheme: a reliable learning-based automated driving decision method, comprising: constructing an interpretable decision based on a preset decision problem, the interpretable decision guiding the training of the learning-based decision; training the learning-based decision on the decision problem to obtain a learning-based decision with a high-value decision cost function; and selecting the higher-valued of the learning-based decision and the interpretable decision as the final reliable decision action.
Further, the decision problem consists of three elements: the environmental observation state, the automated driving action, and the instant reward.
Further, the training of a learning-based decision by the decision problem comprises:
setting a decision value function;
estimating a cost function of the interpretable decision;
and learning to obtain a high-value decision value function according to the interpretable decision value function and the set decision value function.
Further, the estimating of the cost function of the interpretable decision comprises: constructing a data set and obtaining the cost function of the interpretable decision from the data set by recursion.
Further, the data set is composed of data elements; each data element is the next-time-step state obtained by applying the interpretable driving strategy in a given state;
alternatively, the data set is obtained by driving the vehicle directly with the interpretable decision and collecting driving data during driving.
Further, the learning of a high-value decision cost function from the cost function of the interpretable decision and the set decision cost function comprises:
when the autonomous vehicle encounters a state it has not encountered before, driving with the interpretable decision and initializing the cost function of the learning-based decision from the environment's feedback;
when the autonomous vehicle encounters a state it has encountered before, generating a new action, and after the next-time-step state is obtained, updating the cost function of the learning-based decision according to the new action.
Further, the new action a is:

$$a = \arg\max_{a}\left[\,Q(s,a) + c\sqrt{\frac{\ln N(s)}{N(s,a)}} + \delta(s,a,\pi_r)\right]$$

wherein N(s) represents the number of times the current state s has been encountered, N(s,a) represents the number of times action a has been taken in the current state, Q(s,a) represents the decision cost function of taking action a in state s, δ(s,a,π_r) is the interpretable decision-inducing value, π_r represents the interpretable decision, and c is a manually adjusted constant.
In another aspect the invention adopts the following technical scheme: a reliable learning-based automated driving decision system, comprising an interpretable decision construction module, a learning-based decision training module and an output module. The interpretable decision construction module constructs an interpretable decision based on a preset decision problem, and the interpretable decision guides the training of the learning-based decision; the learning-based decision training module trains the learning-based decision on the decision problem to obtain a learning-based decision with a high-value decision cost function; and the output module selects the higher-valued of the learning-based decision and the interpretable decision as the final reliable decision action.
On the other hand, the technical scheme adopted by the invention is as follows: a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the above methods.
On the other hand, the technical scheme adopted by the invention is as follows: a computing device, comprising: one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods described above.
Due to the adoption of the technical scheme, the invention has the following advantages:
1. The invention adjusts the evaluation process of the reinforcement-learning decision's cost function so that the decision cost function of the finally generated strategy is no lower than that of a given interpretable driving strategy, thereby guaranteeing the reliability of the learning-based decision of the autonomous vehicle.
2. The method fully exploits the decision-making capability of reinforcement learning toward a definite objective in a highly uncertain environment, while using the interpretable strategy to bound performance from below, thereby ensuring the high reliability of the autonomous vehicle; it is an important technology for achieving reliable learning-based decision-making for autonomous vehicles.
Drawings
FIG. 1 is a flow chart illustrating a learning-based automatic driving decision method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of key elements of a learning-based decision-making problem according to an embodiment of the present invention;
FIG. 3 is a block diagram of a reliable learning-based decision making architecture in accordance with an embodiment of the present invention;
FIG. 4 is a graph of state-action sampling versus decision cost function in one embodiment of the present invention;
FIG. 5 is a graph of induced values versus learned decision confidence in an embodiment of the present invention;
FIG. 6 is a schematic diagram of a computing device in an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention. It should be apparent that the described embodiments are only some of the embodiments of the present invention, and not all of them. All other embodiments, which can be derived by a person skilled in the art from the description of the embodiments of the invention given above, are within the scope of protection of the invention.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The invention provides a reliable learning-based automated driving decision method, system, storage medium and device for autonomous vehicles, which solve a constructed decision problem to generate an optimal or near-optimal strategy. The invention is not tied to a specific decision problem, but requires that the key elements of the decision problem be constructed.
In an embodiment of the present invention, a reliable learning type automatic driving decision method is provided, and this embodiment is exemplified by applying the method to a terminal, it is to be understood that the method may also be applied to a server, and may also be applied to a system including a terminal and a server, and is implemented through interaction between the terminal and the server.
In this embodiment, as shown in fig. 1, the method includes the following steps:
1) Constructing an interpretable decision based on a preset decision problem, the interpretable decision guiding the training of the learning-based decision;
2) Training the learning-based decision on the decision problem to obtain a learning-based strategy with a high-value decision cost function;
3) Selecting the higher-valued of the learning-based decision and the interpretable decision as the final reliable decision action.
In step 1), the decision problem consists of three elements: the environmental observation state, the automated driving action, and the instant reward.
In this embodiment, as shown in fig. 2, the automated driving decision problem is composed of the following three elements: the environmental observation state s, the automated driving action a and the instant reward r. Wherein:
the environmental observation state s refers to the states of the surrounding dynamic and static elements obtained by sensors and other means, such as road information and information about environmental vehicles, pedestrians, cyclists and other dynamic and static elements;
the automated driving action a refers to an instruction the control module can accept, which may be a driving trajectory containing speed information, or a vehicle control command such as a steering wheel angle or throttle/brake input;
the instant reward r refers to a quantified reward or penalty for the current driving state according to traffic rules, occupant requirements and the like, covering aspects such as safety, smoothness and traffic efficiency. The instant reward function evaluates only the current environment and the current action, and does not need to reward or penalize future or past states and behaviors.
The construction of the above three elements is a precondition for using the decision method of the present invention.
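As an illustration of how these three elements might be encoded in software, the following minimal Python sketch defines a state s, an action a, and an instant reward r that evaluates only the current environment and current action; all field names, thresholds and weights are illustrative assumptions, not part of the patent:

```python
from dataclasses import dataclass

# Hypothetical containers for the three elements of the decision problem:
# environmental observation state s, driving action a, instant reward r.
@dataclass
class Observation:
    ego_speed: float     # m/s
    lane_offset: float   # m, lateral offset from lane center
    lead_gap: float      # m, gap to the lead vehicle

@dataclass
class Action:
    steer: float         # steering wheel angle, rad
    accel: float         # longitudinal acceleration command, m/s^2

def instant_reward(obs: Observation, act: Action) -> float:
    """Evaluate only the current state and action: safety, smoothness, efficiency."""
    safety = -1.0 if obs.lead_gap < 5.0 else 0.0   # penalize dangerously small gaps
    smooth = -0.1 * abs(act.accel)                 # penalize harsh acceleration
    efficiency = 0.05 * obs.ego_speed              # reward making progress
    return safety + smooth + efficiency

obs = Observation(ego_speed=10.0, lane_offset=0.1, lead_gap=30.0)
act = Action(steer=0.0, accel=1.0)
print(round(instant_reward(obs, act), 3))  # 0.4
```

Note that, as the text requires, the reward looks only at the current (obs, act) pair; no future or past state enters the computation.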
In step 1), the interpretable decision is constructed as follows. An interpretable driving decision is a decision method designed from rules or models, whose output actions have explicit logic; mainstream automated driving decision systems are based on such interpretable methods. In the present invention, the interpretable decision method is used to guarantee the performance bound of the learning-based decision, i.e. the final learning-based decision performance is required to be no lower than that of the interpretable decision. The invention places no requirement on the source or form of the interpretable decision method, and the interpretable decision is constructed in the following form:
$$\pi_r : \mathcal{S} \to \mathcal{A}, \qquad a_r = \pi_r(s)$$

wherein $\mathcal{S}$ represents the space formed by all possible environmental observation states; $\mathcal{A}$ represents the space formed by all possible decision actions; $\pi_r$ represents the interpretable decision, i.e. the mapping from the state space to the action space; and $a_r$ represents the action output by the interpretable decision.
This construction only constrains the input and output forms of the interpretable decision, requiring them to be consistent with the decision problem of the learning-based decision; in a specific decision process, only part of the information may actually be used. The interpretability and adjustability of the interpretable decision are the basis of the reliability of the present invention.
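For concreteness, a rule-based interpretable decision π_r of the constrained form above (state in, action out) might look like the following Python sketch; the specific rules and thresholds are hypothetical and serve only to show that every output action carries explicit logic:

```python
# A minimal rule-based (interpretable) driving policy pi_r: state -> action.
# The rules and thresholds below are illustrative assumptions, not the patent's policy.
def pi_r(state: dict) -> str:
    gap = state["lead_gap"]      # m, gap to the lead vehicle
    speed = state["ego_speed"]   # m/s
    if gap < 10.0:
        return "brake"           # explicit safety rule: too close, slow down
    if speed < 15.0 and gap > 30.0:
        return "accelerate"      # free road and below target speed
    return "keep"                # otherwise hold current speed

print(pi_r({"lead_gap": 8.0, "ego_speed": 20.0}))   # brake
print(pi_r({"lead_gap": 50.0, "ego_speed": 10.0}))  # accelerate
```

Because each branch is an explicit rule, the reason for any output action can be stated directly, which is exactly the property the invention uses to bound the learning-based decision.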
In the step 2), as shown in fig. 1 and 3, the training of the learning type decision by the decision problem includes the following steps:
2.1 Set a decision cost function;
the decision value refers to the evaluation of the current state and the performance of different decisions in a period of time in the future, and the decision value function is a function for establishing the relationship between the decision and the decision value. The trustworthy learning decision training process will be based on the evaluation of cost functions for different decisions, thereby obtaining a strategy with the highest possible value, while avoiding that the value of the generated strategy is lower than the interpretable decision.
The decision cost function is defined as follows:

$$Q^{\pi}(s_t, a_t) = \mathbb{E}\left[\sum_{h=0}^{H} \gamma^{h}\, r_{t+h}\right]$$

wherein $Q^{\pi}$ represents the cost function of the strategy π; H represents the future prediction horizon; γ represents a reward discount factor between 0 and 1, used to reduce the influence of distant rewards; h represents the decision step index, a positive integer; $\mathbb{E}$ represents the expectation; and $r_t$ represents the reward value under different driving states.
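The definition above is the expected discounted sum of instant rewards over the horizon H. A minimal Monte-Carlo sketch (with illustrative reward sequences and an assumed γ = 0.9) estimates it by averaging discounted returns over sampled trajectories:

```python
# Monte-Carlo estimate of Q_pi = E[ sum_{h=0}^{H} gamma^h * r_{t+h} ],
# averaging discounted returns over sampled reward trajectories (illustrative data).
def discounted_return(rewards, gamma):
    return sum((gamma ** h) * r for h, r in enumerate(rewards))

def estimate_q(trajectories, gamma=0.9):
    returns = [discounted_return(tr, gamma) for tr in trajectories]
    return sum(returns) / len(returns)

# Two sampled reward sequences of length H + 1 = 3.
trajs = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
print(round(estimate_q(trajs), 3))  # 2.26
```

The discount γ down-weights distant rewards exactly as in the definition: the reward at step h contributes with weight γ^h.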
2.2 Estimate a cost function of the interpretable decision;
the method specifically comprises the following steps: and (3) obtaining the value function of the interpretable decision by constructing a data set and adopting a recursion method through the data set.
The data set is composed of data elements; the data element is the next moment state obtained by adopting the interpretable driving strategy under different states.
For example, construct data elements $\tau_r = \{s_1,\ a_1 = \pi_r(s_1),\ s_2\}$ forming a data set $\mathcal{D}_r$. In the data set, each data element records the next-time-step state obtained by applying the interpretable driving strategy in a given state. The following cost-function evaluation process is obtained from the data set by recursion:

$$Q_r(s_t, \pi_r(s_t)) \leftarrow Q_r(s_t, \pi_r(s_t)) + \alpha\left[r(s_{t+1}) + \gamma\, Q_r(s_{t+1}, \pi_r(s_{t+1})) - Q_r(s_t, \pi_r(s_t))\right]$$

where α is the manually designed learning rate. The recursion relies only on the current state, the interpretable policy, and the next-time-step state.
Alternatively, the data set is obtained by driving the vehicle directly with the interpretable decision and collecting driving data during driving. When enough data has been collected, a more accurate cost function of the interpretable decision can be obtained.
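The recursive evaluation described above can be sketched as a small TD-style loop over the transition data set; the two-state chain, rewards and hyperparameters below are illustrative assumptions:

```python
# Recursive (TD-style) evaluation of the interpretable decision's cost function Q_r
# from a data set of (s, a = pi_r(s), s') transitions, per the update
#   Q_r(s) <- Q_r(s) + alpha * [ r(s') + gamma * Q_r(s') - Q_r(s) ].
from collections import defaultdict

def evaluate_pi_r(dataset, reward, alpha=0.5, gamma=0.9, sweeps=200):
    Q_r = defaultdict(float)           # unvisited states default to value 0
    for _ in range(sweeps):            # repeated recursion until values settle
        for s, s_next in dataset:
            td_target = reward[s_next] + gamma * Q_r[s_next]
            Q_r[s] += alpha * (td_target - Q_r[s])
    return dict(Q_r)

# Tiny two-state chain: s0 -> s1 -> goal; only reaching "goal" pays 1.
dataset = [("s0", "s1"), ("s1", "goal")]
reward = {"s1": 0.0, "goal": 1.0}
Q_r = evaluate_pi_r(dataset, reward)
print(round(Q_r["s1"], 3))  # 1.0: one step from the rewarded state
print(round(Q_r["s0"], 3))  # 0.9: discounted once by gamma
```

As the text notes, each update touches only the current state, the interpretable policy's transition, and the next-time-step state.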
2.3 According to the interpretable decision value function and the set decision value function, learning to obtain a high-value decision value function.
Since the driving strategy of the learning-based decision is itself changing, its cost function changes accordingly. During learning-based training, the decision cost function must therefore be estimated at the same time, and the decision value is improved by adjusting the decision. In this embodiment, the learning-based decision training process generates driving actions in different driving states and then adjusts the actions according to environmental feedback, thereby improving decision performance. As shown in fig. 4, the method specifically includes:
when the autonomous vehicle encounters a state it has not encountered before, it drives with the interpretable decision and initializes the cost function of the learning-based decision from the environment's feedback;
when the autonomous vehicle encounters a state it has encountered before, it generates a new action, and after the next-time-step state is obtained, updates the cost function of the learning-based decision according to the new action.
The new action a is generated as follows:

$$a = \arg\max_{a}\left[\,Q(s,a) + c\sqrt{\frac{\ln N(s)}{N(s,a)}} + \delta(s,a,\pi_r)\right]$$

where N(s) denotes the number of times the current state s has been encountered, N(s,a) denotes the number of times action a has been taken in the current state, Q(s,a) denotes the decision cost function of taking action a in state s, δ(s,a,π_r) is the interpretable decision-inducing value, π_r denotes the interpretable decision, and c is a manually adjusted constant.
The interpretable decision-inducing value δ(s,a,π_r) is defined as follows:

$$\delta(s,a,\pi_r) = \begin{cases} c_{thres}, & a = \pi_r(s) \\ 0, & \text{otherwise} \end{cases}$$

wherein $c_{thres}$ is a manually designed positive number: when the action is the same as the interpretable decision, the inducing value is $c_{thres}$; otherwise it is 0. $c_{thres}$ determines how conservative the reliable decision is, as shown in fig. 5: as $c_{thres}$ tends to infinity, the output of the learning-based decision becomes exactly the same as the interpretable decision; as $c_{thres}$ tends to zero, the strategy generated by the learning-based decision becomes independent of the interpretable decision, and the learning-based decision loses its reliability. $c_{thres}$ is therefore generally designed to be a value slightly greater than 0.
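Combining the learned value Q(s,a), a visit-count exploration bonus, and the inducing value δ, the exploratory action choice can be sketched as follows; the square-root bonus form, c, c_thres and all the numbers are illustrative assumptions:

```python
import math

# Exploratory action selection: learned value Q(s,a) + a visit-count bonus
# + the interpretable decision-inducing value delta(s, a, pi_r).
def delta(s, a, pi_r, c_thres=0.1):
    """Inducing value: c_thres when the action matches the interpretable decision."""
    return c_thres if a == pi_r(s) else 0.0

def select_action(s, actions, Q, N_s, N_sa, pi_r, c=1.0, c_thres=0.1):
    def score(a):
        bonus = c * math.sqrt(math.log(N_s[s]) / N_sa[(s, a)])  # favors rarely tried actions
        return Q[(s, a)] + bonus + delta(s, a, pi_r, c_thres)
    return max(actions, key=score)

pi_r = lambda s: "keep"   # interpretable decision always keeps speed here
Q = {("s0", "keep"): 0.50, ("s0", "accelerate"): 0.55}
N_s = {"s0": 100}
N_sa = {("s0", "keep"): 50, ("s0", "accelerate"): 50}
# Without delta, "accelerate" would win; the small c_thres tips the choice to "keep".
print(select_action("s0", ["keep", "accelerate"], Q, N_s, N_sa, pi_r))  # keep
```

Raising c_thres makes the output track π_r more closely (more conservative); letting it shrink toward zero decouples the learned policy from π_r, matching the trade-off described above.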
After the next-time-step state is obtained, the cost function is updated as follows:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r(s_{t+1}) + \gamma\, Q(s_{t+1}, a) - Q(s_t, a_t)\right] \tag{5}$$
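A single application of update (5) might look like the following sketch; the states, reward and step sizes are illustrative assumptions:

```python
# One application of update (5):
#   Q(s_t, a_t) <- Q(s_t, a_t) + alpha * [ r(s_{t+1}) + gamma * Q(s_{t+1}, a) - Q(s_t, a_t) ]
def q_update(Q, s_t, a_t, r_next, s_next, a_next, alpha=0.1, gamma=0.9):
    td_error = r_next + gamma * Q.get((s_next, a_next), 0.0) - Q.get((s_t, a_t), 0.0)
    Q[(s_t, a_t)] = Q.get((s_t, a_t), 0.0) + alpha * td_error
    return Q

Q = {("s0", "keep"): 0.0, ("s1", "keep"): 1.0}
q_update(Q, "s0", "keep", r_next=0.5, s_next="s1", a_next="keep")
print(round(Q[("s0", "keep")], 3))  # 0.1 * (0.5 + 0.9 * 1.0) = 0.14
```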
in the above embodiments, after obtaining the cost function capable of explaining the decision and the learning decision, the method for generating the action of the autonomous vehicle during driving is as follows:
Figure BDA0003321172540000071
wherein, Q(s) t A) a cost function, Q, representing different actions generated by a learning-type decision r (s tr (s t ) A cost function representing an interpretable decision.
In step 3), the final action is selected by judging whether the value of the learning-based decision is higher than that of the interpretable decision: if so, the learning-based decision is chosen; if not, the interpretable decision is chosen. The driving decision formed by this mechanism is reliable.
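The selection mechanism of step 3) can be sketched as follows, assuming the learned cost function Q and the interpretable cost function Q_r have already been estimated (all values illustrative):

```python
# Final action selection: take the learning-based action only when its value
# is at least the interpretable decision's value; otherwise fall back to pi_r.
def final_action(s, actions, Q, Q_r, pi_r):
    a_learn = max(actions, key=lambda a: Q.get((s, a), float("-inf")))
    if Q.get((s, a_learn), float("-inf")) >= Q_r[(s, pi_r(s))]:
        return a_learn      # learned decision is at least as valuable
    return pi_r(s)          # interpretable decision bounds the performance

pi_r = lambda s: "keep"
Q = {("s0", "keep"): 0.4, ("s0", "swerve"): 0.6}
Q_r = {("s0", "keep"): 0.7}
print(final_action("s0", ["keep", "swerve"], Q, Q_r, pi_r))  # keep (fallback)
```

The fallback branch is what gives the mechanism its reliability guarantee: the output's value never drops below that of the interpretable decision's own action.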
In this embodiment, after the interpretable decision is adjusted, only its cost function needs to be re-estimated, and the method can be applied to the already-trained learning-based decision, thereby achieving reliable learning-based decision-making for the autonomous vehicle.
In summary, a reliable automated driving learning-based decision method designed at the level of the learning decision mechanism is one of the effective ways to raise the intelligence level of autonomous vehicles and thereby achieve reliable automated driving in complex scenarios, promoting the development of autonomous vehicles.
In one embodiment of the present invention, there is provided a trustworthy learning automatic driving decision system, comprising: the system comprises an interpretable decision construction module, a learning type decision training module and an output module;
the interpretable decision building module builds an interpretable decision based on a preset decision problem, and the interpretable decision guides learning type decision training;
the learning decision training module trains the learning decision through a decision problem to obtain a learning decision of a decision value function with high value;
and the output module selects a high-value decision from the learning decision and the interpretability decision as a final reliable learning decision action.
The system provided in this embodiment is used for executing the above method embodiments, and for details of the process and the details, reference is made to the above embodiments, which are not described herein again.
As shown in fig. 6, which is a schematic structural diagram of a computing device provided in an embodiment of the present invention, the computing device may be a terminal and may include: a processor, a communication interface, a memory, a display screen and an input device. The processor, the communication interface and the memory communicate with each other through a communication bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program that, when executed by the processor, implements the decision method, and the internal memory provides an environment for running the operating system and the computer program on the non-volatile storage medium. The communication interface performs wired or wireless communication with an external terminal; the wireless communication can be realized through WIFI, a carrier network, NFC (near field communication) or other technologies. The display screen can be a liquid crystal or electronic ink display screen, and the input device can be a touch layer covering the display screen, a key, trackball or touchpad arranged on the housing of the computing device, or an external keyboard, touchpad or mouse. The processor may call logic instructions in the memory to perform the following method:
constructing an interpretable decision based on a preset decision problem, and guiding learning type decision training by the interpretable decision; training the learning type decision by the decision problem to obtain the learning type decision of a decision value function with high value; and selecting a high-value decision from the learning decision and the interpretable decision as a final reliable learning decision action.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects may be applied, and that a particular computing device may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment of the invention, a computer program product is provided, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method provided by the above-described method embodiments, for example, comprising: constructing an interpretable decision based on a preset decision problem, and guiding learning type decision training by the interpretable decision; training the learning type decision by the decision problem to obtain the learning type decision of a decision value function with high value; and selecting the decision with high value in the learning decision and the interpretability decision as the final reliable learning decision action.
In one embodiment of the invention, a non-transitory computer-readable storage medium is provided, which stores server instructions that cause a computer to perform the methods provided by the above embodiments, for example, including: constructing an interpretable decision based on a preset decision problem, and guiding learning type decision training by the interpretable decision; training the learning type decision by the decision problem to obtain the learning type decision of a decision value function with high value; and selecting a high-value decision from the learning decision and the interpretable decision as a final reliable learning decision action.
The implementation principle and technical effect of the computer-readable storage medium provided by the above embodiments are similar to those of the above method embodiments, and are not described herein again.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above examples are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features equivalently replaced, and that such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for reliable learning-based automated driving decision making, comprising:
constructing an interpretable decision based on a preset decision problem, the interpretable decision guiding the training of the learning-based decision;
training the learning-based decision on the decision problem to obtain a learning-based decision with a high-value decision value function;
selecting the higher-value decision between the learning-based decision and the interpretable decision as the final reliable learning-based decision action;
wherein the decision problem consists of three elements: the environment observation state, the automatic driving action, and the instant reward;
the construction of the interpretable decision is as follows: an interpretable decision method is used to guarantee a performance lower bound for the learning-based decision, requiring that the final learning-based decision perform no worse than the interpretable decision; the interpretable decision is constructed in the form:
π_r : S → A
a_r = π_r(s)
wherein S represents the space formed by all possible environment observation states; A represents the space formed by all possible decision actions; π_r represents the interpretable decision, a mapping from the state space to the action space; a_r represents the action output by the interpretable decision; and s is a state;
the construction method of the interpretable decision constrains only its input and output forms, which must be consistent with the decision problem of the learning-based decision;
training the learning-based decision on the decision problem comprises:
setting a decision value function;
estimating the value function of the interpretable decision;
and learning a high-value decision value function from the value function of the interpretable decision and the set decision value function.
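As an illustration of the constrained form in claim 1, a minimal rule-based π_r might map an observation state to a longitudinal acceleration. Everything concrete below (the state fields, the constant-time-headway rule, the gains, and the acceleration limits) is our assumption; the claim requires only that π_r share the learning-based decision's state and action spaces.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    # Hypothetical environment observation: ego speed, gap to the lead
    # vehicle, and lead-vehicle speed (all fields are illustrative).
    ego_speed: float   # m/s
    gap: float         # m
    lead_speed: float  # m/s

def interpretable_policy(s: State) -> float:
    """A rule-based pi_r: maps a state to a longitudinal acceleration.

    Constant-time-headway car-following rule -- purely illustrative; the
    patent constrains only the input/output form of pi_r.
    """
    desired_gap = 2.0 + 1.5 * s.ego_speed        # standstill gap + headway
    gap_error = s.gap - desired_gap
    speed_error = s.lead_speed - s.ego_speed
    accel = 0.2 * gap_error + 0.6 * speed_error  # proportional feedback
    return max(-3.0, min(1.5, accel))            # comfort/safety limits
```

Here the action space is a bounded scalar acceleration; a discrete action space (keep lane, brake, change lane) would satisfy the same constraint.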
2. The reliable learning-based automated driving decision method of claim 1, wherein estimating the value function of the interpretable decision comprises: constructing a data set and obtaining the value function of the interpretable decision from the data set by a recursive method.
3. The reliable learning-based automated driving decision method of claim 2, wherein the data set consists of data elements, each data element being the next-moment state obtained by applying the interpretable driving strategy in a given state;
alternatively, the data set is obtained by driving the vehicle directly with the interpretable decision and collecting driving data while the vehicle drives.
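A sketch of the recursive value estimate of claims 2 and 3, assuming episodes logged while the interpretable policy drives. Returns are accumulated backward from the episode end and averaged per state; the trajectory format and discount factor are our assumptions, as the claims specify only a recursive estimate from such a data set.

```python
def estimate_interpretable_values(episodes, gamma=0.95):
    """Estimate V_{pi_r}(s) from logged rollouts of the interpretable policy.

    `episodes` is a list of trajectories, each a list of (state, reward)
    pairs collected while pi_r drove the vehicle. Values are computed by
    the recursion V(s_t) = r_t + gamma * V(s_{t+1}), evaluated backward
    from the episode end, and averaged over visits to each state.
    """
    sums, counts = {}, {}
    for episode in episodes:
        ret = 0.0
        for state, reward in reversed(episode):  # recursion from the end
            ret = reward + gamma * ret
            sums[state] = sums.get(state, 0.0) + ret
            counts[state] = counts.get(state, 0) + 1
    return {s: sums[s] / counts[s] for s in sums}
```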
4. The reliable learning-based automated driving decision method of claim 1, wherein learning a high-value decision value function from the value function of the interpretable decision and the set decision value function comprises:
when the automated driving vehicle encounters a state it has not encountered before, driving with the interpretable decision and initializing the value function of the learning-based decision according to the environment's feedback;
when the automated driving vehicle encounters a previously encountered state, generating a new action; after the next-moment state is obtained, updating the value function of the learning-based decision according to the new action.
5. The reliable learning-based automated driving decision method of claim 4, wherein the new action a is:
a = argmax_a [ Q(s, a) + c · sqrt( ln N(s) / N(s, a) ) + δ(s, a, π_r) ]
where N(s) denotes the number of times the current state has been encountered, N(s, a) the number of times action a has been taken in the current state, Q(s, a) the decision value function for taking action a in state s, δ(s, a, π_r) the interpretable-decision induction value, π_r the interpretable decision, and c a manually tuned constant.
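A hedged sketch of the action selection of claim 5: each candidate action is scored by Q(s, a), a UCB-style exploration term built from N(s) and N(s, a), and an induction term δ(s, a, π_r) that favors agreement with the interpretable decision. The exact shapes of the exploration and induction terms are our assumptions; the claim names only the quantities involved.

```python
import math

def select_action(s, actions, Q, N_s, N_sa, pi_r, c=1.0, bonus=1.0):
    """Score candidate actions and return the argmax, per claim 5.

    Q maps (s, a) -> estimated value, N_s counts state visits, N_sa counts
    (state, action) visits, and pi_r is the interpretable policy. The
    exploration term shrinks as an action is tried more often; the
    delta(s, a, pi_r) term here is a fixed bonus for agreeing with pi_r.
    """
    def score(a):
        # UCB-style exploration bonus (the +1 avoids log(0)/division by 0).
        explore = c * math.sqrt(math.log(N_s.get(s, 0) + 1) / (N_sa.get((s, a), 0) + 1))
        induce = bonus if a == pi_r(s) else 0.0  # delta(s, a, pi_r)
        return Q.get((s, a), 0.0) + explore + induce
    return max(actions, key=score)
```

With all counts at zero the interpretable action wins via the induction term; once it has been sampled heavily, the exploration term can steer the search toward under-tried alternatives.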
6. A reliable learning-based automated driving decision system, comprising: an interpretable decision construction module, a learning-based decision training module, and an output module;
the interpretable decision construction module constructs an interpretable decision based on a preset decision problem, the interpretable decision guiding the training of the learning-based decision;
the learning-based decision training module trains a learning-based decision on the decision problem to obtain a learning-based decision with a high-value decision value function;
the output module selects the higher-value decision between the learning-based decision and the interpretable decision as the final reliable learning-based decision action;
wherein the decision problem consists of three elements: the environment observation state, the automatic driving action, and the instant reward;
the construction of the interpretable decision is as follows: an interpretable decision method is used to guarantee a performance lower bound for the learning-based decision, requiring that the final learning-based decision perform no worse than the interpretable decision; the interpretable decision is constructed in the form:
π_r : S → A
a_r = π_r(s)
wherein S represents the space formed by all possible environment observation states; A represents the space formed by all possible decision actions; π_r represents the interpretable decision, a mapping from the state space to the action space; a_r represents the action output by the interpretable decision; and s is a state;
the construction method of the interpretable decision constrains only its input and output forms, which must be consistent with the decision problem of the learning-based decision;
training the learning-based decision on the decision problem comprises:
setting a decision value function;
estimating the value function of the interpretable decision;
and learning a high-value decision value function from the value function of the interpretable decision and the set decision value function.
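The output module of claim 6 can be sketched as a value comparison between the two policies. The names q_learn and v_interp stand in for the trained decision value function and the estimated interpretable value function; the signatures are illustrative, not the patent's API.

```python
def reliable_decision(s, pi_learn, q_learn, pi_r, v_interp):
    """Output module: emit the higher-valued of the two decisions.

    pi_learn / pi_r are the learning-based and interpretable policies;
    q_learn(s, a) is the learned decision value function and v_interp(s)
    the estimated value of the interpretable policy in state s. This
    realizes the performance lower bound: the emitted action is never
    valued below the interpretable decision's.
    """
    a_learn = pi_learn(s)
    if q_learn(s, a_learn) >= v_interp(s):
        return a_learn  # learned decision is at least as valuable
    return pi_r(s)      # otherwise fall back to the interpretable decision
```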
7. A computer readable storage medium storing one or more programs, wherein the one or more programs comprise instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-5.
8. A computing device, comprising: one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods of claims 1-5.
CN202111246972.5A 2021-10-26 2021-10-26 Reliable learning type automatic driving decision-making method, system, storage medium and equipment Active CN113879323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111246972.5A CN113879323B (en) 2021-10-26 2021-10-26 Reliable learning type automatic driving decision-making method, system, storage medium and equipment


Publications (2)

Publication Number Publication Date
CN113879323A CN113879323A (en) 2022-01-04
CN113879323B true CN113879323B (en) 2023-03-14

Family

ID=79014396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111246972.5A Active CN113879323B (en) 2021-10-26 2021-10-26 Reliable learning type automatic driving decision-making method, system, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN113879323B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180090660A (en) * 2017-02-03 2018-08-13 자동차부품연구원 Parameter learning apparatus and method for personalization of autonomous driving control system
CN110297484A (en) * 2018-03-23 2019-10-01 广州汽车集团股份有限公司 Unmanned control method, device, computer equipment and storage medium
CN111273668A (en) * 2020-02-18 2020-06-12 福州大学 Unmanned vehicle motion track planning system and method for structured road
CN111874007A (en) * 2020-08-06 2020-11-03 中国科学院自动化研究所 Knowledge and data drive-based unmanned vehicle hierarchical decision method, system and device
CN111907527A (en) * 2019-05-08 2020-11-10 通用汽车环球科技运作有限责任公司 Interpretable learning system and method for autonomous driving
CN113120003A (en) * 2021-05-18 2021-07-16 同济大学 Unmanned vehicle motion behavior decision method
CN113264043A (en) * 2021-05-17 2021-08-17 北京工业大学 Unmanned driving layered motion decision control method based on deep reinforcement learning
CN113264059A (en) * 2021-05-17 2021-08-17 北京工业大学 Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on the Behavior Decision-Making System of Unmanned Vehicles; Xiong Lu et al.; Automobile Technology; 2018-08-03 (No. 08); full text *
Research Progress on the Interpretability of Deep Learning; Cheng Keyang et al.; Journal of Computer Research and Development; 2020-06-07 (No. 06); full text *


Similar Documents

Publication Publication Date Title
AU2019253703B2 (en) Improving the safety of reinforcement learning models
CN108009587B (en) Method and equipment for determining driving strategy based on reinforcement learning and rules
WO2019199878A1 (en) Analysis of scenarios for controlling vehicle operations
CN112997128B (en) Method, device and system for generating automatic driving scene
AU2019251365A1 (en) Dynamically controlling sensor behavior
US20190310632A1 (en) Utility decomposition with deep corrections
US11810460B2 (en) Automatic generation of pedestrians in virtual simulation of roadway intersections
Shi et al. Offline reinforcement learning for autonomous driving with safety and exploration enhancement
Li et al. Modeling mixed traffic flows of human-driving vehicles and connected and autonomous vehicles considering human drivers’ cognitive characteristics and driving behavior interaction
CN113253612B (en) Automatic driving control method, device, equipment and readable storage medium
CN115777088A (en) Vehicle operation safety model test system
CN113879323B (en) Reliable learning type automatic driving decision-making method, system, storage medium and equipment
CN117406756B (en) Method, device, equipment and storage medium for determining motion trail parameters
CN112835362B (en) Automatic lane change planning method and device, electronic equipment and storage medium
Wang et al. Autonomous driving based on approximate safe action
Liang et al. Wip: End-to-end analysis of adversarial attacks to automated lane centering systems
US20220261630A1 (en) Leveraging dynamical priors for symbolic mappings in safe reinforcement learning
Chen et al. Investigation of a driver-oriented adaptive cruise control system
CN114616157A (en) Method and system for checking automated driving functions by reinforcement learning
Shu et al. Generative Models and Connected and Automated Vehicles: A Survey in Exploring the Intersection of Transportation and AI
Yang et al. Deep Reinforcement Learning Lane-Changing Decision Algorithm for Intelligent Vehicles Combining LSTM Trajectory Prediction
CN112766310B (en) Fuel-saving lane-changing decision-making method and system
US20230365157A1 (en) Driving related augmented virtual fields
US20230356747A1 (en) Driving related augmented virtual fields
CN114506337B (en) Method and system for determining a maneuver to be performed by an autonomous vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant