CN110843746B - Anti-lock brake control method and system based on reinforcement learning - Google Patents

Info

Publication number
CN110843746B
Authority
CN
China
Prior art keywords
reinforcement learning
reward function
range
value
wheel speed
Prior art date
Legal status
Active
Application number
CN201911194029.7A
Other languages
Chinese (zh)
Other versions
CN110843746A (en)
Inventor
董舒
Current Assignee
Dilu Technology Co Ltd
Original Assignee
Dilu Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Dilu Technology Co Ltd
Priority to CN201911194029.7A
Publication of CN110843746A
Application granted
Publication of CN110843746B
Legal status: Active
Anticipated expiration

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60T VEHICLE BRAKE CONTROL SYSTEMS OR PARTS THEREOF; BRAKE CONTROL SYSTEMS OR PARTS THEREOF, IN GENERAL; ARRANGEMENT OF BRAKING ELEMENTS ON VEHICLES IN GENERAL; PORTABLE DEVICES FOR PREVENTING UNWANTED MOVEMENT OF VEHICLES; VEHICLE MODIFICATIONS TO FACILITATE COOLING OF BRAKES
    • B60T8/00 Arrangements for adjusting wheel-braking force to meet varying vehicular or ground-surface conditions, e.g. limiting or varying distribution of braking force
    • B60T8/17 Using electrical or electronic regulation means to control braking
    • B60T8/172 Determining control parameters used in the regulation, e.g. by calculations involving measured or detected parameters
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60T VEHICLE BRAKE CONTROL SYSTEMS OR PARTS THEREOF; BRAKE CONTROL SYSTEMS OR PARTS THEREOF, IN GENERAL; ARRANGEMENT OF BRAKING ELEMENTS ON VEHICLES IN GENERAL; PORTABLE DEVICES FOR PREVENTING UNWANTED MOVEMENT OF VEHICLES; VEHICLE MODIFICATIONS TO FACILITATE COOLING OF BRAKES
    • B60T8/00 Arrangements for adjusting wheel-braking force to meet varying vehicular or ground-surface conditions, e.g. limiting or varying distribution of braking force
    • B60T8/17 Using electrical or electronic regulation means to control braking
    • B60T8/176 Brake regulation specially adapted to prevent excessive wheel slip during vehicle deceleration, e.g. ABS
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 Details of the control system
    • B60W2050/0019 Control system elements or transfer functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Human Computer Interaction (AREA)
  • Regulating Braking Force (AREA)

Abstract

The invention discloses an anti-lock brake control method and system based on reinforcement learning. The method comprises the following steps: extracting key parameters; quantizing the extracted key parameters and limiting their range; constructing a reinforcement learning module comprising a reward function; defining a wheel speed change range reward function value and a brake time change range reward function value in the reward function, and multiplying the two values to output a result; inputting the output result to the reinforcement learning module for training; and controlling the anti-lock brake of the vehicle with the trained reinforcement learning module. The invention overcomes the defect that the reward function is unreasonably defined in existing reinforcement learning, and applying the reinforcement learning algorithm with the newly defined reward function to anti-lock brake control yields better performance than the traditional algorithm.

Description

Anti-lock brake control method and system based on reinforcement learning
Technical Field
The invention relates to the technical field of automatic driving of automobiles, in particular to an anti-lock brake control method and system based on reinforcement learning.
Background
In modern automobiles, an anti-lock brake system prevents the wheels from locking during emergency braking, which stabilizes the vehicle body and shortens the braking distance; it is standard equipment on automobiles. With the development of artificial intelligence, anti-lock braking can also be realized with artificial intelligence techniques, which in theory can outperform the conventional algorithm.
Reinforcement learning is an important branch of artificial intelligence that is well suited to sequential decision problems. The anti-lock brake operation of an automobile during emergency braking has this sequential character, so realizing anti-lock braking with reinforcement learning is feasible. The basic idea of reinforcement learning is as follows: an agent takes an action in a specific environment, the environment returns a reward according to the action, and the agent adjusts its actions according to the reward in the expectation of obtaining a higher reward.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention is proposed in view of the above problems of the existing anti-lock brake control algorithm and of the reward function definition in reinforcement learning.
Therefore, the technical problem solved by the invention is as follows: when a reinforcement learning algorithm is applied to anti-lock brake control, the defect that the reward function is unreasonably defined in conventional reinforcement learning is overcome.
In order to solve the technical problems, the invention provides the following technical scheme: an anti-lock brake control method based on reinforcement learning comprises the following steps: extracting key parameters; quantizing the extracted key parameters and limiting the range of the key parameters; constructing a reinforcement learning module comprising a reward function; defining a wheel speed change range reward function value and a brake time change range reward function value in the reward function, and multiplying the wheel speed change range reward function value and the brake time change range reward function value to output a result; inputting the output result to the reinforcement learning module for training; and controlling the anti-lock brake of the vehicle by using the trained reinforcement learning module.
As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: the key parameters comprise a difference ratio parameter between the wheel speed V1 and the vehicle body speed V2, and a single braking time length parameter T1.
As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: the calculation function of the difference ratio parameter is (V2 - V1)/V2, and the calculated value of the difference ratio parameter is limited to the range 0-1.
As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: the calculated value of the difference ratio parameter is limited as follows: when the calculated value is greater than 1, it is reset to 1; when the calculated value is less than 0, it is reset to 0; and when the calculated value is within 0-1, no resetting is carried out.
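The limiting rule for the difference ratio parameter can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `difference_ratio` and the handling of a zero vehicle body speed are assumptions, since the text does not address the standstill case.

```python
def difference_ratio(wheel_speed: float, body_speed: float) -> float:
    """Return (V2 - V1) / V2, limited to the range 0-1.

    wheel_speed is V1 and body_speed is V2, in the same unit.
    """
    if body_speed <= 0.0:
        # Assumption: the source does not define the ratio at standstill;
        # returning 0.0 is only a placeholder to avoid dividing by zero.
        return 0.0
    ratio = (body_speed - wheel_speed) / body_speed
    # Reset to 1 above 1, reset to 0 below 0, otherwise leave unchanged.
    return min(1.0, max(0.0, ratio))
```

A wheel that spins faster than the vehicle body gives a negative raw value and is reset to 0, while a raw value above 1 (for example a wheel reported as turning backwards) is reset to 1.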
As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: defining the wheel speed variation range reward function value comprises the following steps: comparing the shortest braking distance L under different vehicle body speeds V2 and wheel speeds V1; defining the value range of the difference ratio parameter under the condition of the shortest braking distance L, it being found that the braking distance is shortest when the wheel speed V1 is 70-90% of the vehicle body speed V2; and, according to the difference ratio parameter between the vehicle body speed V2 and the wheel speed V1 at the shortest braking distance L, defining the wheel speed V1 variation range reward function value as:
when V1 < 75% V2, the wheel speed V1 variation range reward function value is 1 - ((V2 - V1)/V2 - 25%) ÷ 75%;
when V1 > 85% V2, the wheel speed V1 variation range reward function value is 1 - (15% - (V2 - V1)/V2) ÷ 15%;
when the wheel speed V1 is 75%-85% of the vehicle body speed V2, the wheel speed V1 variation range reward function value is 1.
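The three cases of the wheel speed reward can be sketched as a single Python function. The name `wheel_speed_reward` is hypothetical, and the sketch assumes a positive vehicle body speed and a difference ratio already limited to 0-1; branching on the ratio (above 25%, below 15%) is equivalent to the conditions V1 < 75% V2 and V1 > 85% V2.

```python
def wheel_speed_reward(wheel_speed: float, body_speed: float) -> float:
    """Wheel speed variation range reward, following the three cases in the text.

    Assumes body_speed (V2) > 0; wheel_speed is V1.
    """
    ratio = (body_speed - wheel_speed) / body_speed
    ratio = min(1.0, max(0.0, ratio))  # limit the difference ratio to 0-1
    if ratio > 0.25:   # V1 < 75% of V2
        return 1.0 - (ratio - 0.25) / 0.75
    if ratio < 0.15:   # V1 > 85% of V2
        return 1.0 - (0.15 - ratio) / 0.15
    return 1.0         # V1 is 75%-85% of V2
```

With this form the reward falls to 0 both for a locked wheel (ratio 1) and for a freely rolling wheel (ratio 0), and equals 1 throughout the 75%-85% band.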
As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: defining the brake time variation range reward function value comprises the following steps: comparing the shortest braking distance L under different vehicle body speeds V2 and wheel speeds V1; defining the value range of the single braking time length parameter T1 under the condition of the shortest braking distance L, it being found that the braking distance is shortest when T1 is between 50 and 150 ms; and, according to the value range, defining the brake time variation range reward function value as follows:
when t > 150 ms, the reward function value is 1 - (t - 150 ms)/100 ms;
when t ≤ 150 ms, the reward function value is 1;
wherein t represents the braking time.
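A minimal Python sketch of the brake time reward follows. It reads the second case as t ≤ 150 ms, since the two cases must be complementary. The function name `brake_time_reward` and the clamp at 0 are assumptions: the formula as written becomes negative beyond 250 ms, whereas the text states that reward values are scaled to 0-1.

```python
def brake_time_reward(t_ms: float) -> float:
    """Brake time variation range reward for a single braking of t_ms milliseconds."""
    if t_ms <= 150.0:
        return 1.0
    # Linear penalty above 150 ms, as in the text: 1 - (t - 150 ms) / 100 ms.
    # Assumption: clamp at 0 so the value stays within 0-1 for t > 250 ms.
    return max(0.0, 1.0 - (t_ms - 150.0) / 100.0)
```

Note that the formula in the text does not penalize braking times below 50 ms, and this sketch follows it in that respect.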
As a preferable aspect of the reinforcement learning-based antilock brake control method according to the present invention, wherein: the shortest braking distance L is smallest when the wheel speed V1 is 80% of the vehicle body speed V2.
In order to solve the above technical problems, the invention also provides the following technical scheme: an anti-lock brake control system based on reinforcement learning, comprising: an extraction module for extracting two key parameters, namely the difference ratio parameter between the wheel speed V1 and the vehicle body speed V2 and the single braking time length parameter T1, and for quantizing both key parameters; a limiting module for limiting the range of the two quantized key parameters, so as to obtain the value ranges of the difference ratio parameter and of the single braking time length parameter T1 at which the braking distance is shortest; a definition module for defining, under the condition of the shortest braking distance L, the wheel speed V1 variation range reward function value and the brake time variation range reward function value; an output module for inputting the algorithm formed by multiplying the two function values defined in the definition module into the reinforcement learning module for training; and a control module for controlling the anti-lock brake of the vehicle by using the trained reinforcement learning module.
As a preferable aspect of the reinforcement learning-based antilock brake control system according to the present invention, wherein: the extraction module specifically comprises an analysis unit for analyzing and summarizing the control strategy of the traditional anti-lock brake algorithm; the extraction unit is used for extracting two key parameters; and the quantization unit is used for quantizing the two key parameters.
As a preferable aspect of the reinforcement learning-based antilock brake control system according to the present invention, wherein: the definition module specifically comprises a function definition unit for defining, under the condition of the shortest braking distance L, the wheel speed V1 variation range reward function value and the brake time variation range reward function value; and a normalizing unit for normalizing the two defined function values.
The invention has the following beneficial effects: by adding several types of reward to the reward function definition of reinforcement learning, the agent can learn a comprehensively optimal reward under different evaluation indexes, which avoids the agent performing well in only one respect. Normalization is added to the definition, so that each reward value is scaled to 0-1 and all rewards are balanced, which prevents any single reward from becoming absolutely dominant. The defect that the reward function is unreasonably defined in existing reinforcement learning is thereby overcome, and the reinforcement learning algorithm with the newly defined reward function, applied to anti-lock brake control, obtains better performance than the traditional algorithm.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. Wherein:
FIG. 1 is a schematic flow chart of the operation of the present invention;
FIG. 2 is a schematic flow chart of system modules for implementing the present invention;
FIG. 3 is a schematic flow chart of a general process embodying the present invention;
FIG. 4 is a brake effect diagram embodying the present invention;
FIG. 5 is a diagram of the braking effect of the prior art.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.
Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1
Referring to FIGS. 1 to 3, this embodiment applies a reinforcement learning algorithm to anti-lock brake control. Because reinforcement learning is suited to sequential decision problems, and the anti-lock brake operation of a car during emergency braking has this character, the invention is highly practicable and, together with the reward function defined in the invention, can achieve excellent anti-lock brake control.
Specifically, an anti-lock brake control method based on reinforcement learning comprises the following steps:
S1: extracting key parameters. By analyzing and summarizing the working principle of the traditional anti-lock brake algorithm, the difference ratio parameter between the wheel speed V1 and the vehicle body speed V2 and the single braking time length parameter T1 are extracted.
Considering that the difference between the wheel speed V1 and the vehicle body speed V2 during braking is important, the first extracted key parameter is the difference ratio parameter of the two speeds. Its calculation function is (V2 - V1)/V2, and the calculated value of the difference ratio parameter is limited to the range 0-1 by the following rule:
resetting the calculated value of the difference ratio parameter to 1 when the calculated value of the difference ratio parameter is greater than 1;
resetting the calculated value of the difference ratio parameter to 0 when the calculated value of the difference ratio parameter is less than 0;
and when the calculated value of the difference ratio parameter is within 0-1, no resetting is carried out.
Secondly, following the intermittent (pumping) braking mode, the parameter T1, which controls the time length of a single braking in the reinforcement learning algorithm, is extracted.
S2: quantizing the extracted key parameters and limiting the range of the key parameters, specifically comprising the following steps:
first, in the simulation system, comparing the shortest braking distance L under different vehicle body speeds V2 and wheel speeds V1; by quantizing the key parameters, it is found that the braking distance is shortest when the wheel speed V1 is 70-90% of the vehicle body speed V2, and in general the best overall braking distance is obtained when the wheel speed V1 is 80% of the vehicle body speed V2;
then, in the simulation system, analyzing the single braking time length parameter T1 under the condition of the shortest braking distance; it is found that the braking distance is shortest when the single braking time length parameter T1 is between 50 and 150 ms.
S3: constructing the reinforcement learning module comprising the reward function. In the algorithm the reward is realized by the defined reward function, and the quality of that reward function directly determines the quality of the learning module, so the reinforcement learning module comprising the reward function needs to be reconstructed.
S4: defining the wheel speed variation range reward function value and the brake time variation range reward function value in the reward function, and multiplying the two values to output a result, specifically comprising the following steps:
first, the wheel speed variation range reward function value is defined, specifically comprising the following steps:
① comparing the shortest braking distance L under different vehicle body speeds V2 and wheel speeds V1;
② defining the value range of the difference ratio parameter under the condition of the shortest braking distance L; it is found that the braking distance is shortest when the wheel speed V1 is 70-90% of the vehicle body speed V2;
③ according to the difference ratio parameter between the vehicle body speed V2 and the wheel speed V1 at the shortest braking distance L, the wheel speed V1 variation range reward function value is defined as:
when V1 < 75% V2, the wheel speed V1 variation range reward function value is 1 - ((V2 - V1)/V2 - 25%) ÷ 75%;
when V1 > 85% V2, the wheel speed V1 variation range reward function value is 1 - (15% - (V2 - V1)/V2) ÷ 15%;
when the wheel speed V1 is 75%-85% of the vehicle body speed V2, the wheel speed V1 variation range reward function value is 1.
It should be noted that: a graded reward is still given when the wheel speed is higher than 85% or lower than 75% of the vehicle body speed, so that the agent can learn a better difference value and does not drift in only one direction;
the wheel speed reward value is scaled to 0-1, so that the reward weights for wheel speeds above 85% and below 75% of the vehicle body speed are consistent, which prevents the agent from learning a bias toward one side;
when the wheel speed is 75%-85% of the vehicle body speed, the reward value is a constant 1, which gives the agent a buffer zone and avoids abrupt jumps.
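Writing s = (V2 - V1)/V2 for the difference ratio parameter, the three cases of the wheel speed reward can be restated as one piecewise function. This is only an algebraic rearrangement of the formulas given above; the symbols s and r_w are introduced here for convenience and do not appear in the source.

```latex
r_w(s) =
\begin{cases}
  1 - \dfrac{0.15 - s}{0.15} = \dfrac{s}{0.15}, & 0 \le s < 0.15 \quad (V_1 > 85\%\,V_2),\\[6pt]
  1, & 0.15 \le s \le 0.25 \quad (75\%\,V_2 \le V_1 \le 85\%\,V_2),\\[6pt]
  1 - \dfrac{s - 0.25}{0.75} = \dfrac{1 - s}{0.75}, & 0.25 < s \le 1 \quad (V_1 < 75\%\,V_2).
\end{cases}
```

Both sloped branches reach 1 at the edges of the constant band (s = 0.15 and s = 0.25) and fall to 0 at s = 0 and s = 1, so the reward is continuous and stays within 0-1, which is consistent with the buffer zone and the 0-1 scaling described in the notes.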
Then, the brake time variation range reward function value is defined, specifically comprising the following steps:
① comparing the shortest braking distance L under different vehicle body speeds V2 and wheel speeds V1;
② defining the value range of the single braking time length parameter T1 under the condition of the shortest braking distance L; it is found that the braking distance is shortest when T1 is between 50 and 150 ms;
③ according to the value range, the brake time variation range reward function value is defined as:
when t > 150 ms, the reward function value is 1 - (t - 150 ms)/100 ms;
when t ≤ 150 ms, the reward function value is 1;
wherein t represents the braking time.
It should be further explained that: the wheel speed V1 variation range reward function value and the brake time variation range reward function value are both normalized when they are defined, which balances the algorithm and improves training efficiency. Building the reinforcement learning module on the algorithm formed by multiplying the wheel speed V1 variation range reward function value by the brake time variation range reward function value avoids the algorithm spending a large amount of time exploring and learning in invalid or inefficient ranges, which would leave the agent with a biased model that is difficult to adjust. Meanwhile, several types of reward are added to the reward function definition, so that the agent can learn a comprehensively optimal reward under different evaluation indexes rather than performing well in only one respect; and normalization is added to the reward function definition, so that the reward values are scaled to 0-1, all rewards are balanced, and no single reward becomes absolutely dominant.
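The multiplication of the two reward values can be sketched as follows. The function names are hypothetical, a positive vehicle body speed is assumed, the brake time case is read as t ≤ 150 ms giving a reward of 1, and the clamp at 0 for long braking times is an assumption added to keep the product within 0-1.

```python
def wheel_reward(wheel_speed: float, body_speed: float) -> float:
    """Wheel speed term; assumes body_speed (V2) > 0."""
    ratio = min(1.0, max(0.0, (body_speed - wheel_speed) / body_speed))
    if ratio > 0.25:   # V1 < 75% of V2
        return 1.0 - (ratio - 0.25) / 0.75
    if ratio < 0.15:   # V1 > 85% of V2
        return 1.0 - (0.15 - ratio) / 0.15
    return 1.0


def time_reward(t_ms: float) -> float:
    """Brake time term; the clamp at 0 is an assumption, not in the source."""
    if t_ms <= 150.0:
        return 1.0
    return max(0.0, 1.0 - (t_ms - 150.0) / 100.0)


def combined_reward(wheel_speed: float, body_speed: float, t_ms: float) -> float:
    """Product of the two normalized terms, so the result also lies in 0-1."""
    return wheel_reward(wheel_speed, body_speed) * time_reward(t_ms)
```

Because the terms are multiplied rather than summed, the combined reward is 0 whenever either term is 0, and it reaches 1 only when both the wheel speed and the braking time are in their preferred ranges.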
S5: inputting the output result to a reinforcement learning module for training;
S6: controlling the anti-lock brake of the vehicle by using the trained reinforcement learning module.
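The source does not name a particular reinforcement learning algorithm, so the sketch below substitutes tabular Q-learning purely to illustrate how the multiplied reward could drive training in step S5. Everything apart from the reward formulas is a hypothetical placeholder: the action set `PULSES_MS`, the ten-bin discretization, the hyperparameters, and above all `toy_step`, which is not a vehicle model.

```python
import random

PULSES_MS = [50, 100, 150, 200, 250]  # hypothetical single-braking durations
N_BINS = 10                           # hypothetical discretization of the ratio


def reward(ratio: float, t_ms: float) -> float:
    """Product of the wheel speed term and the brake time term."""
    if ratio > 0.25:
        r_w = 1.0 - (ratio - 0.25) / 0.75
    elif ratio < 0.15:
        r_w = 1.0 - (0.15 - ratio) / 0.15
    else:
        r_w = 1.0
    # Assumption: clamp the time term at 0 so the product stays in 0-1.
    r_t = 1.0 if t_ms <= 150 else max(0.0, 1.0 - (t_ms - 150) / 100)
    return r_w * r_t


def toy_step(ratio: float, t_ms: float, rng: random.Random) -> float:
    """Placeholder dynamics, NOT a vehicle model: longer pulses raise the ratio."""
    ratio += 0.001 * (t_ms - 120) + rng.uniform(-0.02, 0.02)
    return min(1.0, max(0.0, ratio))


def to_bin(ratio: float) -> int:
    return min(N_BINS - 1, int(ratio * N_BINS))


def train(episodes=200, steps=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * len(PULSES_MS) for _ in range(N_BINS)]
    for _ in range(episodes):
        ratio = rng.random()
        for _ in range(steps):
            s = to_bin(ratio)
            if rng.random() < epsilon:
                a = rng.randrange(len(PULSES_MS))
            else:
                a = max(range(len(PULSES_MS)), key=lambda i: q[s][i])
            ratio = toy_step(ratio, PULSES_MS[a], rng)
            r = reward(ratio, PULSES_MS[a])
            s2 = to_bin(ratio)
            # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
    return q
```

In step S6 a controller of this kind would pick, for the current state, the action with the largest Q-value; a real system would replace `toy_step` with the simulation system or vehicle measurements described in the embodiment.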
It should be recognized that embodiments of the present invention can be realized and implemented in computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein. A computer program can be applied to input data to perform the functions described herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (10)

1. An anti-lock brake control method based on reinforcement learning, characterized by comprising the following steps:
extracting key parameters;
quantizing the extracted key parameters and limiting the range of the key parameters;
constructing a reinforcement learning module comprising a reward function;
defining a wheel speed change range reward function value and a brake time change range reward function value in the reward function, and multiplying the wheel speed change range reward function value and the brake time change range reward function value to output a result;
inputting the output result to the reinforcement learning module for training;
and controlling the anti-lock brake of the vehicle by using the trained reinforcement learning module.
2. The reinforcement learning-based antilock brake control method according to claim 1, wherein the key parameters comprise:
a difference ratio parameter of the wheel speed V1 and the vehicle body speed V2;
a single braking duration parameter T1.
3. The reinforcement learning-based antilock brake control method according to claim 2, wherein the difference ratio parameter is calculated as
(V2 - V1)/V2, and the calculated value of the difference ratio parameter is limited to the range 0 to 1.
4. The reinforcement learning-based antilock brake control method according to claim 3, wherein the calculated value of the difference ratio parameter is limited as follows:
when the calculated value of the difference ratio parameter is greater than 1, it is reset to 1;
when the calculated value of the difference ratio parameter is less than 0, it is reset to 0;
and when the calculated value of the difference ratio parameter is between 0 and 1, it is not reset.
5. The reinforcement learning-based antilock brake control method according to any one of claims 1 to 4, wherein defining the wheel speed variation range reward function value comprises the following steps:
comparing the shortest braking distance L under different conditions of vehicle body speed V2 and wheel speed V1;
defining the value range of the difference ratio parameter under the condition of the shortest braking distance L, the braking distance being shortest when the wheel speed V1 is 70% to 90% of the vehicle body speed V2;
according to the difference ratio parameter of the vehicle body speed V2 and the wheel speed V1 under the shortest braking distance L, defining the wheel speed V1 variation range reward function value as:
when V1 < 75% V2, the wheel speed V1 variation range reward function value is 1 - (((V2 - V1)/V2 - 25%) / 75%);
when V1 > 85% V2, the wheel speed V1 variation range reward function value is 1 - ((15% - (V2 - V1)/V2) / 15%);
when the wheel speed V1 is 75% to 85% of the vehicle body speed V2, the wheel speed V1 variation range reward function value is 1.
6. The reinforcement learning-based antilock brake control method according to any one of claims 1 to 4, wherein defining the brake time variation range reward function value comprises the following steps:
defining the value range of the single braking duration parameter T1 under the condition of the shortest braking distance L, the braking distance being shortest when the single braking duration parameter T1 is between 50 ms and 150 ms;
according to the value range, defining the brake time variation range reward function value as:
when t > 150 ms, the value is 1 - (t - 150 ms)/100 ms;
when t <= 150 ms, the value is 1;
wherein t represents the braking time.
7. The reinforcement learning-based antilock brake control method according to claim 5, wherein the braking distance L is shortest when the wheel speed V1 is 80% of the vehicle body speed V2.
8. An anti-lock brake control system based on reinforcement learning, characterized by comprising:
an extraction module (100) for extracting two key parameters, namely the difference ratio parameter of the wheel speed V1 and the vehicle body speed V2 and the single braking duration parameter T1, and for quantizing the two key parameters;
a limiting module (200) for limiting the ranges of the two quantized key parameters to obtain the respective value ranges of the difference ratio parameter and the single braking duration parameter T1 at which the braking distance is shortest;
a definition module (300) for respectively defining the wheel speed V1 variation range reward function value and the brake time variation range reward function value under the condition of the shortest braking distance L;
an output module (400) for inputting an algorithm formed by multiplying the two function values defined in the definition module (300) into a reinforcement learning module for training;
and a control module (500) for controlling the anti-lock braking of the vehicle by using the trained reinforcement learning module.
9. The reinforcement learning-based antilock brake control system according to claim 8, wherein the extraction module (100) specifically comprises:
an analysis unit for analyzing and summarizing the control strategies of conventional anti-lock brake algorithms;
an extraction unit for extracting the two key parameters;
and a quantization unit for quantizing the two key parameters.
10. The reinforcement learning-based antilock brake control system according to claim 8, wherein the definition module (300) specifically comprises:
a function defining unit for respectively defining the wheel speed V1 variation range reward function value and the brake time variation range reward function value under the condition of the shortest braking distance L;
and a normalizing unit for normalizing the two defined function values.
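The reward described in claims 1 to 6 can be sketched in a few lines of Python. This is only an illustrative sketch, not the patentee's implementation: the function names are hypothetical, the reward is assumed to be 1 for any braking duration at or below 150 ms, the brake time reward is assumed to be clamped at 0 beyond 250 ms, and a non-positive vehicle body speed is assumed to mean zero slip. None of these three assumptions is stated in the claims.

```python
def slip_ratio(wheel_speed: float, body_speed: float) -> float:
    """Difference ratio (V2 - V1) / V2, limited to [0, 1] (claims 3 and 4)."""
    if body_speed <= 0.0:
        # Assumption: the claims do not cover a stationary vehicle.
        return 0.0
    ratio = (body_speed - wheel_speed) / body_speed
    return min(1.0, max(0.0, ratio))


def wheel_speed_reward(wheel_speed: float, body_speed: float) -> float:
    """Wheel speed variation range reward (claim 5)."""
    s = slip_ratio(wheel_speed, body_speed)
    if s > 0.25:  # V1 < 75% of V2
        return 1.0 - (s - 0.25) / 0.75
    if s < 0.15:  # V1 > 85% of V2
        return 1.0 - (0.15 - s) / 0.15
    return 1.0    # V1 is 75% to 85% of V2


def brake_time_reward(t_ms: float) -> float:
    """Brake time variation range reward (claim 6), t in milliseconds."""
    if t_ms <= 150.0:
        # Assumption: full reward at or below 150 ms.
        return 1.0
    # Assumption: clamp at 0 so the reward never goes negative past 250 ms.
    return max(0.0, 1.0 - (t_ms - 150.0) / 100.0)


def combined_reward(wheel_speed: float, body_speed: float, t_ms: float) -> float:
    """Product of the two reward values, as in the last defining step of claim 1."""
    return wheel_speed_reward(wheel_speed, body_speed) * brake_time_reward(t_ms)
```

For example, `combined_reward(80.0, 100.0, 200.0)` evaluates to 0.5: the wheel speed term is 1 because the wheel speed is 80% of the body speed, and the brake time term is 1 - (200 - 150)/100 = 0.5. Because both terms lie in [0, 1], their product is also in [0, 1], so a poor value in either term lowers the overall reward.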
CN201911194029.7A 2019-11-28 2019-11-28 Anti-lock brake control method and system based on reinforcement learning Active CN110843746B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911194029.7A CN110843746B (en) 2019-11-28 2019-11-28 Anti-lock brake control method and system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911194029.7A CN110843746B (en) 2019-11-28 2019-11-28 Anti-lock brake control method and system based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN110843746A CN110843746A (en) 2020-02-28
CN110843746B true CN110843746B (en) 2022-06-14

Family

ID=69605967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911194029.7A Active CN110843746B (en) 2019-11-28 2019-11-28 Anti-lock brake control method and system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN110843746B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111605558B (en) * 2020-04-21 2022-07-19 浙江吉利控股集团有限公司 Vehicle speed determination method and device, electronic equipment and vehicle
CN112906304B (en) * 2021-03-10 2023-04-07 北京航空航天大学 Brake control method and device

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4093076B2 (en) * 2003-02-19 2008-05-28 富士重工業株式会社 Vehicle motion model generation apparatus and vehicle motion model generation method
US7590481B2 (en) * 2005-09-19 2009-09-15 Ford Global Technologies, Llc Integrated vehicle control system using dynamically determined vehicle conditions
CN100586769C (en) * 2008-01-31 2010-02-03 赵西安 Anti-lock method
CN101311047B (en) * 2008-05-04 2011-04-06 重庆邮电大学 Vehicle anti-lock brake control method based on least squares support vector machine
US8204667B2 (en) * 2008-06-09 2012-06-19 Ford Global Technologies Method for compensating for normal forces in antilock control
US20110175438A1 (en) * 2010-01-21 2011-07-21 Ford Global Technologies Llc Vehicle Line-Locking Braking System and Method
CN104015711B (en) * 2014-06-17 2016-06-01 广西大学 A kind of bi-fuzzy control method of automobile ABS
US10065654B2 (en) * 2016-07-08 2018-09-04 Toyota Motor Engineering & Manufacturing North America, Inc. Online learning and vehicle control method based on reinforcement learning without active exploration
US10503172B2 (en) * 2017-10-18 2019-12-10 Luminar Technologies, Inc. Controlling an autonomous vehicle based on independent driving decisions
EP3701432A1 (en) * 2018-02-09 2020-09-02 DeepMind Technologies Limited Distributional reinforcement learning using quantile function neural networks
US11106211B2 (en) * 2018-04-02 2021-08-31 Sony Group Corporation Vision-based sample-efficient reinforcement learning framework for autonomous driving
CN110136481B (en) * 2018-09-20 2021-02-02 初速度(苏州)科技有限公司 Parking strategy based on deep reinforcement learning
CN109709956B (en) * 2018-12-26 2021-06-08 同济大学 Multi-objective optimized following algorithm for controlling speed of automatic driving vehicle
CN109733415B (en) * 2019-01-08 2020-08-14 同济大学 Anthropomorphic automatic driving and following model based on deep reinforcement learning
CN109858630A (en) * 2019-02-01 2019-06-07 清华大学 Method and apparatus for intensified learning
CN109808706A (en) * 2019-02-14 2019-05-28 上海思致汽车工程技术有限公司 Learning type assistant driving control method, device, system and vehicle
CN110254408A (en) * 2019-05-21 2019-09-20 江苏大学 A kind of adaptive time-varying slip rate constraint control algorithm of intelligent automobile anti-lock braking system
CN110450771B (en) * 2019-08-29 2021-03-09 合肥工业大学 Intelligent automobile stability control method based on deep reinforcement learning

Also Published As

Publication number Publication date
CN110843746A (en) 2020-02-28

Similar Documents

Publication Publication Date Title
CN110843746B (en) Anti-lock brake control method and system based on reinforcement learning
CN109782730B (en) Method and apparatus for autonomic system performance and rating
CN112052776A (en) Unmanned vehicle autonomous driving behavior optimization method and device and computer equipment
CN114355793B (en) Training method and device for automatic driving planning model for vehicle simulation evaluation
US20210023905A1 (en) Damper control system, vehicle, information processing apparatus and control method thereof, and storage medium
CN112579966B (en) Method and device for calculating ABS reference vehicle speed, electronic equipment and medium
CN116552474B (en) Vehicle speed control method, device, equipment and medium based on reinforcement learning
CN113642832A (en) Method and system for evaluating driving behavior of commercial vehicle
CN112149908B (en) Vehicle driving prediction method, system, computer device, and readable storage medium
CN113625753B (en) Method for guiding neural network to learn unmanned aerial vehicle maneuver flight by expert rules
CN106114285A (en) Electric bicycle trailer controls
CN111736076B (en) Battery system state judging method and device, readable storage medium and electronic equipment
CN117400953A (en) Tire force direct control method, device, equipment and storage medium
CN111738046A (en) Method and apparatus for calibrating a physics engine of a virtual world simulator for deep learning based device learning
CN116039715A (en) Train virtual marshalling operation control method and device
CN111506963B (en) Layered optimization method and system based on smoothness of heavy commercial vehicle
CN113147711A (en) Nonlinear braking force compensation method of giant magnetostrictive brake-by-wire system
CN114332520A (en) Abnormal driving behavior recognition model construction method based on deep learning
CN114615505A (en) Point cloud attribute compression method and device based on depth entropy coding and storage medium
CN110901406A (en) Vehicle driving and braking combined braking control method and system
CN118061968A (en) Brake pressure control method, device, equipment and medium based on reinforcement learning
JP7495833B2 (en) DNN model compression system
CN117944637A (en) Vehicle brake control method, device, equipment and storage medium
CN118072541B (en) Hybrid vehicle queue robust data driving prediction control method and device
CN116976423B (en) Training method of pre-accident risk assessment model fusing post-accident vehicle dynamics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant