CN115202351B - Intelligent obstacle avoidance method considering intention and individual operation habit - Google Patents

Intelligent obstacle avoidance method considering intention and individual operation habit

Info

Publication number
CN115202351B
CN115202351B (application CN202210830915.XA)
Authority
CN
China
Prior art keywords
wheelchair robot
obstacle
obstacle avoidance
driver
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210830915.XA
Other languages
Chinese (zh)
Other versions
CN115202351A (en)
Inventor
王义娜
张德龙
刘赛男
曹晨
杨俊友
周勃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang University of Technology
Original Assignee
Shenyang University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang University of Technology filed Critical Shenyang University of Technology
Priority to CN202210830915.XA priority Critical patent/CN115202351B/en
Publication of CN115202351A publication Critical patent/CN115202351A/en
Priority to ZA2023/00415A priority patent/ZA202300415B/en
Application granted granted Critical
Publication of CN115202351B publication Critical patent/CN115202351B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0255Control of position or course in two dimensions specially adapted to land vehicles using acoustic signals, e.g. ultra-sonic signals
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0214Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0223Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0276Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/60Other road transportation technologies with climate change mitigation effect
    • Y02T10/72Electric energy management in electromobility

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Acoustics & Sound (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to an obstacle avoidance method, in particular to an intelligent obstacle avoidance method that considers driving intention and individual operation habits. By taking the driver's operation habits into account, the wheelchair robot gains self-learning capability and can dynamically adjust its default steering amplitude in the obstacle avoidance state, satisfying both the driver's individual requirements on the driving state of the wheelchair robot and the requirements of safe use. The method comprises the following steps: when the wheelchair robot avoids an obstacle, the driving intention of the driver is taken into account; self-learning is performed according to the driver's operation habits, and the default steering amplitude in the obstacle avoidance state is dynamically adjusted; a risk evaluation standard is constructed, and safety constraints are imposed on the reinforcement learning of the wheelchair robot.

Description

Intelligent obstacle avoidance method considering intention and individual operation habit
Technical Field
The invention relates to an obstacle avoidance method, in particular to an intelligent obstacle avoidance method considering intention and individual operation habits.
Background
Today, the widespread application of wheelchair robots can improve the quality of life of vulnerable groups. When using a wheelchair robot as a mobility aid, an elderly person issues a travel instruction through the touch screen, so that the wheelchair robot travels at the commanded speed and direction, expressing the driver's travel intention. However, in environments with pedestrian flow or cluttered ground obstacles, completing collision-free travel by manual operation alone demands great skill and reaction speed; elderly drivers tire easily and lose concentration, so collision accidents caused by misoperation are very likely, causing secondary injury to this vulnerable group.
Meanwhile, it is difficult for vulnerable groups such as the elderly or the disabled to adjust the internal parameters of the wheelchair robot directly to reach a satisfactory driving state, because doing so carries huge learning and trial-and-error costs and can even create very dangerous situations.
Disclosure of Invention
Aiming at the defects existing in the prior art, the invention provides an intelligent obstacle avoidance method considering intention and individual operation habit.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
when the wheelchair robot avoids an obstacle, the driving intention of the driver is considered;
self-learning is carried out according to the driver's operation habits, and the wheelchair robot's default steering amplitude in the obstacle avoidance state is dynamically adjusted;
a risk evaluation standard is constructed, and safety constraints are imposed on the reinforcement learning of the wheelchair robot.
Further, an ultrasonic sensor is used to acquire obstacle distance information; automatic obstacle avoidance fuzzy reasoning is established to determine the relationship between the obstacle distance and the automatic obstacle avoidance direction angle β_a of the wheelchair robot.
Further, the wheelchair robot records the driver's instruction direction as β_e according to the driver's operation instruction. The final direction of travel β_f executed by the wheelchair robot follows the driver's instruction direction to a degree determined by the operation weight w, as in formula (1):
β_f = β_e·w + β_a·(1 - w)    (1)
further, based on deviation beta of the driving intention of the driver from the automatic obstacle avoidance direction angle b Current travel speed v of wheelchair robot r To determine the operation weight w, based on fuzzy reasoning, in beta b And v r The method comprises the steps that an input-output relationship is established for a front piece and w is used as a rear piece;
β b =|β ae | (2)
Figure BDA0003748319700000021
wherein v is x ,v y The current component speeds of the wheelchair robot along the x-axis and the y-axis directions can be obtained by the speeds of four wheels based on the kinematics of the wheelchair robot, and are as follows:
Figure BDA0003748319700000022
wherein θ is the current direction angle of the wheelchair robot; l is the distance from the center of the wheelchair robot to each wheel; v 1 ,v 2 ,v 3 ,v 4 Is the current speed of the four wheels.
Further, a discrete series of the driver's operation instruction direction β_e within one obstacle avoidance period is collected; the discrete series of β_e is classified with respect to the obstacle direction, the angles biased toward the obstacle in the data are recorded as θ_Nk (k = 1, 2, ..., n), the intention angles away from the obstacle are recorded as θ_Fk, and normalization yields the operation habit characteristic value C, as in formula (5):
[Formula (5), the normalization of θ_Nk and θ_Fk into the characteristic value C, appeared as an equation image in the source and is not recoverable here.]
When the operation habit characteristic value C is positive, the driver considered the wheelchair robot too close to the obstacle during obstacle avoidance driving, and therefore used operation instructions away from the obstacle to compensate the driving state of the wheelchair robot during that obstacle avoidance period.
Still further, a reinforcement learning reward function is established:
an operation weight compensation term s is added to the final driving direction β_f of the wheelchair robot, as in formula (6):
β_f = β_e(w + s) + β_a[1 - (w + s)]    (6)
The operation habit characteristic value is converted into a reward function, as in formula (7):
[Formula (7), mapping the characteristic value C to the reward G_t, appeared as an equation image in the source and is not recoverable here.]
where the value of the operation weight compensation term s is initially 0; C can be positive or negative and determines the reward value G_t of the wheelchair robot after one obstacle avoidance period; the value of s is continuously iterated and learned over a plurality of obstacle avoidance periods according to the reward value G_t.
Further, a human-machine interactive reinforcement learning model is established:
the wheelchair robot is regarded as the agent in the reinforcement learning model, the driving of one obstacle avoidance period is regarded as one action of the agent, and the reward function G_t is used as the reward signal given by the environment to iteratively learn the value of the operation weight compensation term s, as in formula (8):
V(s_t) ← V(s_t) + α[G_t - V(s_t)]    (8)
where V(s_t) is the value of the compensation term s and α is the learning rate of the algorithm.
Still further, a risk degree polynomial is established to constrain the reinforcement learning model:
the risk degree polynomial establishes the relationship among the angle β_r between the real-time travel direction of the wheelchair robot and the detection direction of a sensor, the obstacle distance d_i, and the risk degree R:
[The definitions of the distance term R_d and the angle term R_a appeared as equation images in the source and are not recoverable here.]
R = μ_d·R_d + μ_a·R_d·R_a
where μ_d and μ_a are two coefficients; d_min is the shortest acceptable obstacle distance and can be set independently; d_max is the furthest distance the sensor can detect; the threshold of the risk degree R is set independently.
The risk degree of each ultrasonic sensor installed on the wheelchair robot is calculated, and the maximum risk value R_max is taken: if R_max exceeds the set threshold within an obstacle avoidance period, the value of the reward function for that period is invalidated, which acts as a constraint.
Further, the number of ultrasonic sensors is 6, the 6 sensors giving 6 risk values R_1, R_2, R_3, R_4, R_5, R_6;
the maximum risk value is taken:
R_max = max{R_1, …, R_6}    (10)
compared with the prior art, the invention has the beneficial effects.
According to the invention, the driver's operation habits are taken into account, giving the wheelchair robot self-learning capability: the default steering amplitude in the obstacle avoidance state can be dynamically adjusted to meet the driver's personalized requirements on the driving state of the wheelchair robot as well as the requirements of safe use.
Drawings
The invention is further described below with reference to the drawings and the detailed description. The scope of the present invention is not limited to the following description.
FIG. 1 is a schematic diagram of platform hardware data communication logic.
Fig. 2 is a schematic view of an ultrasonic sensor arrangement.
FIG. 3 is a graph of obstacle distance fuzzification and its membership function.
FIG. 4 is a graph showing obstacle avoidance direction fuzzification and its membership function.
Fig. 5 is a fuzzy rule table of fuzzy inference 1.
FIG. 6 is a system flow framework of the obstacle avoidance algorithm.
FIG. 7 is a membership function for overall velocity.
FIG. 8 is a membership function for an angle difference.
Fig. 9 is a fuzzy rule table of fuzzy inference 2.
Fig. 10 is a map of a test field.
Fig. 11 is a travel route map without operation.
Fig. 12 is a travel route map when there is an erroneous operation.
Fig. 13 is a diagram showing the comparison of the traveling direction and the command direction.
Fig. 14 is a conceptual diagram of obstacle avoidance distance.
Fig. 15 is a schematic diagram of an operation habit.
FIG. 16 is a general flowchart of an algorithm with learning mechanism.
Fig. 17 is a conceptual diagram of risk.
Fig. 18 is a graph of evaluation and operation weight compensation term data trend.
Fig. 19 is a diagram showing a comparison of initial-end state data.
Detailed Description
Specific examples: the intelligent obstacle avoidance method considering the intention and the individual operation habit comprises the following steps:
1. During driving, if the wheelchair robot encounters an obstacle, it starts the obstacle avoidance function, safely bypassing the obstacle while taking the driver's driving intention into account to a certain extent.
2. The wheelchair robot can perform self-learning according to the operation habit of a driver, and dynamically adjusts the default steering amplitude of the wheelchair robot in an obstacle avoidance state so as to meet the personalized requirements and the safe use requirements of the driver on the driving state of the wheelchair robot.
3. A risk degree polynomial is established to construct a risk evaluation standard based on sensor information and to impose safety constraints on the reinforcement learning algorithm of the wheelchair robot.
The invention involves interaction between the driver and the wheelchair robot, detection of environmental obstacles, and the driving scheme of the wheelchair robot; the data communication logic among the various hardware components must be determined first so that each functional module cooperates in coordination. A schematic diagram of the platform hardware data communication logic is shown in FIG. 1. On the notebook PC, a MySQL 5.5 database is used for data storage and processing, Visual Studio 2017 is used as the programming development environment, and the designed control algorithm model is written in C++, so that the PC has the functions required of an upper-computer master controller coordinating hardware data communication. During obstacle avoidance driving, the ultrasonic sensors, as lower-computer components, send obstacle information to the upper-computer PC through an Arduino embedded development board. Meanwhile, the driver issues travel instructions to the wheelchair robot through an Android touch screen serving as the human-machine interface, and these instructions are transmitted via Socket network communication.
After the upper-computer PC acquires the obstacle information and the driver's travel instruction, it issues the travel command to the wheelchair robot's actuating motors through the RS232 data communication protocol according to the algorithm logic, so that the wheelchair robot completes driving and intelligent obstacle avoidance tasks under coordinated control.
The wheelchair robot is given the ability to automatically bypass obstacles during obstacle avoidance driving. Ultrasonic sensors symmetrically arranged around the body of the wheelchair robot detect obstacle distance information in the environment in real time; their mounting positions are shown in FIG. 2. The symbols D1, D2, …, D6 in the figure denote the mounting positions of the ultrasonic sensors on the wheelchair robot and the detected distances in the respective directions. While the wheelchair robot travels, the upper-computer PC continuously acquires the distance information detected by the ultrasonic sensors at a clock period of 0.2 s, in metres (m) and accurate to the millimetre. The distance data is fuzzified and a suitable membership function is established, as shown in FIG. 3. The obstacle distance is divided into the two degrees Far and Near, with fuzzy linguistic variable {Far, Near}; the corresponding elements of the fuzzy linguistic set are F (far) and N (near). The universe of discourse is [0, 1.5 m], and the membership function combines Z-shaped and S-shaped curves; a sketch of these curves is given below.
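The following is a minimal C++ sketch of this fuzzification step (C++ being the implementation language stated above). The universe of discourse [0, 1.5 m] comes from the description; the 0.5 m and 1.0 m shoulder points of the Z-shaped and S-shaped curves are assumptions for illustration, since the exact breakpoints are only given in FIG. 3.

```cpp
// Sketch of the obstacle-distance fuzzification into {Near, Far}. The
// universe of discourse [0, 1.5 m] is from the text; the 0.5 m / 1.0 m
// shoulder points are assumptions (the exact curve is in FIG. 3).
#include <algorithm>
#include <cmath>

// Z-shaped membership: 1 below a, 0 above b, smooth spline in between.
double zmf(double x, double a, double b) {
    if (x <= a) return 1.0;
    if (x >= b) return 0.0;
    double m = (a + b) / 2.0;
    if (x <= m) return 1.0 - 2.0 * std::pow((x - a) / (b - a), 2);
    return 2.0 * std::pow((x - b) / (b - a), 2);
}

// S-shaped membership: mirror image of the Z-shaped curve.
double smf(double x, double a, double b) { return 1.0 - zmf(x, a, b); }

struct DistanceDegrees { double near_deg, far_deg; };

// Fuzzify one ultrasonic reading (metres) into Near/Far membership degrees.
DistanceDegrees fuzzifyDistance(double d) {
    d = std::clamp(d, 0.0, 1.5);          // universe of discourse [0, 1.5 m]
    return { zmf(d, 0.5, 1.0),            // Near (Z-shaped)
             smf(d, 0.5, 1.0) };          // Far  (S-shaped)
}
```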
The direction angle of the wheelchair robot is fuzzified in real time and a corresponding membership function is established. The fuzzy partition number of the forward driving direction angle is set to 5, with fuzzy linguistic variable {Right-Big, Right-Middle, Zero, Left-Middle, Left-Big}; the corresponding elements of the fuzzy linguistic set are RB (largely to the right), RM (moderately to the right), ZO (forward), LM (moderately to the left), LB (largely to the left). The universe of discourse is [0°, 180°], and the membership functions are designed as triangles, as shown in FIG. 4.
To give the wheelchair robot an automatic obstacle avoidance function, fuzzy reasoning rules relating obstacle distance to the automatic obstacle avoidance direction are established, denoted fuzzy inference 1; the specific fuzzy rules are shown in FIG. 5. Once the wheelchair robot detects an obstacle while travelling, a travel direction that safely bypasses the obstacle is obtained through fuzzy inference 1.
During obstacle avoidance driving, the obstacle distance information obtained by the ultrasonic sensors serves as the antecedent input of fuzzy inference 1, from which the real-time automatic obstacle avoidance direction angle β_a is deduced. In the upper-computer PC, the expected direction of actual travel takes the automatic obstacle avoidance direction angle into account, so that the wheelchair robot can bypass obstacles in the environment during the obstacle avoidance process.
The invention ensures that the wheelchair robot does not completely deprive the driver of operating authority during obstacle avoidance driving. The system flow framework of the obstacle avoidance algorithm is shown in FIG. 6: the driver gives operation instructions to the wheelchair robot through the Android touch screen to express the driving intention, and the driver's instruction direction is recorded as β_e. The final direction of travel β_f executed by the wheelchair robot follows the driver's instruction direction to a degree determined by the operation weight w, as shown in the following formula:
β_f = β_e·w + β_a·(1 - w)    (1)
While the wheelchair robot travels, the upper-computer PC acquires the information of the rotary encoders of the four actuating motors of the omnidirectional wheel system in real time through the communication protocol, and obtains the rotational linear speeds v_1, v_2, v_3, v_4 of the four wheels through the reduction ratio of the speed reducer.
The operation weight w is determined from the deviation β_b between the driver's travel intention and the automatic obstacle avoidance direction angle, and from the current travel speed v_r of the wheelchair robot; based on fuzzy reasoning, an input-output relationship is established with β_b and v_r as the antecedents and w as the consequent, denoted fuzzy inference 2.
β_b = |β_a - β_e|    (2)
v_r = √(v_x² + v_y²)    (3)
where v_x, v_y are the current component speeds of the wheelchair robot along the x-axis and y-axis directions; based on the kinematics of the wheelchair robot they can be obtained from the speeds of the four wheels:
[Formula (4), the kinematic transformation from the wheel speeds v_1, v_2, v_3, v_4, the heading θ and the wheel offset l to (v_x, v_y), appeared as an equation image in the source and is not recoverable here.]
where θ is the current direction angle of the wheelchair robot and l is the distance from the center of the wheelchair robot to each wheel. A sketch of formulas (1) to (3) is given below.
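The following C++ sketch implements formulas (1) to (3). The kinematic step of formula (4), from the four wheel speeds to (v_x, v_y), is omitted because its exact form did not survive extraction; v_x and v_y are assumed to be already available.

```cpp
// Sketch of formulas (1)-(3). The wheel-speed kinematics of formula (4) is
// omitted (its exact form was lost in extraction); vx and vy are assumed
// to be computed already.
#include <cmath>

// Formula (2): deviation between avoidance angle and the driver's intent.
double directionDeviation(double beta_a, double beta_e) {
    return std::fabs(beta_a - beta_e);
}

// Formula (3): overall travel speed from the component speeds.
double overallSpeed(double vx, double vy) {
    return std::sqrt(vx * vx + vy * vy);
}

// Formula (1): the executed direction follows the driver to degree w.
double blendDirection(double beta_e, double beta_a, double w) {
    return beta_e * w + beta_a * (1.0 - w);
}
```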
The value of the operation weight w obtained from fuzzy inference 2 balances how closely the driving state of the wheelchair robot follows the user's intended direction during obstacle avoidance driving. The two input variables affecting the operation weight are fuzzified and fuzzy inference rules are established.
The fuzzy partition number of the overall speed v_r of the wheelchair robot is set to 3; the speed is divided into the three degrees Big, Middle and Small, with fuzzy linguistic variable {Big, Middle, Small}, and the corresponding elements of the fuzzy linguistic set are B (big), M (middle) and S (small). The universe of discourse is [0, 0.4 m/s], and the membership functions are triangular, as shown in FIG. 7.
The direction difference β_b is fuzzified into the three degrees Big, Middle and Small, with fuzzy linguistic variable {Big, Middle, Small} and corresponding elements B (big), M (middle) and S (small). The universe of discourse is [0°, 180°], and triangular membership functions are established, as shown in FIG. 8.
From general knowledge it can be inferred that, in an environment with complex obstacles, the higher the overall speed at which the wheelchair robot travels, the higher the skill demanded of the user operating it; in this case a smaller weight is given to the user's intended direction, which reduces the probability of collision accidents caused by misoperation and makes driving easier. Likewise, when the direction difference β_b is too large, the driver's instruction is steering the wheelchair robot toward an obstacle, so the value of the operation weight should be reduced appropriately to ensure that no collision occurs. Fuzzy inference 2 is built according to this principle, as shown in FIG. 9; a hedged sketch of such an inference step is given below.
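Below is a hedged C++ sketch of fuzzy inference 2. The actual rule table is in FIG. 9 and is not reproduced here, so the consequent values in wTable only encode the stated principle (higher speed or larger deviation gives a smaller weight w) and are assumptions, as are the triangular membership breakpoints.

```cpp
// Hedged sketch of fuzzy inference 2 (Mamdani-style rule firing with
// weighted-average defuzzification). The consequent singletons in wTable
// and the membership breakpoints are assumptions.
#include <algorithm>
#include <array>

// Triangular membership function with vertices a <= b <= c.
double tri(double x, double a, double b, double c) {
    if (x <= a || x >= c) return 0.0;
    return x < b ? (x - a) / (b - a) : (c - x) / (c - b);
}

// Infer the operation weight w from overall speed v_r in [0, 0.4 m/s]
// and direction difference beta_b in [0, 180 deg].
double inferWeight(double v_r, double beta_b) {
    // Speed degrees {Small, Middle, Big}; edge triangles extend past the
    // universe so boundary values get full membership.
    std::array<double, 3> mv = { tri(v_r, -0.2, 0.0, 0.2),
                                 tri(v_r,  0.0, 0.2, 0.4),
                                 tri(v_r,  0.2, 0.4, 0.6) };
    // Direction-difference degrees {Small, Middle, Big}.
    std::array<double, 3> mb = { tri(beta_b, -90.0,   0.0,  90.0),
                                 tri(beta_b,   0.0,  90.0, 180.0),
                                 tri(beta_b,  90.0, 180.0, 270.0) };
    // Assumed consequents: w shrinks as speed and deviation grow.
    const double wTable[3][3] = { {0.8, 0.6, 0.3},
                                  {0.6, 0.4, 0.2},
                                  {0.4, 0.2, 0.1} };
    double num = 0.0, den = 0.0;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            double fire = std::min(mv[i], mb[j]);   // rule firing strength
            num += fire * wTable[i][j];
            den += fire;
        }
    return den > 0.0 ? num / den : 0.5;             // fallback: neutral weight
}
```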
The intelligent obstacle avoidance method is verified through experiments.
An obstacle environment as shown in FIG. 10 was constructed. Without the obstacle avoidance function and without flexible operation by the driver, the wheelchair robot travels straight ahead at the default 90° instruction, and no collision-free path exists no matter how its starting position is adjusted. With the obstacle avoidance function applied, the driver only needs to set the expected speed, and the wheelchair robot (WR) drives around the two obstacles and travels safely without collision even with no operation at all. Meanwhile, for experimental rigor, certain erroneous operation instructions, and even instructions deliberately against the obstacle avoidance direction, were given during obstacle avoidance travel to examine the motion state of the wheelchair robot under intentional human intervention, and the influence of misoperation on the motion state was compared. Recruited testers drove the wheelchair robot from the starting position shown in the figure, aiming to travel collision-free to the end point along the path indicated by the green arrow; to ensure randomness, each person performed two experiments as a swap test. Throughout, the comparison under intentional driver intervention was emphasized, and each tester's operation history over the whole drive was recorded. Position information was acquired with a UWB indoor positioning system, and motor revolution information with encoders. The data of one tester, A, was selected at random and plotted; at an expected speed of 0.2 m/s, the travel routes of the wheelchair robot in the experimental site are shown in FIG. 11 and FIG. 12. FIG. 11 shows the travel route when the driver did not operate at all, and FIG. 12 the route with erroneous operation.
The rotational speeds of the four motors, converted through the speed reducers, give the rotational speeds of the four wheels of the omnidirectional wheel system, from which the overall speed of the wheelchair robot during travel is obtained by inverse kinematics. To examine the influence of the driver's intention on the motion state of the wheelchair robot, the change of the overall speed direction and the driver's intended instruction direction during obstacle avoidance travel were recorded in the two comparative experiments, as shown in FIG. 13.
According to the invention, the driver's personalized perception of and demands on the running state of the wheelchair robot are judged from the driver's operation habits. The whole process from detecting an obstacle and starting the obstacle avoidance behavior until the wheelchair robot completely bypasses the obstacle and finishes obstacle avoidance driving is defined as one obstacle avoidance period.
The upper-computer timer samples the direction angle β_a every 0.2 s clock period; timing starts when β_a deviates from the vertical-forward 90° and stops when β_a returns to 90°, which gives the duration of one obstacle avoidance cycle of the wheelchair robot. A sketch of this period detection is given below.
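A minimal sketch of this period timer follows; the 0.2 s sampling period and the 90° default heading come from the description, while the comparison tolerance is an assumption.

```cpp
// Minimal sketch of the obstacle-avoidance period timer: beta_a is sampled
// every 0.2 s; a period runs while beta_a deviates from the default 90 deg
// heading. The comparison tolerance kTol is an assumption.
#include <cmath>

struct PeriodTimer {
    bool active = false;
    int samples = 0;                            // one sample = 0.2 s
    static constexpr double kForward = 90.0;    // default vertical-forward heading
    static constexpr double kTol = 1e-3;

    // Feed one beta_a sample; returns the period length in seconds when a
    // period just ended, otherwise -1.
    double update(double beta_a) {
        bool forward = std::fabs(beta_a - kForward) < kTol;
        if (!forward) { active = true; ++samples; return -1.0; }
        if (active)   { active = false; double t = samples * 0.2; samples = 0; return t; }
        return -1.0;
    }
};
```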
The default running state of the wheelchair robot is vertically forward, and obstacle avoidance actions are executed when obstacles are encountered; the turning amplitude determines the driver's subjective feeling about the obstacle avoidance distance. FIG. 14 illustrates the obstacle avoidance distance concept: the larger the turning amplitude, the larger the obstacle avoidance distance, and vice versa. The driver is sensitive to the obstacle avoidance distance within one obstacle avoidance period; when it is too close the driver fears a collision and feels a sense of pressure, while when it is too far some testers feel that space is wasted. Under the obstacle avoidance algorithm logic, the driver can add his or her own operation to compensate the running state of the wheelchair robot and meet individual requirements. This produces an operation habit belonging to each individual, as shown in FIG. 15.
Within one obstacle avoidance period, the discrete series of the driver's operation instruction direction β_e and of the automatic obstacle avoidance direction β_a are collected at a frequency of 5 samples per second. The angle values β_a and β_e are compared: the driver's operation direction is classified as away from the obstacle if both are greater than 90° or both are less than 90°, and as biased toward the obstacle if one of the two angle values is greater than 90° and the other is less than 90°.
The discrete series of β_e is classified with respect to the obstacle direction, the angles biased toward the obstacle in the data are recorded as θ_Nk (k = 1, 2, ..., n), and the intention angles away from the obstacle are recorded as θ_Fk; normalization then yields the operation habit characteristic value C:
[Formula (5), the normalization of θ_Nk and θ_Fk into the characteristic value C, appeared as an equation image in the source and is not recoverable here.]
A positive operation habit characteristic value C indicates that the driver considered the wheelchair robot too close to the obstacle during obstacle avoidance driving and felt a sense of pressure, so that for most of that obstacle avoidance period operation instructions away from the obstacle were used to compensate the driving state of the wheelchair robot. A hedged sketch of this computation is given below.
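The sketch below classifies the sampled series exactly by the 90° rule stated earlier. Because formula (5) was an equation image lost in extraction, the final normalization used here (the normalized count difference of the two classes, giving C in [-1, 1]) is an assumption, not the patented formula.

```cpp
// Sketch of the operation-habit feature. The toward/away classification is
// the 90-degree rule from the text; the normalization at the end is an
// ASSUMPTION standing in for the lost formula (5).
#include <algorithm>
#include <cstddef>
#include <vector>

double habitFeature(const std::vector<double>& beta_a,   // avoidance angles
                    const std::vector<double>& beta_e) { // driver angles
    std::size_t nN = 0, nF = 0;  // counts of theta_Nk (toward) / theta_Fk (away)
    std::size_t n = std::min(beta_a.size(), beta_e.size());
    for (std::size_t k = 0; k < n; ++k) {
        bool sameSide = (beta_a[k] > 90.0) == (beta_e[k] > 90.0);
        sameSide ? ++nF : ++nN;  // same side of 90 deg -> away from obstacle
    }
    if (n == 0) return 0.0;
    // Assumed normalization: C in [-1, 1], positive when the driver mostly
    // steered away from the obstacle (felt the pass was too close).
    return (static_cast<double>(nF) - static_cast<double>(nN))
           / static_cast<double>(n);
}
```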
The invention establishes a suitable reinforcement learning reward function. An operation weight compensation term s is added to the final driving direction β_f of the wheelchair robot:
β_f = β_e(w + s) + β_a[1 - (w + s)]    (6)
The operation habit characteristic value is converted into a reward function:
[Formula (7), mapping the characteristic value C to the reward G_t, appeared as an equation image in the source and is not recoverable here.]
where the value of the compensation term s is initially 0; the operation habit characteristic value C obtained after an obstacle avoidance period determines the reward value G_t of the wheelchair robot for that period.
The invention establishes a human-machine interactive reinforcement learning model. The wheelchair robot is regarded as the agent in the reinforcement learning model, the driving of one obstacle avoidance period is regarded as one action of the agent, and the reward function G_t is used as the reward signal given by the environment; the operation weight compensation term s is learned iteratively by the formula:
V(s_t) ← V(s_t) + α[G_t - V(s_t)]    (8)
where V(s_t) is the value of the compensation term s and α is the learning rate of the algorithm. The algorithm flow of the intelligent obstacle avoidance method with the self-learning mechanism added is shown in FIG. 16; a sketch of the update is given below.
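A minimal sketch of the update of formulas (6) and (8) follows. Formula (7), the mapping from the habit feature C to the reward G_t, did not survive extraction, so using G_t = C directly is an assumption.

```cpp
// Sketch of the interactive learning step. Formula (8) is the update rule
// from the text; taking G_t = C as the reward is an ASSUMPTION in place of
// the lost formula (7).
double updateCompensation(double s, double C, double alpha) {
    double G_t = C;                   // assumed reward signal for the period
    return s + alpha * (G_t - s);     // V(s_t) <- V(s_t) + alpha*[G_t - V(s_t)]
}

// Formula (6): compensated direction blend used in the next period.
double blendWithCompensation(double beta_e, double beta_a, double w, double s) {
    double we = w + s;                // compensated operation weight
    return beta_e * we + beta_a * (1.0 - we);
}
```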
The invention establishes a risk degree polynomial to constrain the reinforcement learning model. The risk degree describes how close the wheelchair robot comes to a collision during travel; a higher risk degree means a collision is more likely. Common-sense experience suggests that when an obstacle detected by a sensor is very close there is a high probability of a collision accident, but not a certainty. For example, in FIG. 17 the D6 sensor has detected a close obstacle on the right, yet if the wheelchair robot keeps travelling vertically forward no collision occurs. Therefore the real-time travel direction of the wheelchair robot and the obstacle distances detected by the sensors must be considered together to cover the general situation. The risk degree polynomial relates the angle β_r between the real-time travel direction of the wheelchair robot and the detection direction of a sensor, the obstacle distance d_i, and the risk degree R, as follows:
[The definitions of the distance term R_d and the angle term R_a appeared as equation images in the source and are not recoverable here.]
R = μ_d·R_d + μ_a·R_d·R_a    (9)
where μ_d and μ_a are two coefficients; d_min is the minimum acceptable obstacle distance and can be set autonomously; d_max is the furthest distance the sensor can detect; the threshold of the risk degree is set autonomously.
The risk degree of each ultrasonic sensor installed around the body of the wheelchair robot is calculated. The 6 sensors yield 6 risk values R_1, R_2, R_3, R_4, R_5, R_6, and the maximum risk value is taken:
R_max = max{R_1, …, R_6}    (10)
If R_max exceeds the set threshold within an obstacle avoidance period, the value of the reward function for that period is invalidated, which acts as a constraint; a hedged sketch is given below.
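The sketch below illustrates the constraint mechanism of formulas (9) and (10). The component terms R_d and R_a were equation images lost in extraction; the linear distance term and cosine alignment term, as well as the default values of μ_d, μ_a, d_min, d_max and the threshold (which the patent says are set independently), are assumptions.

```cpp
// Hedged sketch of the safety constraint. R_d and R_a below (a linear
// distance term and a cosine alignment term) are ASSUMPTIONS replacing the
// lost equation images; all default parameter values are assumptions too.
#include <algorithm>
#include <array>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

double sensorRisk(double d_i, double beta_r,
                  double d_min = 0.3, double d_max = 1.5,
                  double mu_d = 0.6, double mu_a = 0.4) {
    // Closer obstacles give a larger distance term.
    double R_d = std::clamp((d_max - d_i) / (d_max - d_min), 0.0, 1.0);
    // Travelling toward the sensor's detection direction raises the risk.
    double R_a = std::max(0.0, std::cos(beta_r * kPi / 180.0));
    return mu_d * R_d + mu_a * R_d * R_a;            // formula (9)
}

// Formula (10) plus the constraint: invalidate the reward of a period whose
// maximum sensor risk exceeds the threshold.
double constrainedReward(const std::array<double, 6>& R, double G_t,
                         double threshold = 0.8) {
    double R_max = *std::max_element(R.begin(), R.end());
    return R_max > threshold ? 0.0 : G_t;
}
```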
The invention uses experiments and questionnaires to determine the optimal learning rate and to verify the effectiveness of the method.
To verify the effectiveness of the interactive reinforcement learning model in practical application, the change in the driver's subjective satisfaction during obstacle avoidance driving with the wheelchair robot must be investigated. If the initial running state of the wheelchair robot already satisfies the driver, the significance of the self-learning mechanism cannot be demonstrated. Therefore, when testers were recruited for the experiments, the self-learning mechanism was first disabled and, with the parameters of the operation weight compensation term held constant, each tester gained sufficient driving experience until his or her subjective feeling about the driving state of the wheelchair robot was settled. Since the driver's main perception of the driving state is how close the wheelchair robot appears to come to the obstacle during obstacle avoidance driving, subjective perception was summarized into five labels: "too close", "near", "suitable", "far" and "too far"; after determining his or her personal feeling, each tester selected and recorded one of these five conditions.
The two conditions "too close" and "too far" mean the tester is dissatisfied with the running state of the wheelchair robot, while "suitable" means satisfaction. To highlight the transition of the testers' subjective feeling from dissatisfaction to satisfaction during the experiment, the initial state of the wheelchair robot was adjusted, in combination with each tester's questionnaire, to one the tester found unsatisfactory; the self-learning mechanism was then enabled, and it was examined whether, after several obstacle avoidance periods of driving, the wheelchair robot could learn autonomously to reach a running state satisfactory to the tester. If so, the test stopped once the tester's subjective feeling was satisfaction, and the reward value of each obstacle avoidance period, the change of the operation weight compensation term, the learning rate of the reinforcement learning model and the running-state data of the wheelchair robot were recorded. If not, the learning rate of the interactive reinforcement learning model was changed and the experiment continued in search of the optimal learning rate. Since satisfaction is itself a subjective individual perception, fully investigating the effectiveness of the interactive reinforcement learning model cannot rely on a single person or individual data; a certain number of testers therefore had to be recruited, each performing the above experimental steps and recording the questionnaire results.
In the experiment, 15 testers were recruited and tested according to the experimental plan introduced above; each tester completed at least three experimental tests, and to guard against driving fatigue the three experiments were separated by at least two days. After performing three experiments, each person summarized his or her own optimal learning rate. Considering the stability requirements of wheelchair robot travel and the exploration requirements of the reinforcement learning algorithm, the learning rate was not set too high; the value of the learning rate α was chosen only from 0.05, 0.1, 0.15, 0.2 and 0.25. Seven people considered 0.1 the optimal learning rate, and four considered it to be 0.15. With the learning rate at 0.1, experiments were performed according to the designed steps, and one set of data was selected at random, as shown in FIG. 18.
The tester who produced this data set felt that the initial state of the wheelchair robot was particularly dangerous: during obstacle avoidance driving the robot started its avoidance action only when already close to the obstacle, and the turning amplitude was small. The tester therefore added operation instructions to compensate the running state and enlarge the obstacle avoidance angle. After several obstacle avoidance periods of driving, the tester was satisfied with the driving state. To make the transition of the running state easy to observe, the wheelchair robot performed obstacle avoidance driving at an expected speed of 0.2 m/s from the same initial position in the same obstacle environment while the actual travel direction angle was recorded; the change in the running state can be seen in FIG. 19. The effectiveness of the method is thus demonstrated both experimentally and by questionnaire.
It should be understood that the foregoing detailed description is provided for illustration only, and the invention is not limited to the technical solutions described in the embodiments; those skilled in the art will understand that modifications or equivalent substitutions achieving the same technical effect remain within the protection scope of the invention as long as the use requirements are met.

Claims (5)

1. An intelligent obstacle avoidance method considering intention and individual operation habit is characterized in that: the method comprises the following steps:
step 1, when the wheelchair robot avoids an obstacle, considering the driving intention of a driver;
step 2, self-learning is carried out according to the driver's operation habits, and the wheelchair robot's default steering amplitude in the obstacle avoidance state is dynamically adjusted;
step 3, a risk evaluation standard is constructed, and safety constraints are imposed on the reinforcement learning of the wheelchair robot;
an ultrasonic sensor is used to acquire obstacle distance information; automatic obstacle avoidance fuzzy reasoning is established to determine the relationship between the obstacle distance and the automatic obstacle avoidance direction angle β_a of the wheelchair robot;
the wheelchair robot records the driver's instruction direction as β_e according to the driver's operation instruction; the final direction of travel β_f executed by the wheelchair robot follows the driver's instruction direction to a degree determined by the operation weight w, as in formula (1):
β_f = β_e·w + β_a·(1 - w)    (1)
the operation weight w is determined from the deviation β_b between the driver's travel intention and the automatic obstacle avoidance direction angle, and from the current travel speed v_r of the wheelchair robot; based on fuzzy reasoning, an input-output relationship is established with β_b and v_r as the antecedents and w as the consequent;
β_b = |β_a - β_e|    (2)
v_r = √(v_x² + v_y²)    (3)
where v_x, v_y are the current component speeds of the wheelchair robot along the x-axis and y-axis directions; based on the kinematics of the wheelchair robot they can be obtained from the speeds of the four wheels:
[Formula (4), the kinematic transformation from the wheel speeds v_1, v_2, v_3, v_4, the heading θ and the wheel offset l to (v_x, v_y), appeared as an equation image in the source and is not recoverable here.]
where θ is the current direction angle of the wheelchair robot; l is the distance from the center of the wheelchair robot to each wheel; v_1, v_2, v_3, v_4 are the current speeds of the four wheels;
a discrete series of the driver's operation instruction direction β_e within one obstacle avoidance period is collected; the discrete series of β_e is classified with respect to the obstacle direction, the angles biased toward the obstacle in the data are recorded as θ_Nk (k = 1, 2, ..., n), the intention angles away from the obstacle are recorded as θ_Fk, and normalization yields the operation habit characteristic value C, as in formula (5):
[Formula (5), the normalization of θ_Nk and θ_Fk into the characteristic value C, appeared as an equation image in the source and is not recoverable here.]
when the operation habit characteristic value C is positive, the driver considered the wheelchair robot too close to the obstacle during obstacle avoidance driving, and therefore used operation instructions away from the obstacle to compensate the driving state of the wheelchair robot during that obstacle avoidance period.
2. The method according to claim 1, characterized in that a reinforcement learning reward function is established:
an operation weight compensation term s is added to the final driving direction β_f of the wheelchair robot, as in formula (6):
β_f = β_e(w + s) + β_a[1 - (w + s)]    (6)
the operation habit characteristic value is converted into a reward function, as in formula (7):
[Formula (7), mapping the characteristic value C to the reward G_t, appeared as an equation image in the source and is not recoverable here.]
where the value of the operation weight compensation term s is initially 0; C can be positive or negative and determines the reward value G_t of the wheelchair robot after one obstacle avoidance period; the value of s is continuously iterated and learned over a plurality of obstacle avoidance periods according to the reward value G_t.
3. The method according to claim 2, characterized in that a human-machine interactive reinforcement learning model is established:
the wheelchair robot is regarded as the agent in the reinforcement learning model, the driving of one obstacle avoidance period is regarded as one action of the agent, and the reward function G_t is used as the reward signal given by the environment to iteratively learn the value of the operation weight compensation term s, as in formula (8):
V(s_t) ← V(s_t) + α[G_t - V(s_t)]    (8)
where V(s_t) is the value of the compensation term s and α is the learning rate of the algorithm.
4. The method according to claim 3, characterized in that a risk degree polynomial is established to constrain the reinforcement learning model:
the risk degree polynomial establishes the relationship among the angle β_r between the real-time travel direction of the wheelchair robot and the detection direction of a sensor, the obstacle distance d_i, and the risk degree R, as follows:
[The definitions of the distance term R_d and the angle term R_a appeared as equation images in the source and are not recoverable here.]
R = μ_d·R_d + μ_a·R_d·R_a    (9)
where μ_d and μ_a are two coefficients; d_min is the shortest acceptable obstacle distance and can be set independently; d_max is the furthest distance the sensor can detect; the threshold of the risk degree R is set independently;
the risk degree of each ultrasonic sensor installed on the wheelchair robot is calculated, and the maximum risk value R_max is taken: if R_max exceeds the set threshold within an obstacle avoidance period, the value of the reward function for that period is invalidated, which acts as a constraint.
5. The method according to claim 4, characterized in that the number of ultrasonic sensors is 6, the 6 sensors giving 6 risk values R_1, R_2, R_3, R_4, R_5, R_6;
the maximum risk value is taken:
R_max = max{R_1, …, R_6}    (10).
CN202210830915.XA 2022-07-15 2022-07-15 Intelligent obstacle avoidance method considering intention and individual operation habit Active CN115202351B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210830915.XA CN115202351B (en) 2022-07-15 2022-07-15 Intelligent obstacle avoidance method considering intention and individual operation habit
ZA2023/00415A ZA202300415B (en) 2022-07-15 2023-01-10 Intelligent obstacle avoidance method considering intention and individual operation habits

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210830915.XA CN115202351B (en) 2022-07-15 2022-07-15 Intelligent obstacle avoidance method considering intention and individual operation habit

Publications (2)

Publication Number Publication Date
CN115202351A (en) 2022-10-18
CN115202351B (en) 2023-04-25

Family

ID=83582675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210830915.XA Active CN115202351B (en) 2022-07-15 2022-07-15 Intelligent obstacle avoidance method considering intention and individual operation habit

Country Status (2)

Country Link
CN (1) CN115202351B (en)
ZA (1) ZA202300415B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108357594A (en) * 2018-01-26 2018-08-03 Zhejiang University A kind of unmanned bicycle of self-balancing based on intellectual evolution and its control method of competition and cooperation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699124A (en) * 2013-12-04 2014-04-02 Beijing University of Technology Fuzzy neural network control method for omni-directional intelligent wheelchair to avoid obstacle
CN108469815A (en) * 2018-02-27 2018-08-31 Chongqing Songyue Trading Service Co., Ltd. A kind of self-navigation of computer deep learning and control loop and its method based on intention
CN109966064B (en) * 2019-04-04 2021-02-19 Beijing Institute of Technology Wheelchair with detection device and integrated with brain control and automatic driving and control method
US11458987B2 (en) * 2020-02-26 2022-10-04 Honda Motor Co., Ltd. Driver-centric risk assessment: risk object identification via causal inference with intent-aware driving models

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108357594A (en) * 2018-01-26 2018-08-03 Zhejiang University A kind of unmanned bicycle of self-balancing based on intellectual evolution and its control method of competition and cooperation

Also Published As

Publication number Publication date
CN115202351A (en) 2022-10-18
ZA202300415B (en) 2023-03-29

Similar Documents

Publication Publication Date Title
Doshi et al. Tactical driver behavior prediction and intent inference: A review
Zhao et al. Design of a control system for an autonomous vehicle based on adaptive-pid
Mendonça et al. Autonomous navigation system using event driven-fuzzy cognitive maps
CN112356841B (en) Vehicle control method and device based on brain-computer interaction
Pasquier et al. Fuzzylot: a novel self-organising fuzzy-neural rule-based pilot system for automated vehicles
Rath et al. Personalised lane keeping assist strategy: Adaptation to driving style
Pradhan et al. Neuro-fuzzy technique for navigation of multiple mobile robots
Das et al. A machine learning approach for collision avoidance and path planning of mobile robot under dense and cluttered environments
Tahboub et al. A neuro-fuzzy reasoning system for mobile robot navigation
Rojas et al. Performance evaluation of an autonomous vehicle using resilience engineering
CN115202351B (en) Intelligent obstacle avoidance method considering intention and individual operation habit
Lee et al. Fuzzy wall-following control of a wheelchair
Kong et al. Path Planning of a Multifunctional Elderly Intelligent Wheelchair Based on the Sensor and Fuzzy Bayesian Network Algorithm
Verma et al. Scalable robot fault detection and identification
Guo et al. Adaptive dynamic surface longitudinal tracking control of autonomous vehicles
Bao et al. Data-Driven Risk-Sensitive Control for Personalized Lane Change Maneuvers
Buchholz et al. Towards adaptive worker assistance in monitoring tasks
Ullah et al. Integrated collision avoidance and tracking system for mobile robot
US11794780B2 (en) Reward function for vehicles
Alonso et al. Knowledge-based intelligent diagnosis of ground robot collision with non detectable obstacles
Luo et al. Automatic guided intelligent wheelchair system using hierarchical grey-fuzzy motion decision-making algorithms
Narayanan When is it right and good for an intelligent autonomous vehicle to take over control (and hand it back)?
Shitsukane Fuzzy logic model for obstacles avoidance mobile robot in static unknown environment
Tsuneyoshi et al. A brake assisting function for railway vehicles using fuzzy logic: A comparison study for different fuzzy inference types
Al-Din Decomposed fuzzy controller for reactive mobile robot navigation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant