CN112472530A - Reward function establishing method based on walking ratio trend change - Google Patents

Reward function establishing method based on walking ratio trend change

Info

Publication number
CN112472530A
Authority
CN
China
Prior art keywords
walking ratio
sequence
reward
walking
flexion angle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011387443.2A
Other languages
Chinese (zh)
Other versions
CN112472530B (en)
Inventor
孙磊
李云飞
董恩增
佟吉刚
陈鑫
曾德添
龚欣翔
李成辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University of Technology
Original Assignee
Tianjin University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University of Technology filed Critical Tianjin University of Technology
Priority to CN202011387443.2A priority Critical patent/CN112472530B/en
Publication of CN112472530A publication Critical patent/CN112472530A/en
Application granted granted Critical
Publication of CN112472530B publication Critical patent/CN112472530B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61H PHYSICAL THERAPY APPARATUS, e.g. DEVICES FOR LOCATING OR STIMULATING REFLEX POINTS IN THE BODY; ARTIFICIAL RESPIRATION; MASSAGE; BATHING DEVICES FOR SPECIAL THERAPEUTIC OR HYGIENIC PURPOSES OR SPECIFIC PARTS OF THE BODY
    • A61H3/00 Appliances for aiding patients or disabled persons to walk about
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • A61H2201/00 Characteristics of apparatus not provided for in the preceding codes
    • A61H2201/16 Physical interface with patient
    • A61H2201/1602 Physical interface with patient kind of interface, e.g. head rest, knee support or lumbar support
    • A61H2201/165 Wearable interfaces
    • A61H2201/1657 Movement of interface, i.e. force application means
    • A61H2201/1659 Free spatial automatic movement of interface within a working area, e.g. Robot
    • A61H2201/50 Control means thereof
    • A61H2201/5058 Sensors or detectors
    • A61H2201/5097 Control means thereof wireless

Abstract

The invention discloses a method for establishing a reward function based on walking ratio trend change, which comprises the following steps: calculating the step length D of the wearer of the exoskeleton robot; calculating the gait cycle T(k); calculating the walking ratio W from the step length D and the gait cycle T(k); establishing a walking ratio sampling sequence and scoring the sampling sequences within it; and establishing the reward function model. The reward function model based on walking ratio trend change can be applied in algorithms for optimizing exoskeleton parameters, enhancing the efficiency of reinforcement learning and promoting rapid convergence of the exoskeleton parameters.

Description

Reward function establishing method based on walking ratio trend change
(I) Technical field:
the invention belongs to the technical field of robots and relates to a method for establishing a walking ratio reward function for a gait rehabilitation flexible exoskeleton robot, which can be applied to the adaptive control of the control parameters of a flexible exoskeleton based on a reinforcement learning method.
(II) Background art:
the flexible exoskeleton robot can assist elderly people with walking difficulties and strengthen the leg strength of the human body. It has wide application in rehabilitation, daily travel and other areas. Because of the large individual differences between people, the control parameters of the exoskeleton robot currently mostly need to be tuned to the motion characteristics of each wearer, which is time-consuming and labor-intensive and cannot track changes in the wearer's body.
Reinforcement learning can search for the optimal strategy through interaction with the environment and can learn autonomously. Applying reinforcement learning to the exoskeleton can therefore greatly improve the adaptability of the robot's parameters. Since the goal of reinforcement learning is to maximize the cumulative reward, the reward function plays a very important role. In supervised learning, the supervisory signal is provided by the training data; in reinforcement learning, the reward function acts as the supervisory signal, and the agent (Agent) optimizes its strategy according to the rewards.
The reward function is key to the learning efficiency of the agent. At present, reward functions mostly depend on design by human experts, and a poorly designed reward function makes it hard to solve some complex decision problems. Researchers have therefore proposed approaches such as Meta Learning and Imitation Learning, in which the agent learns to summarize a corresponding reward function from good strategies to guide the reinforcement learning process. However, imitation learning requires alternating iterations of Inverse Reinforcement Learning and Reinforcement Learning, which is a complicated process, and it depends on expert samples, making it unsuitable for settings where expert samples are lacking. Researchers have proposed remedies such as setting auxiliary tasks and introducing curiosity mechanisms, but these remain limited in generalization capability and require experts to provide the corresponding prior information for each specific task, so they cannot solve the sparse reward problem of reinforcement learning in a general sense.
How to design a reward function that promotes rapid convergence of the exoskeleton parameters, for the problem of flexible exoskeleton parameter adaptation, is a problem that urgently needs to be solved.
(III) Summary of the invention:
the invention aims to provide a method for establishing a reward function based on walking ratio trend change, which can overcome the defects of the prior art, can reflect the trend change of the walking ratio, calculates the step length and the gait cycle by utilizing the output data of an MEMS (Micro-Electro-Mechanical System) attitude sensor to obtain the walking ratio, and establishes the reward function based on the walking ratio trend change to promote the rapid convergence of flexible exoskeleton parameters and enhance the adaptivity of the parameters.
The technical scheme of the invention is as follows: a method for establishing a reward function based on walking ratio trend changes is characterized by comprising the following steps:
(1) collecting hip joint flexion angle parameter signals of the wearer of the flexible exoskeleton robot, and finding the maximum hip flexion angle θ_max and minimum flexion angle θ_min; given that the leg length of the wearer of the flexible exoskeleton robot is l, the step length D of the wearer can be obtained as:
D = l(θ_max - θ_min) (1)
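As a worked illustration, a minimal sketch of the step-length computation follows. The angle unit is not stated in the patent; the sketch assumes radians so that l·Δθ yields metres, and all numeric values are hypothetical.

```python
import math

def step_length(leg_length_m: float, theta_max: float, theta_min: float) -> float:
    """Step length per formula (1): D = l * (theta_max - theta_min).

    Angles are assumed to be in radians so the arc-length approximation
    l * delta_theta yields metres.
    """
    return leg_length_m * (theta_max - theta_min)

# Hypothetical values: leg length 0.9 m, hip flexion swinging from -10 to +25 degrees.
D = step_length(0.9, math.radians(25.0), math.radians(-10.0))
print(f"step length D = {D:.3f} m")  # ~0.550 m
```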
(2) placing a sensor at the middle of the rear of each of the left and right thighs of the wearer of the flexible exoskeleton robot, and collecting the hip joint flexion angle parameters of the wearer during normal walking in real time to obtain the flexion angle parameter curve of the wearer's hip joint; denoting the trough time as t_trough, the current gait cycle can then be calculated as:
T(k) = t_trough(k) - t_trough(k-1) (2)
i.e., the current gait cycle is calculated from two adjacent trough points;
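A minimal sketch of the gait-cycle computation of formula (2), assuming the trough timestamps have already been extracted from the hip flexion angle curve (trough detection itself is not shown):

```python
def gait_cycles(trough_times: list) -> list:
    """Gait cycle per formula (2): T(k) = t_trough(k) - t_trough(k-1)."""
    return [t1 - t0 for t0, t1 in zip(trough_times, trough_times[1:])]

# Hypothetical trough timestamps in seconds, read off the flexion angle curve.
print(gait_cycles([0.00, 1.12, 2.25, 3.35]))  # approximately [1.12, 1.13, 1.10]
```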
the method for acquiring the flexion angle parameter curve of the hip joint of the wearer in the step (2) is as follows:
(2-1) collecting the hip joint flexion angle parameter signal of the wearer of the flexible exoskeleton robot with an attitude sensor, converting it into a digital signal, and sending it to a single-chip microcomputer, which forwards it to a personal computer (PC); the single-chip microcomputer transmits the data to the PC over a wireless network via serial communication and a Bluetooth module.
(2-2) acquiring the hip joint flexion angle parameter signal through a serial port interface in MATLAB (matrix laboratory) installed on the PC, and drawing a real-time curve of the hip joint flexion angle parameter with the plot function;
the real-time curve of the hip joint flexion angle parameter can also be directly displayed by using third-party upper computer software, such as an anonymous upper computer.
The collection of the hip joint flexion angle parameter signals of the wearer of the flexible exoskeleton robot in steps (1) and (2) is realized by a MEMS attitude sensor equipped with an ADC (analog-to-digital converter) module.
(3) calculating the actual sampled walking ratio W at the end of one gait cycle from the step length D obtained in step (1) and the gait cycle T(k) obtained in step (2), as shown in formula (3):
W = D/N = D·T_step (3)
where W is the walking ratio, D is the step length in m, N is the step frequency in steps/min, and T_step is the gait cycle in min;
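A minimal sketch of the walking-ratio computation. The patent lists N in steps/min and T in min but leaves their relation implicit; the sketch assumes N = 1/T (one step per sampled cycle). If T were instead a two-step stride period, N would be 2/T. Numeric values are hypothetical.

```python
def walking_ratio(step_length_m: float, cycle_min: float) -> float:
    """Walking ratio per formula (3): W = D / N, assuming cadence N = 1 / T
    in steps/min when T is the gait cycle in minutes."""
    cadence = 1.0 / cycle_min  # steps/min
    return step_length_m / cadence

# Hypothetical: D = 0.55 m, gait cycle 0.56 s = 0.56/60 min -> N ~ 107 steps/min.
W = walking_ratio(0.55, 0.56 / 60.0)
print(f"W = {W:.4f} m/(steps/min)")  # ~0.0051, inside the 0.0044-0.0055 healthy range
```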
(4) establishing a walking ratio sampling sequence, taking one sampling point every other cycle, setting the scoring mechanism shown in formula (4) according to the convergence condition of the sequence, and scoring the sequence values in the analysis sequence;
|W_current - W_target| < |W_last - W_target| (4)
where W_current is the walking ratio at the current sampling point, W_target is the preset walking ratio of healthy elderly people, and W_last is the walking ratio at the previous sampling point;
In step (4), the sequence values in the analysis sequence are scored according to the scoring mechanism, specifically:
when W_current > W_target: if the walking ratio at the current sampling point is smaller than that at the previous sampling point, the sequence value for the current time is set to 1, otherwise to 0;
when W_current < W_target: if the walking ratio at the current sampling point is larger than that at the previous sampling point, the sequence value for the current time is set to 1, otherwise to 0;
selecting the m sequence values containing the current time point from the analysis sequence, recording the number of 1s in the analysis sequence as P and the number of 0s as Q, and calculating the reward value after the exoskeleton robot executes the previous action according to formula (5):
r = Maximum × (P - Q)/m (5)
where Maximum is an artificially set maximum reward value, P is the number of 1s in the analysis sequence, Q is the number of 0s in the analysis sequence, and m is the number of sampling points in the analysis sequence; the ratio (P - Q)/m represents the trend of the walking ratio over multiple cycles;
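A minimal sketch of the scoring mechanism (4) and reward (5), using the distance form of formula (4), which subsumes the two directional cases above; Maximum and the sample values are hypothetical.

```python
def score_sample(w_curr: float, w_prev: float, w_target: float) -> int:
    """Formula (4): score 1 if the current walking ratio is closer to the
    target than the previous sample was (convergent), else 0 (divergent)."""
    return 1 if abs(w_curr - w_target) < abs(w_prev - w_target) else 0

def reward(scores: list, maximum: float) -> float:
    """Formula (5): r = Maximum * (P - Q) / m, with P the count of 1s,
    Q the count of 0s, and m the analysis-window length."""
    m = len(scores)
    p = sum(scores)
    q = m - p
    return maximum * (p - q) / m

# Hypothetical walking-ratio samples drifting toward W_target = 0.005.
samples = [0.0072, 0.0068, 0.0063, 0.0065, 0.0058, 0.0054]
scores = [score_sample(curr, prev, 0.005)
          for prev, curr in zip(samples, samples[1:])]
print(scores)                        # [1, 1, 0, 1, 1] -> P = 4, Q = 1, m = 5
print(reward(scores, maximum=10.0))  # 10 * (4 - 1) / 5 = 6.0, a positive reward
```

Note that r spans [-Maximum, +Maximum]: an all-convergent window earns the full reward and an all-divergent window the full penalty, which is the reward-range constraint described in the working principle below.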
(5) scoring the sampling sequences in the walking ratio sampling sequence in turn to obtain P and Q, and obtaining the global walking-ratio-based reward value after the exoskeleton robot executes the previous action according to the global reward function, formula (5); when P is higher, i.e. P > Q, the walking ratio converges along the expected trend, i.e. toward the given walking ratio of healthy elderly people, and the exoskeleton robot receives a positive reward; when Q is higher, i.e. Q > P, the walking ratio diverges from the expected trend, and the exoskeleton robot receives a negative reward;
(6) applying the reward function model in a reinforcement learning algorithm for optimizing the exoskeleton parameters; when the value function shown in formula (6) is maximal, the obtained strategy is the optimal strategy, and the adjustment of the walking ratio can be realized, so that the exoskeleton robot assists elderly people in walking and plays a rehabilitation role.
v_π(s) = E_π(R_{t+1} + γR_{t+2} + γ²R_{t+3} + ... | S_t = s) (6)
where v_π(s) is the value function after taking actions under strategy π in state s; R is the reward function model described above, and R_{t+1} is the reward at time t+1; γ is a reward discount factor in [0,1]; S_t is the state of the environment at time t.
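The expectation in formula (6) is over trajectories; for a single sampled trajectory the bracketed sum is the discounted return, sketched below with hypothetical rewards:

```python
def discounted_return(rewards: list, gamma: float) -> float:
    """G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ...; v_pi(s) in
    formula (6) is the expectation of this quantity under strategy pi."""
    g = 0.0
    for r in reversed(rewards):  # Horner-style backward accumulation
        g = r + gamma * g
    return g

print(discounted_return([6.0, 2.0, -2.0, 10.0], gamma=0.9))  # ~13.47
```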
The working principle of the invention is as follows: the range of the reward function is an important parameter that relates to the effectiveness of reward shaping and has a strong influence on a simple reinforcement learning algorithm at runtime. Therefore, a maximum reward value is set and the reward is determined according to the ratio, which constrains the reward range. When setting the global reward function, the goal is to constrain the overall trend of the walking ratio over multiple cycles. The trend of the walking ratio is therefore scored: if the current trend converges toward the target walking ratio, the score is set to 1; if the current trend diverges, it is set to 0. Recording the number of 1s as P and the number of 0s as Q, the ratio (P - Q)/m represents the trend of the walking ratio over multiple cycles: P > Q indicates convergence and gives a positive reward, while Q > P indicates divergence and gives a negative reward. The magnitude of the ratio indicates the degree of divergence or convergence; better convergence yields a larger reward value, and overall divergence yields a smaller one. The reward function is thus determined as shown in formula (5).
The problem of making the walking ratio over multiple cycles converge according to the expected trend is solved through a globally constrained reward function. The purpose of this reward function is to solve the problem of exoskeleton parameter adaptation. Whether the walking ratio reaches the walking ratio of healthy elderly people is used as the criterion for judging the adaptation of the exoskeleton parameters. With this reward function, after the agent executes the previous action, the value of the current walking ratio is calculated and the reward is obtained from the reward function. The agent, the intelligent part of the robot, is placed in the environment to explore and learn. By accumulating the maximum reward, the agent adjusts its next action and outputs parameters better suited to elderly walking; the action with the maximum reward corresponds to the parameters that keep the walking ratio at the healthy elderly walking ratio, which benefits the adaptive optimization of the exoskeleton parameters.
The method is mainly used in a parameter optimization algorithm for the exoskeleton robot. The criterion for the reasonableness of the current exoskeleton parameters is whether the gait information matches that of healthy elderly people. To judge gait in real time, this project adopts the concept of the walk ratio to describe the motion state of the human body, defined as the ratio of step length (m) to step frequency (steps/min). Previous studies have shown that the walking ratio can be used to describe the gait pattern, and that for a particular subject it does not vary significantly with physical performance, walking stability, concentration, etc. The walking ratio shows no significant difference across healthy individuals, and the walking ratio of normal gait in elderly people aged 60 or older lies between 0.0044 and 0.0055.
The invention has the advantages that: the reward value is determined according to the variation trend of the walking ratio, and the walking ratio is globally constrained over multiple cycles, which solves the problem in reinforcement learning that the behaviors of the flexible exoskeleton robot need to be constrained and scored by a reward function; the reward mechanism is simple and easy to implement. While improving the learning efficiency of the agent, the sparse reward problem is avoided, i.e., there is no long period during learning in which the agent obtains no reward. Blind exploration by the algorithm is effectively avoided, the reinforcement learning efficiency of the flexible exoskeleton robot is improved, the robustness of the exoskeleton robot is enhanced, convergence of the exoskeleton parameters according to the expected trend is ensured, and the adaptability of the exoskeleton parameters is improved.
(IV) Description of the drawings:
fig. 1 is a schematic diagram illustrating an analysis sequence scoring principle of a walking trending reward mechanism in a method for establishing a reward function based on walking ratio trend changes according to the present invention.
Fig. 2 is a schematic view illustrating a gait cycle calculation principle in a method for establishing a reward function based on a walking ratio trend change according to the present invention.
Fig. 3 is a graph schematically showing the time-dependent change of the hip joint flexion angle in the method for establishing the reward function based on the walking ratio trend change according to the present invention.
(V) Specific embodiment:
example (b): a method for establishing a reward function based on walking ratio trend changes is characterized by comprising the following steps:
(1) collecting hip joint flexion angle parameter signals of the wearer of the flexible exoskeleton robot with the MEMS attitude sensor, and finding the maximum hip flexion angle θ_max and minimum flexion angle θ_min; given that the leg length of the wearer of the flexible exoskeleton robot is l, the step length D of the wearer can be obtained as:
D = l(θ_max - θ_min) (1)
(2) placing the MEMS attitude sensors at the middle of the rear of the left and right thighs of the wearer of the flexible exoskeleton robot, and collecting the hip joint flexion angle parameters of the wearer during normal walking in real time to obtain the flexion angle parameter curve of the wearer's hip joint; as shown in fig. 2, denoting the trough time as t_trough, the current gait cycle can then be calculated as:
T(k) = t_trough(k) - t_trough(k-1) (2)
i.e., the current gait cycle is calculated from two adjacent trough points;
the method for acquiring the flexion angle parameter curve of the hip joint of the wearer specifically comprises the following steps:
(2-1) converting the hip joint flexion angle parameter signal of the wearer of the flexible exoskeleton robot into a digital signal and sending it to a single-chip microcomputer, which transmits it to the PC over a wireless network via serial communication and a Bluetooth module;
(2-2) acquiring the hip joint flexion angle parameter signal through a serial port interface in MATLAB (matrix laboratory) installed on the PC, and drawing a real-time curve of the hip joint flexion angle parameter with the plot function; the curve can also be displayed directly with third-party host computer software, such as the anonymous host computer.
(3) calculating the actual sampled walking ratio W at the end of one gait cycle from the step length D obtained in step (1) and the gait cycle T(k) obtained in step (2), as shown in formula (3):
W = D/N = D·T_step (3)
where W is the walking ratio, D is the step length in m, N is the step frequency in steps/min, and T_step is the gait cycle in min;
(4) establishing a walking ratio sampling sequence, taking one sampling point every other cycle, setting the scoring mechanism shown in formula (4) according to the convergence condition of the sequence as shown in fig. 1, and scoring the sequence values in the analysis sequence;
|W_current - W_target| < |W_last - W_target| (4)
where W_current is the walking ratio at the current sampling point, W_target is the preset walking ratio of healthy elderly people, and W_last is the walking ratio at the previous sampling point;
when W_current > W_target: if the walking ratio at the current sampling point is smaller than that at the previous sampling point, the sequence value for the current time is set to 1, otherwise to 0;
when W_current < W_target: if the walking ratio at the current sampling point is larger than that at the previous sampling point, the sequence value for the current time is set to 1, otherwise to 0;
selecting the m sequence values containing the current time point from the analysis sequence, recording the number of 1s in the analysis sequence as P and the number of 0s as Q (the specific working mode is shown in fig. 1), and calculating the reward value after the exoskeleton robot executes the previous action according to formula (5):
r = Maximum × (P - Q)/m (5)
where Maximum is an artificially set maximum reward value, P is the number of 1s in the analysis sequence, Q is the number of 0s in the analysis sequence, and m is the number of sampling points in the analysis sequence; the ratio (P - Q)/m represents the trend of the walking ratio over multiple cycles;
(5) scoring the sampling sequences in the walking ratio sampling sequence in turn to obtain P and Q, and obtaining the global walking-ratio-based reward value after the exoskeleton robot executes the previous action according to the global reward function, formula (5); when P is higher, i.e. P > Q, the walking ratio converges along the expected trend, i.e. toward the given walking ratio of healthy elderly people, and the exoskeleton robot receives a positive reward; when Q is higher, i.e. Q > P, the walking ratio diverges from the expected trend, and the exoskeleton robot receives a negative reward;
(6) applying the reward function model in a reinforcement learning algorithm for optimizing the exoskeleton parameters; when the value function shown in formula (6) is maximal, the obtained strategy is the optimal strategy, and the adjustment of the walking ratio can be realized, so that the exoskeleton robot assists elderly people in walking and plays a rehabilitation role.
v_π(s) = E_π(R_{t+1} + γR_{t+2} + γ²R_{t+3} + ... | S_t = s) (6)
where v_π(s) is the value function after taking actions under strategy π in state s; R is the reward function model described above, and R_{t+1} is the reward at time t+1; γ is a reward discount factor in [0,1]; S_t is the state of the environment at time t.
The following examples are given for illustrative purposes. It should be understood that these examples are for illustration only and are not intended to limit the scope of the invention. After reading the detailed steps and associated content of the invention, a skilled person may make various modifications or applications of the invention, and such equivalents also fall within the scope of the claims appended to this application.
For example, in an algorithm that optimizes the exoskeleton's power-assist parameters, the actor selects a_t according to the action strategy under the given initial parameters and gives a_t to the flexible exoskeleton to execute.
Here a_t is the action selected by the Agent at time t; after the environment executes the action, the environment state transitions from s_t to s_{t+1}; s_t is the state the Agent receives from the flexible exoskeleton at time t; s_{t+1} is the next state, and r_t is the scalar reward received as feedback from the flexible exoskeleton;
r_t is the reward function in the present invention, expressed as
r_t = Maximum × (P - Q)/m
where Maximum is an artificially set maximum reward value, P is the number of 1s in the analysis sequence, Q is the number of 0s in the analysis sequence, and m is the number of sampling points in the analysis sequence; the ratio (P - Q)/m represents the trend of the walking ratio over multiple cycles;
the derivation of Q and P in the reward function is shown in the scoring mechanism of fig. 1, which establishes a sequence of step ratio samples, taking one sample at every other cycle:
when W_current > W_target: if the walking ratio at the current sampling point is smaller than that at the previous sampling point, the sequence value for the current time is set to 1, otherwise to 0;
when W_current < W_target: if the walking ratio at the current sampling point is larger than that at the previous sampling point, the sequence value for the current time is set to 1, otherwise to 0;
then the m sequence values containing the current time point are selected; the number of 1s in the analysis sequence is P, and the number of 0s is Q.
The walking ratio W is calculated by the formula:
W = D/N = D·T_step
where W is the walking ratio, D is the step length in m, N is the step frequency in steps/min, and T_step is the gait cycle in min;
In this formula, the step length D is calculated as D = l(θ_max - θ_min),
where θ_max is the maximum flexion angle of the hip joint, θ_min is the minimum flexion angle of the hip joint, and l is the leg length of the wearer of the flexible exoskeleton robot. The measurement of the maximum flexion angle θ_max and the minimum flexion angle θ_min of the hip joint is realized by the MEMS attitude sensor.
As shown in fig. 2, one gait cycle contains two troughs and one crest; denoting the trough time as t_trough, the gait cycle T_step is given by T(k) = t_trough(k) - t_trough(k-1).
The flexible exoskeleton executes a_t and returns the r_t and s_{t+1} obtained from this exploration.
The actor stores the state-transition tuple (s_t, a_t, r_t, s_{t+1}) in an experience pool; the state and action parameters at the current time, re-observed through a long short-term memory (LSTM) network, are then used as the data set for training the online network. At the same time, the (s_t, a_t, r_t, s_{t+1}) tuples obtained by exploration are fed into the reward function, the aim being to apply the reward constraint, provide reference data for the online strategy network and the online Q network, and promote rapid convergence of the flexible exoskeleton parameters.
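A minimal sketch of the experience pool described above; a generic replay buffer under common reinforcement-learning conventions, not the patent's exact implementation (the LSTM re-observation step is omitted):

```python
import random
from collections import deque

class ExperiencePool:
    """Stores (s_t, a_t, r_t, s_t+1) transitions and serves random batches
    as reference data for training the online strategy/Q networks."""

    def __init__(self, capacity: int = 10000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size: int):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

# Usage: after the exoskeleton executes a_t and the walking-ratio reward r_t
# is computed from formula (5), the transition is stored for later training.
pool = ExperiencePool()
pool.store(s=[0.0051], a=[0.3], r=6.0, s_next=[0.0049])
batch = pool.sample(32)
```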

Claims (7)

1. A method for establishing a reward function based on walking ratio trend changes is characterized by comprising the following steps:
(1) collecting hip joint flexion angle parameter signals of the wearer of the flexible exoskeleton robot, and finding the maximum hip flexion angle θ_max and minimum flexion angle θ_min; given that the leg length of the wearer of the flexible exoskeleton robot is l, the step length D of the wearer can be obtained as:
D = l(θ_max - θ_min) (1)
(2) placing a sensor at the middle of the rear of each of the left and right thighs of the wearer of the flexible exoskeleton robot, and collecting the hip joint flexion angle parameters of the wearer during normal walking in real time to obtain the flexion angle parameter curve of the wearer's hip joint; denoting the trough time as t_trough, the current gait cycle can then be calculated as:
T(k) = t_trough(k) - t_trough(k-1) (2)
i.e., the current gait cycle is calculated from two adjacent trough points;
(3) calculating the actual sampled walking ratio W at the end of one gait cycle from the step length D obtained in step (1) and the gait cycle T(k) obtained in step (2), as shown in formula (3):
W = D/N = D·T_step (3)
where W is the walking ratio, D is the step length in m, N is the step frequency in steps/min, and T_step is the gait cycle in min;
(4) establishing a walking ratio sampling sequence, taking one sampling point every other cycle, setting the scoring mechanism shown in formula (4) according to the convergence condition of the sequence, and scoring the sequence values in the analysis sequence;
|W_current - W_target| < |W_last - W_target| (4)
where W_current is the walking ratio at the current sampling point, W_target is the preset walking ratio of healthy elderly people, and W_last is the walking ratio at the previous sampling point;
(5) scoring the sampling sequences in the walking ratio sampling sequence in turn to obtain P and Q, and obtaining the global walking-ratio-based reward value after the exoskeleton robot executes the previous action according to the global reward function, formula (5); when P is higher, i.e. P > Q, the walking ratio converges along the expected trend, i.e. toward the given walking ratio of healthy elderly people, and the exoskeleton robot receives a positive reward; when Q is higher, i.e. Q > P, the walking ratio diverges from the expected trend, and the exoskeleton robot receives a negative reward;
(6) applying the reward function model in a reinforcement learning algorithm for optimizing the exoskeleton parameters; when the value function shown in formula (6) is maximal, the obtained strategy is the optimal strategy, i.e., the adjustment of the walking ratio can be realized, so that the exoskeleton robot assists elderly people in walking and plays a rehabilitation role;
v_π(s) = E_π(R_{t+1} + γR_{t+2} + γ²R_{t+3} + ... | S_t = s) (6)
where v_π(s) is the value function after taking actions under strategy π in state s; R is the reward function model described above, and R_{t+1} is the reward at time t+1; γ is a reward discount factor in [0,1]; S_t is the state of the environment at time t.
2. The method for establishing a reward function based on the trend change of the walking ratio as claimed in claim 1, wherein the curve of the flexion angle parameter of the hip joint of the wearer in the step (2) is obtained by:
(2-1) collecting the hip joint flexion angle parameter signal of the wearer of the flexible exoskeleton robot with an attitude sensor, converting it into a digital signal, sending it to a single-chip microcomputer, and then sending it to a PC (personal computer);
(2-2) acquiring the hip joint flexion angle parameter signal through a serial port interface in MATLAB installed on the PC, and drawing a real-time curve of the hip joint flexion angle parameter with the plot function.
3. The method according to claim 2, wherein in step (2-1) the data transmission between the single-chip microcomputer and the PC is performed by the single-chip microcomputer transmitting the data to the PC over a wireless network via serial communication and a Bluetooth module.
4. The method as claimed in claim 2, wherein the real-time curve of the hip joint flexion angle parameter in step (2-2) can also be displayed directly using third-party host computer software.
5. The method as claimed in claim 4, wherein the third-party host computer software is the anonymous host computer.
6. The method for establishing a reward function based on walking ratio trend changes as claimed in claim 1, wherein the collection of the hip joint flexion angle parameter signals of the wearer of the flexible exoskeleton robot in steps (1) and (2) is implemented by a MEMS attitude sensor equipped with an ADC conversion module.
7. The method for establishing a reward function based on walking ratio trend changes according to claim 1, wherein the scoring of the sequence values in the analysis sequence according to the scoring mechanism in step (4) specifically comprises:
when W_current > W_target: if the walking ratio at the current sampling point is smaller than that at the previous sampling point, the sequence value for the current time is set to 1, otherwise to 0;
when W_current < W_target: if the walking ratio at the current sampling point is larger than that at the previous sampling point, the sequence value for the current time is set to 1, otherwise to 0;
selecting the m sequence values containing the current time point from the analysis sequence, recording the number of 1s in the analysis sequence as P and the number of 0s as Q, and calculating the reward value after the exoskeleton robot executes the previous action according to formula (5):
r = Maximum × (P - Q)/m (5)
where Maximum is an artificially set maximum reward value, P is the number of 1s in the analysis sequence, Q is the number of 0s in the analysis sequence, and m is the number of sampling points in the analysis sequence; the ratio (P - Q)/m represents the trend of the walking ratio over multiple cycles.
CN202011387443.2A 2020-12-01 2020-12-01 Reward function establishing method based on walking ratio trend change Active CN112472530B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011387443.2A CN112472530B (en) 2020-12-01 2020-12-01 Reward function establishing method based on walking ratio trend change

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011387443.2A CN112472530B (en) 2020-12-01 2020-12-01 Reward function establishing method based on walking ratio trend change

Publications (2)

Publication Number Publication Date
CN112472530A true CN112472530A (en) 2021-03-12
CN112472530B CN112472530B (en) 2023-02-03

Family

ID=74938781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011387443.2A Active CN112472530B (en) 2020-12-01 2020-12-01 Reward function establishing method based on walking ratio trend change

Country Status (1)

Country Link
CN (1) CN112472530B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007063633A1 (en) * 2005-11-30 2007-06-07 Japan Science And Technology Agency Phase reaction curve learning method and device, periodic motion control method and device, and walking control device
CN109199783A (en) * 2017-07-04 2019-01-15 中国科学院沈阳自动化研究所 A kind of control method controlling rehabilitation of anklebone equipment rigidity using sEMG
US20190083002A1 (en) * 2017-09-20 2019-03-21 Samsung Electronics Co., Ltd. Method and apparatus for updating personalized gait policy
US20200323726A1 (en) * 2018-02-08 2020-10-15 Parker-Hannifin Corporation Advanced gait control system and methods enabling continued walking motion of a powered exoskeleton device
CN108785997A (en) * 2018-05-30 2018-11-13 燕山大学 A kind of lower limb rehabilitation robot Shared control method based on change admittance
US20200272905A1 (en) * 2019-02-26 2020-08-27 GE Precision Healthcare LLC Artificial neural network compression via iterative hybrid reinforcement learning approach
CN110764416A (en) * 2019-11-11 2020-02-07 河海大学 Humanoid robot gait optimization control method based on deep Q network
CN110812131A (en) * 2019-11-28 2020-02-21 深圳市迈步机器人科技有限公司 Gait control method and control system of exoskeleton robot and exoskeleton robot
CN111604890A (en) * 2019-12-30 2020-09-01 合肥工业大学 Motion control method suitable for exoskeleton robot
CN111515938A (en) * 2020-05-28 2020-08-11 河北工业大学 Lower limb exoskeleton walking trajectory tracking method based on inheritance type iterative learning control
CN111546349A (en) * 2020-06-28 2020-08-18 常州工学院 New deep reinforcement learning method for humanoid robot gait planning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孟凡成: "Research on reinforcement learning control methods for upper-limb rehabilitation robots" (上肢康复机器人的增强学习控制方法研究), China Doctoral Dissertations Full-text Database, Information Science and Technology Series *

Also Published As

Publication number Publication date
CN112472530B (en) 2023-02-03

Similar Documents

Publication Publication Date Title
Bao et al. A CNN-LSTM hybrid model for wrist kinematics estimation using surface electromyography
CN107378944B (en) Multidimensional surface electromyographic signal artificial hand control method based on principal component analysis method
CN109262618B (en) Muscle cooperation-based upper limb multi-joint synchronous proportional myoelectric control method and system
EP3743901A1 (en) Real-time processing of handstate representation model estimates
CN110675933B (en) Finger mirror image rehabilitation training system
CN110232412B (en) Human gait prediction method based on multi-mode deep learning
CN109106351B (en) Human body fatigue detection and slow release system and method based on Internet of things perception
CN109223453B (en) Power-assisted exoskeleton device based on regular walking gait learning
Xu et al. A prosthetic arm based on EMG pattern recognition
CN112494282A (en) Exoskeleton main power parameter optimization method based on deep reinforcement learning
Rai et al. Mode-free control of prosthetic lower limbs
CN101695444A (en) Foot acceleration information acquisition system and acceleration information acquisition method thereof
CN112472530B (en) Reward function establishing method based on walking ratio trend change
CN113262088B (en) Multi-degree-of-freedom hybrid control artificial hand with force feedback and control method
CN109765906A (en) A kind of intelligent ship tracking method based on Compound Orthogonal Neural Network PREDICTIVE CONTROL
CN117472183A (en) Personalized dynamic rehabilitation man-machine interaction method and related equipment
CN110109904B (en) Environment-friendly big data oriented water quality soft measurement method
CN111062247A (en) Human body movement intention prediction method oriented to exoskeleton control
Mishra et al. Error minimization and energy conservation by predicting data in wireless body sensor networks using artificial neural network and analysis of error
Shi et al. Wearable device monitoring exercise energy consumption based on Internet of things
CN109431510A (en) A kind of flexible gait monitoring device calculated based on artificial intelligence
CN111403019B (en) Method for establishing ankle joint artificial limb model, model-free control method and verification method
CN114118371A (en) Intelligent agent deep reinforcement learning method and computer readable medium
Zhang et al. The prediction of heart rate during running using Bayesian combined predictor
CN117621051A (en) Active human-computer cooperation method based on human body multi-mode information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant