CN112472530B - Reward function establishing method based on walking ratio trend change


Info

Publication number
CN112472530B
CN112472530B (application CN202011387443.2A)
Authority
CN
China
Prior art keywords
walking ratio
sequence
walking
reward
flexion angle
Prior art date
Legal status
Active
Application number
CN202011387443.2A
Other languages
Chinese (zh)
Other versions
CN112472530A (en)
Inventor
孙磊
李云飞
董恩增
佟吉刚
陈鑫
曾德添
龚欣翔
李成辉
Current Assignee
Tianjin University of Technology
Original Assignee
Tianjin University of Technology
Priority date
Filing date
Publication date
Application filed by Tianjin University of Technology filed Critical Tianjin University of Technology
Priority to CN202011387443.2A priority Critical patent/CN112472530B/en
Publication of CN112472530A publication Critical patent/CN112472530A/en
Application granted granted Critical
Publication of CN112472530B publication Critical patent/CN112472530B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61H PHYSICAL THERAPY APPARATUS, e.g. DEVICES FOR LOCATING OR STIMULATING REFLEX POINTS IN THE BODY; ARTIFICIAL RESPIRATION; MASSAGE; BATHING DEVICES FOR SPECIAL THERAPEUTIC OR HYGIENIC PURPOSES OR SPECIFIC PARTS OF THE BODY
    • A61H3/00 Appliances for aiding patients or disabled persons to walk about
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • A61H2201/00 Characteristics of apparatus not provided for in the preceding codes
    • A61H2201/16 Physical interface with patient
    • A61H2201/1602 Physical interface with patient kind of interface, e.g. head rest, knee support or lumbar support
    • A61H2201/165 Wearable interfaces
    • A61H2201/1657 Movement of interface, i.e. force application means
    • A61H2201/1659 Free spatial automatic movement of interface within a working area, e.g. Robot
    • A61H2201/50 Control means thereof
    • A61H2201/5058 Sensors or detectors
    • A61H2201/5097 Control means thereof wireless

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Health & Medical Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Physical Education & Sports Medicine (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Operations Research (AREA)
  • Epidemiology (AREA)
  • Pain & Pain Management (AREA)
  • Databases & Information Systems (AREA)
  • Rehabilitation Therapy (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Rehabilitation Tools (AREA)

Abstract

The invention discloses a method for establishing a reward function based on walking ratio trend change, comprising the following steps: calculating the step length D of the wearer of an exoskeleton robot; calculating the gait cycle T(k); calculating the walking ratio W from the step length D and the gait cycle T(k); establishing a walking ratio sampling sequence and scoring the sampling sequences within it; and establishing the reward function model. The reward function model based on walking ratio trend change can be applied to an algorithm for optimizing exoskeleton parameters, enhancing the efficiency of reinforcement learning and promoting rapid convergence of the exoskeleton parameters.

Description

Reward function establishing method based on walking ratio trend change
(I) Technical field:
The invention belongs to the technical field of robots and relates to a method for establishing a walking ratio reward function for a gait rehabilitation flexible exoskeleton robot, applicable to the reinforcement-learning-based adaptive control of the control parameters of a flexible exoskeleton.
(II) Background art:
the flexible exoskeleton robot can assist the old people with inconvenience in walking and strengthening the leg strength of the human body. Has wide application in rehabilitation, daily trip and other aspects. Due to the fact that large individual difference exists between people, at present, control parameters of the exoskeleton robot mostly need to be adjusted according to the self motion characteristics of a wearer, time and labor are consumed, and the body change of the wearer cannot be tracked.
Reinforcement learning can search for the optimal strategy while interacting with the environment and can learn autonomously, so applying it to the exoskeleton can greatly improve the adaptability of the robot's parameters. Since the goal of reinforcement learning is to maximize the cumulative reward, the reward function plays a very important role. In supervised learning, the supervisory signal is provided by the training data; in reinforcement learning, the reward function plays the role of the supervisory signal, and the Agent optimizes its strategy according to the rewards.
The reward function is the key to the Agent's learning efficiency. At present it mostly depends on design by human experts, and a hard-to-design reward function makes some complex decision problems difficult to solve. Researchers have therefore proposed approaches such as Meta Learning and Imitation Learning, in which the Agent learns to summarize a corresponding reward function from good strategies to guide the reinforcement learning process. However, Imitation Learning requires alternating iterations of Inverse Reinforcement Learning and Reinforcement Learning, the process is complicated, and it depends on expert samples, making it unsuitable for settings that lack them. Proposed remedies, including setting auxiliary tasks and introducing curiosity mechanisms, are still limited in generalization capability and require experts to provide prior information for each specific task, so they cannot solve the sparse reward problem of reinforcement learning in a general sense.
How to design a reward function that promotes rapid convergence of the exoskeleton parameters, for the problem of flexible exoskeleton parameter self-adaptation, is a problem that urgently needs to be solved.
(III) Content of the invention:
the invention aims to provide a method for establishing a reward function based on walking ratio trend change, which can overcome the defects of the prior art, can reflect the trend change of the walking ratio, calculates the step length and the gait cycle by utilizing the output data of an MEMS (Micro-Electro-Mechanical System) attitude sensor to obtain the walking ratio, and establishes the reward function based on the walking ratio trend change to promote the rapid convergence of flexible exoskeleton parameters and enhance the adaptability of the parameters.
The technical scheme of the invention is as follows: a method for establishing a reward function based on walking ratio trend change, characterized by comprising the following steps:
(1) Collect the hip joint flexion angle signal of the wearer of the flexible exoskeleton robot and find the maximum flexion angle θ_max and the minimum flexion angle θ_min of the hip joint. Given the wearer's leg length l, the wearer's step length D can be obtained as:
D = l(θ_max − θ_min)   (1)
(2) Place a sensor at the middle of the back of each of the wearer's left and right thighs, collect the hip joint flexion angle during normal walking in real time to obtain the flexion angle curve of the wearer's hip joint, and record each trough moment as t_trough. The current gait cycle is then:
T(k) = t_trough(k) − t_trough(k−1)   (2)
that is, the current gait cycle is calculated from two adjacent trough moments;
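For illustration only (this sketch is not part of the patent), formulas (1) and (2) might be computed from a logged hip flexion angle sequence as follows; the trough detection via scipy.signal.find_peaks and all function names are assumptions of this sketch:

```python
import numpy as np
from scipy.signal import find_peaks

def step_length(theta, leg_length):
    # Formula (1): D = l * (theta_max - theta_min), angles in radians
    return leg_length * (np.max(theta) - np.min(theta))

def gait_cycles(theta, t):
    # Formula (2): T(k) = t_trough(k) - t_trough(k-1); troughs of the
    # flexion angle are detected as peaks of the negated signal
    troughs, _ = find_peaks(-theta)
    return np.diff(t[troughs])

# toy usage on a synthetic ~1.1 s gait signal sampled at 100 Hz
t = np.arange(0.0, 10.0, 0.01)
theta = 0.35 * np.sin(2 * np.pi * t / 1.1) + 0.05
print(step_length(theta, leg_length=0.9))  # step length D in metres
print(gait_cycles(theta, t))               # gait cycles T(k) in seconds
```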
The method for acquiring the flexion angle curve of the wearer's hip joint in step (2) is as follows:
(2-1) Collect the hip joint flexion angle signal of the wearer with an attitude sensor, convert it into a digital signal, and send it to a single chip microcomputer, which transmits the data wirelessly to a personal computer (PC) through a Bluetooth module using serial port communication.
(2-2) Acquire the hip joint flexion angle signal through a serial port interface in MATLAB installed on the PC, and draw the real-time hip joint flexion angle curve with the plot function.
The real-time curve can also be displayed directly with third-party host computer software, such as the Anonymous host computer.
The hip joint flexion angle signals of steps (1) and (2) are collected by a MEMS attitude sensor equipped with an ADC (Analog-to-Digital Converter) module.
(3) At the end of each gait cycle, calculate the actual sampled walking ratio W from the step length D of step (1) and the gait cycle T(k) of step (2), as in formula (3):
W = D / N   (3)
where W is the walking ratio, D is the step length in m, N is the step frequency in steps/min (one gait cycle containing two steps, N = 2/T_step), and T_step is the gait cycle in min;
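A minimal sketch of formula (3), under this section's assumption that one gait cycle contains two steps (so N = 2/T_step); units follow the text above:

```python
def walking_ratio(step_length_m, gait_cycle_s):
    # Formula (3): W = D / N, with cadence N in steps/min derived from
    # the gait cycle assuming two steps per cycle
    cadence = 2.0 / (gait_cycle_s / 60.0)  # steps per minute
    return step_length_m / cadence

# e.g. D = 0.63 m, T = 1.1 s -> W ~ 0.0058 m/(steps/min),
# near the 0.0044-0.0055 healthy-elderly range cited in this patent
print(walking_ratio(0.63, 1.1))
```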
(4) Establish a walking ratio sampling sequence, taking one sampling point every other cycle; set the scoring mechanism shown in formula (4) according to the convergence condition of the sequence, and score the sequence values in the analysis sequence:
|W_current − W_target| < |W_last − W_target|   (4)
where W_current is the walking ratio at the current sampling point, W_target is the preset walking ratio of healthy elderly people, and W_last is the walking ratio at the previous sampling point;
In step (4), the sequence values in the analysis sequence are scored according to the scoring mechanism, specifically:
(1) When W_current > W_target: if the walking ratio of the current sampling point is smaller than that of the previous sampling point, set the sequence value for the current time to 1; otherwise set it to 0;
(2) When W_current < W_target: if the walking ratio of the current sampling point is larger than that of the previous sampling point, set the sequence value for the current time to 1; otherwise set it to 0;
(3) Select m sequence values containing the current time point from the analysis sequence, record the number of 1s in the analysis sequence as P and the number of 0s as Q, and calculate the reward value after the exoskeleton robot executes the previous action according to formula (5):
r = Maximum × (P − Q) / m   (5)
where Maximum is a manually set maximum reward value, P is the number of 1s in the analysis sequence, Q is the number of 0s, and m is the number of sampling points in the analysis sequence; the ratio (P − Q)/m represents the trend of the walking ratio over a plurality of cycles;
(5) Score the sampling sequences in the walking ratio sampling sequence in turn to obtain P and Q, and obtain the global walking-ratio-based reward value after the exoskeleton robot executes the previous action according to the global reward function, formula (5). When P is higher, i.e., P > Q, the walking ratio is converging along the expected trend, i.e., toward the walking ratio of healthy elderly people, and the exoskeleton robot receives a positive reward; when Q is higher, i.e., Q > P, the walking ratio is diverging from the expected trend, and the exoskeleton robot receives a negative reward;
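A compact sketch of the scoring mechanism of formula (4) and the reward of formula (5) as reconstructed above (the window length m, Maximum, and sample values are illustrative assumptions):

```python
def score_sequence(W_samples, W_target):
    # Formula (4): score 1 if the current sample is closer to W_target
    # than the previous sample (converging), else 0 (diverging)
    return [1 if abs(cur - W_target) < abs(prev - W_target) else 0
            for prev, cur in zip(W_samples, W_samples[1:])]

def reward(scores, m, maximum=1.0):
    # Formula (5) as reconstructed: r = Maximum * (P - Q) / m over the
    # last m sequence values containing the current time point
    window = scores[-m:]
    P = sum(window)        # number of 1s
    Q = len(window) - P    # number of 0s
    return maximum * (P - Q) / m

samples = [0.0070, 0.0066, 0.0061, 0.0063, 0.0058]  # sampled walking ratios
print(reward(score_sequence(samples, W_target=0.0050), m=4))  # 0.5, converging
```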
(6) Apply the reward function model to a reinforcement learning algorithm for optimizing exoskeleton parameters. When the value function shown in formula (6) is maximal, the obtained strategy is the optimal strategy and the adjustment of the walking ratio is realized, so that the exoskeleton robot assists elderly people in walking and plays a rehabilitation role.
v_π(s) = E_π(R_{t+1} + γR_{t+2} + γ²R_{t+3} + … | S_t = s)   (6)
where v_π(s) is the value function after taking an action under strategy π in state s; R is the reward function model described above, R_{t+1} being the reward at time t+1; γ is the reward discount factor, taking values in [0, 1]; S_t is the state of the environment at time t.
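To make formula (6) concrete, here is a small sketch of the discounted return inside the expectation; v_π(s) would be its average over trajectories started in state s (γ = 0.95 is an arbitrary choice for this sketch):

```python
def discounted_return(rewards, gamma=0.95):
    # R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ... from formula (6)
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# rewards produced by formula (5) over successive actions (illustrative)
print(discounted_return([0.5, 0.25, -0.5, 1.0]))
```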
The working principle of the invention is as follows: the range of the reward function is an important parameter related to the effectiveness of reward shaping and has a strong influence on the runtime behavior of a simple reinforcement learning algorithm. Therefore, a maximum reward value is set, and the reward range is constrained by determining the reward value according to the ratio (P − Q)/m. When setting the global reward function, the goal is to constrain the overall trend of the walking ratio over multiple cycles. The trend of the walking ratio is therefore scored: if the current trend converges toward the target walking ratio, the sequence value is set to 1; if it diverges, it is set to 0. The number of 1s is P and the number of 0s is Q, and (P − Q)/m represents the trend of the walking ratio over a plurality of cycles: P > Q indicates convergence and gives a positive reward; Q > P indicates divergence and gives a negative reward. The magnitude of the ratio indicates the degree of convergence or divergence, with stronger convergence yielding a larger reward value and stronger divergence a smaller one. The reward function is thus determined as shown in formula (5).
The problem of making the walking ratio converge along the expected trend over a plurality of cycles is solved by the globally constrained reward function. The purpose of this reward function is to solve the problem of exoskeleton parameter self-adaptation, judged by whether the walking ratio matches that of healthy elderly people. Through this reward function, after the Agent (the learning component of the robot, placed in the environment to explore and learn) executes the previous action, the current walking ratio is calculated and the reward is obtained. By accumulating the maximum reward, the Agent adjusts its next action and outputs parameters better suited to elderly walking; the action with the maximum reward is the one whose parameters keep the walking ratio at the healthy elderly walking ratio, which benefits the self-adaptive optimization of the exoskeleton parameters.
The method is mainly used in the parameter optimization algorithm of the exoskeleton robot. The criterion for the reasonableness of the current exoskeleton parameters is whether the gait information matches that of healthy elderly people. To judge gait in real time, the walk ratio is adopted to describe the motion state of the human body, defined as the ratio of step length (m) to step frequency (steps/min). Previous studies have shown that the walking ratio can describe the gait pattern and, for a particular subject, does not vary significantly with physical performance, walking stability, concentration, etc. The walking ratio shows no significant difference across healthy individuals, and the walking ratio of the normal gait of elderly people aged 60 or older is between 0.0044 and 0.0055.
The invention has the following advantages: the reward value is determined according to the variation trend of the walking ratio, globally constraining the walking ratio over a plurality of cycles; this solves the problem that, in reinforcement learning, the behaviors of the flexible exoskeleton robot need constraint scoring by a reward function, and the reward mechanism is simple and easy to implement. Besides improving the Agent's learning efficiency, it avoids the sparse reward problem, i.e., there is no long period during learning in which the Agent receives no reward. Blind exploration by the algorithm is effectively avoided, the reinforcement learning efficiency of the flexible exoskeleton robot is improved, the robustness of the exoskeleton robot is enhanced, the exoskeleton parameters are guaranteed to converge along the expected trend, and the adaptability of the exoskeleton parameters is improved.
(IV) Description of the drawings:
fig. 1 is a schematic diagram illustrating an analysis sequence scoring principle of a walking trending reward mechanism in a method for establishing a reward function based on walking ratio trend changes according to the present invention.
Fig. 2 is a schematic view illustrating a gait cycle calculation principle in a method for establishing a reward function based on a walking ratio trend change according to the present invention.
Fig. 3 is a schematic diagram of obtaining a graph of the change of the hip joint flexion angle with time in the method for establishing the reward function based on the walking ratio trend change according to the present invention.
(V) Specific embodiments:
example (b): a method for establishing a reward function based on walking ratio trend changes is characterized by comprising the following steps:
(1) Collect the hip joint flexion angle signal of the wearer of the flexible exoskeleton robot with the MEMS attitude sensor, and find the maximum flexion angle θ_max and the minimum flexion angle θ_min of the hip joint. Given the wearer's leg length l, the wearer's step length D can be obtained as:
D = l(θ_max − θ_min)   (1)
(2) Place the MEMS attitude sensors at the middle of the back of the wearer's left and right thighs, and collect the hip joint flexion angle during normal walking in real time to obtain the flexion angle curve of the wearer's hip joint. As shown in fig. 2, record each trough moment as t_trough; the current gait cycle is then:
T(k) = t_trough(k) − t_trough(k−1)   (2)
that is, the current gait cycle is calculated from two adjacent trough moments;
The method for acquiring the flexion angle curve of the wearer's hip joint is specifically as follows:
(2-1) Convert the wearer's hip joint flexion angle signal into a digital signal and send it to a single chip microcomputer, which transmits the data wirelessly to the PC through a Bluetooth module using serial port communication;
(2-2) Acquire the hip joint flexion angle signal through a serial port interface in MATLAB installed on the PC, and draw the real-time hip joint flexion angle curve with the plot function; the curve can also be displayed directly with third-party host computer software, such as the Anonymous host computer.
(3) At the end of each gait cycle, calculate the actual sampled walking ratio W from the step length D of step (1) and the gait cycle T(k) of step (2), as in formula (3):
W = D / N   (3)
where W is the walking ratio, D is the step length in m, N is the step frequency in steps/min (N = 2/T_step), and T_step is the gait cycle in min;
(4) Establish a walking ratio sampling sequence, taking one sampling point every other cycle; as shown in fig. 1, set the scoring mechanism shown in formula (4) according to the convergence condition of the sequence, and score the sequence values in the analysis sequence:
|W_current − W_target| < |W_last − W_target|   (4)
where W_current is the walking ratio at the current sampling point, W_target is the preset walking ratio of healthy elderly people, and W_last is the walking ratio at the previous sampling point;
(1) When W_current > W_target: if the walking ratio of the current sampling point is smaller than that of the previous sampling point, set the sequence value for the current time to 1; otherwise set it to 0;
(2) When W_current < W_target: if the walking ratio of the current sampling point is larger than that of the previous sampling point, set the sequence value for the current time to 1; otherwise set it to 0;
(3) Select m sequence values containing the current time point from the analysis sequence, record the number of 1s in the analysis sequence as P and the number of 0s as Q, and, working as shown in fig. 1, calculate the reward value after the exoskeleton robot executes the previous action according to formula (5):
r = Maximum × (P − Q) / m   (5)
where Maximum is a manually set maximum reward value, P is the number of 1s in the analysis sequence, Q is the number of 0s, and m is the number of sampling points in the analysis sequence; the ratio (P − Q)/m represents the trend of the walking ratio over a plurality of cycles;
(5) Score the sampling sequences in the walking ratio sampling sequence in turn to obtain P and Q, and obtain the global walking-ratio-based reward value after the exoskeleton robot executes the previous action according to the global reward function, formula (5). When P is higher, i.e., P > Q, the walking ratio is converging along the expected trend, i.e., toward the walking ratio of healthy elderly people, and the exoskeleton robot receives a positive reward; when Q is higher, i.e., Q > P, the walking ratio is diverging from the expected trend, and the exoskeleton robot receives a negative reward;
(6) Apply the reward function model to a reinforcement learning algorithm for optimizing exoskeleton parameters. When the value function shown in formula (6) is maximal, the obtained strategy is the optimal strategy and the adjustment of the walking ratio is realized, so that the exoskeleton robot assists elderly people in walking and plays a rehabilitation role.
v_π(s) = E_π(R_{t+1} + γR_{t+2} + γ²R_{t+3} + … | S_t = s)   (6)
where v_π(s) is the value function after taking an action under strategy π in state s; R is the reward function model described above, R_{t+1} being the reward at time t+1; γ is the reward discount factor, taking values in [0, 1]; S_t is the state of the environment at time t.
The following example is given for illustrative purposes. It should be understood that it is for illustration only and is not intended to limit the scope of the present invention. After reading the description, persons skilled in the art may make various modifications or applications of the invention, and such equivalents also fall within the scope defined by the appended claims of the present application.
For example, in an algorithm that optimizes the exoskeleton's power-assist parameters, the actuator selects an action a_t according to the action strategy under the given initial parameters and gives it to the flexible exoskeleton to execute.
Here a_t denotes the behavior selected by the Agent at time t; after the environment executes it, the environment state transitions from s_t to s_{t+1}. s_t is the state the Agent receives from the flexible exoskeleton at time t; after execution, the Agent receives a scalar reward r_t fed back by the flexible exoskeleton together with the next state s_{t+1};
r_t is the reward function of the present invention, expressed as
r_t = Maximum × (P − Q) / m
where Maximum is a manually set maximum reward value, P is the number of 1s in the analysis sequence, Q is the number of 0s, and m is the number of sampling points in the analysis sequence; (P − Q)/m represents the trend of the walking ratio over a plurality of cycles.
the derivation of Q and P in the reward function is shown in the scoring mechanism of fig. 1, which establishes a sequence of step ratio samples, taking one sample at every other cycle:
(1) when W is At present >W Target If the walking ratio of the current sampling point is smaller than that of the last sampling point, setting the sequence value corresponding to the current time to be 1, otherwise, setting the sequence value to be 0;
(2) when W At present <W Target If the walking ratio of the current sampling point is larger than that of the last sampling point, setting the sequence value corresponding to the current time to be 1, otherwise, setting the sequence value to be 0;
then m sequence values containing the current time point are selected, the number of 1 in the analytic sequence is P, and the number of 0 in the analytic sequence is Q.
The walking ratio W is calculated as:
W = D / N   (3)
where W is the walking ratio, D is the step length in m, N is the step frequency in steps/min (N = 2/T_step), and T_step is the gait cycle in min;
In this formula, the step length D is calculated as D = l(θ_max − θ_min), where θ_max is the maximum flexion angle of the hip joint, θ_min is the minimum flexion angle of the hip joint, and l is the leg length of the wearer of the flexible exoskeleton robot. The measurement of θ_max and θ_min is realized by the MEMS attitude sensor.
As shown in fig. 2, there are two troughs and one peak in a gait cycle; recording the trough moments as t_trough, the gait cycle T_step is T(k) = t_trough(k) − t_trough(k−1).
The flexible exoskeleton executes a_t and returns the r_t and s_{t+1} obtained from this exploration.
The actuator stores the transition (s_t, a_t, r_t, s_{t+1}) in an experience pool; after re-observation through a long short-term memory network, the re-observed state and action parameters of the current time are obtained and used as a data set for training the online network. At the same time, the explored (s_t, a_t, r_t, s_{t+1}) is put through the reward function, the aim being to apply the reward constraint, provide reference data for the online strategy network and online Q network, and promote rapid convergence of the flexible exoskeleton parameters.
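The interaction loop described in this embodiment might be sketched as follows; the env, actor, and critic interfaces, batch size, and buffer size are placeholders assumed for illustration, not the patent's implementation:

```python
import random
from collections import deque

replay = deque(maxlen=100_000)  # the "experience pool"

def interaction_step(env, actor, critic, state, m=10, maximum=1.0):
    # select a_t from the action strategy and let the exoskeleton execute it
    action = actor.select(state)
    next_state, scores = env.execute(action)  # scores: 0/1 analysis sequence
    # formula (5) over the last m sequence values: P - Q == 2*P - m
    r = maximum * (2 * sum(scores[-m:]) - m) / m
    replay.append((state, action, r, next_state))  # (s_t, a_t, r_t, s_{t+1})
    if len(replay) >= 64:
        batch = random.sample(list(replay), 64)  # train the online networks
        critic.update(batch)
        actor.update(batch)
    return next_state
```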

Claims (6)

1. A method for establishing a reward function based on walking ratio trend changes is characterized by comprising the following steps:
(1) Collect the hip joint flexion angle signal of the wearer of the flexible exoskeleton robot and find the maximum flexion angle θ_max and the minimum flexion angle θ_min of the hip joint. Given the wearer's leg length l, the wearer's step length D can be obtained as:
D = l(θ_max − θ_min)   (1)
(2) Place a sensor at the middle of the back of each of the wearer's left and right thighs, collect the hip joint flexion angle during normal walking in real time to obtain the flexion angle curve of the wearer's hip joint, and record each trough moment as t_trough. The current gait cycle is then:
T(k) = t_trough(k) − t_trough(k−1)   (2)
that is, the current gait cycle is calculated from two adjacent trough moments;
(3) At the end of each gait cycle, calculate the actual sampled walking ratio W from the step length D of step (1) and the gait cycle T(k) of step (2), as in formula (3):
W = D / N   (3)
where W is the walking ratio, D is the step length in m, N is the step frequency in steps/min (N = 2/T_step), and T_step is the gait cycle in min;
(4) Establish a walking ratio sampling sequence, taking one sampling point every other cycle; set the scoring mechanism shown in formula (4) according to the convergence condition of the sequence, and score the sequence values in the analysis sequence:
|W_current − W_target| < |W_last − W_target|   (4)
where W_current is the walking ratio at the current sampling point, W_target is the preset walking ratio of healthy elderly people, and W_last is the walking ratio at the previous sampling point;
the sequence values in the analysis sequence are scored according to the scoring mechanism, specifically:
(1) When W_current > W_target: if the walking ratio of the current sampling point is smaller than that of the previous sampling point, set the sequence value for the current time to 1; otherwise set it to 0;
(2) When W_current < W_target: if the walking ratio of the current sampling point is larger than that of the previous sampling point, set the sequence value for the current time to 1; otherwise set it to 0;
(3) Select m sequence values containing the current time point from the analysis sequence, record the number of 1s in the analysis sequence as P and the number of 0s as Q, and calculate the reward value after the exoskeleton robot executes the previous action according to formula (5):
r = Maximum × (P − Q) / m   (5)
where Maximum is a manually set maximum reward value, P is the number of 1s in the analysis sequence, Q is the number of 0s, and m is the number of sampling points in the analysis sequence; the ratio (P − Q)/m represents the trend of the walking ratio over a plurality of cycles;
(5) Score the sampling sequences in the walking ratio sampling sequence in turn to obtain P and Q, and obtain the global walking-ratio-based reward value after the exoskeleton robot executes the previous action according to the global reward function, formula (5); when P is higher, i.e., P > Q, the walking ratio converges along the expected trend, i.e., toward the walking ratio of healthy elderly people, and the exoskeleton robot receives a positive reward; when Q is higher, i.e., Q > P, the walking ratio diverges from the expected trend, and the exoskeleton robot receives a negative reward;
(6) Apply the reward function to a reinforcement learning algorithm for optimizing exoskeleton parameters; when the value function shown in formula (6) is maximal, the obtained strategy is the optimal strategy and the adjustment of the walking ratio is realized;
v_π(s) = E_π(R_{t+1} + γR_{t+2} + γ²R_{t+3} + … | S_t = s)   (6)
where v_π(s) is the value function after taking an action under strategy π in state s; R is the reward function mentioned above, R_{t+1} being the reward at time t+1; γ is the reward discount factor, taking values in [0, 1]; S_t is the state of the environment at time t.
2. The method for establishing a reward function based on walking ratio trend change according to claim 1, wherein the flexion angle curve of the wearer's hip joint in step (2) is obtained as follows:
(2-1) Collect the hip joint flexion angle signal of the wearer of the flexible exoskeleton robot with an attitude sensor, convert it into a digital signal, send it to a single chip microcomputer, and send it on to a personal computer (PC);
(2-2) Acquire the hip joint flexion angle signal through a serial port interface in MATLAB installed on the PC, and draw the real-time hip joint flexion angle curve with the plot function.
3. The method according to claim 2, wherein in step (2-1) the data transmission between the single chip microcomputer and the PC is performed by the single chip microcomputer transmitting the data to the PC wirelessly through a Bluetooth module using serial port communication.
4. The method according to claim 2, wherein the real-time hip joint flexion angle curve in step (2-2) can be displayed directly with third-party host computer software.
5. The method for establishing a reward function based on walking ratio trend change according to claim 4, wherein the third-party host computer software is the Anonymous host computer.
6. The method for establishing a reward function based on walking ratio trend change according to claim 1, wherein the collection of the wearer's hip joint flexion angle signals in steps (1) and (2) is realized by a MEMS attitude sensor provided with an ADC conversion module.
CN202011387443.2A 2020-12-01 2020-12-01 Reward function establishing method based on walking ratio trend change Active CN112472530B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011387443.2A CN112472530B (en) 2020-12-01 2020-12-01 Reward function establishing method based on walking ratio trend change

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011387443.2A CN112472530B (en) 2020-12-01 2020-12-01 Reward function establishing method based on walking ratio trend change

Publications (2)

Publication Number Publication Date
CN112472530A CN112472530A (en) 2021-03-12
CN112472530B (en) 2023-02-03

Family

ID=74938781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011387443.2A Active CN112472530B (en) 2020-12-01 2020-12-01 Reward function establishing method based on walking ratio trend change

Country Status (1)

Country Link
CN (1) CN112472530B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108785997A (en) * 2018-05-30 2018-11-13 燕山大学 A kind of lower limb rehabilitation robot Shared control method based on change admittance
CN110764416A (en) * 2019-11-11 2020-02-07 河海大学 Humanoid robot gait optimization control method based on deep Q network
CN110812131A (en) * 2019-11-28 2020-02-21 深圳市迈步机器人科技有限公司 Gait control method and control system of exoskeleton robot and exoskeleton robot
CN111515938A (en) * 2020-05-28 2020-08-11 河北工业大学 Lower limb exoskeleton walking trajectory tracking method based on inheritance type iterative learning control
CN111604890A (en) * 2019-12-30 2020-09-01 合肥工业大学 Motion control method suitable for exoskeleton robot

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007175860A (en) * 2005-11-30 2007-07-12 Japan Science & Technology Agency Method and device for learning phase reaction curve, method and device for controlling cyclic movement, and walking movement controller
CN109199783B (en) * 2017-07-04 2020-06-09 中国科学院沈阳自动化研究所 Control method for controlling stiffness of ankle joint rehabilitation equipment by using sEMG
KR102550887B1 (en) * 2017-09-20 2023-07-06 삼성전자주식회사 Method and apparatus for updatting personalized gait policy
ES2907244T3 (en) * 2018-02-08 2022-04-22 Parker Hannifin Corp Advanced gait control system and procedures that enable continuous gait movement of a powered exoskeleton device
US20200272905A1 (en) * 2019-02-26 2020-08-27 GE Precision Healthcare LLC Artificial neural network compression via iterative hybrid reinforcement learning approach
CN111546349A (en) * 2020-06-28 2020-08-18 常州工学院 New deep reinforcement learning method for humanoid robot gait planning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108785997A (en) * 2018-05-30 2018-11-13 燕山大学 A kind of lower limb rehabilitation robot Shared control method based on change admittance
CN110764416A (en) * 2019-11-11 2020-02-07 河海大学 Humanoid robot gait optimization control method based on deep Q network
CN110812131A (en) * 2019-11-28 2020-02-21 深圳市迈步机器人科技有限公司 Gait control method and control system of exoskeleton robot and exoskeleton robot
CN111604890A (en) * 2019-12-30 2020-09-01 合肥工业大学 Motion control method suitable for exoskeleton robot
CN111515938A (en) * 2020-05-28 2020-08-11 河北工业大学 Lower limb exoskeleton walking trajectory tracking method based on inheritance type iterative learning control

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on reinforcement learning control methods for upper-limb rehabilitation robots; 孟凡成; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2016-04-15 (No. 04); full text *

Also Published As

Publication number Publication date
CN112472530A (en) 2021-03-12

Similar Documents

Publication Publication Date Title
Bao et al. A CNN-LSTM hybrid model for wrist kinematics estimation using surface electromyography
US11498203B2 (en) Controls optimization for wearable systems
CN107378944B (en) Multidimensional surface electromyographic signal artificial hand control method based on principal component analysis method
CN109262618B (en) Muscle cooperation-based upper limb multi-joint synchronous proportional myoelectric control method and system
EP3743901A1 (en) Real-time processing of handstate representation model estimates
CN110675933B (en) Finger mirror image rehabilitation training system
CN109106351B (en) Human body fatigue detection and slow release system and method based on Internet of things perception
CN112022619B (en) Multi-mode information fusion sensing system of upper limb rehabilitation robot
CN110232412B (en) Human gait prediction method based on multi-mode deep learning
CN109223453B (en) Power-assisted exoskeleton device based on regular walking gait learning
Xu et al. A prosthetic arm based on EMG pattern recognition
CN101695444A (en) Foot acceleration information acquisition system and acceleration information acquisition method thereof
CN112472530B (en) Reward function establishing method based on walking ratio trend change
CN115761787A (en) Hand gesture measuring method with fusion constraints
CN113262088B (en) Multi-degree-of-freedom hybrid control artificial hand with force feedback and control method
CN109765906A (en) A kind of intelligent ship tracking method based on Compound Orthogonal Neural Network PREDICTIVE CONTROL
CN110109904B (en) Environment-friendly big data oriented water quality soft measurement method
CN111062247A (en) Human body movement intention prediction method oriented to exoskeleton control
Mishra et al. Error minimization and energy conservation by predicting data in wireless body sensor networks using artificial neural network and analysis of error
CN109303565B (en) Sleep state prediction method and device
CN115147768A (en) Fall risk assessment method and system
Zhang et al. The prediction of heart rate during running using Bayesian combined predictor
CN114118371A (en) Intelligent agent deep reinforcement learning method and computer readable medium
CN117621051A (en) Active human-computer cooperation method based on human body multi-mode information
CN111452022A (en) Bayesian optimization-based upper limb rehabilitation robot active training reference track complexity adjusting method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant