CN112494282B - Exoskeleton main assistance parameter optimization method based on deep reinforcement learning

Info

Publication number: CN112494282B
Application number: CN202011383180.8A
Authority: CN (China)
Other versions: CN112494282A (Chinese)
Inventors: 孙磊, 陈鑫, 董恩增, 佟吉刚, 李云飞, 曾德添, 龚欣翔, 李成辉
Assignee (current and original): Tianjin University of Technology
Application filed by Tianjin University of Technology; publication of CN112494282A; application granted; publication of CN112494282B
Legal status: Active

Classifications

    • A61H 3/00: Appliances for aiding patients or disabled persons to walk about
    • G06N 3/045: Neural network architectures; combinations of networks
    • G06N 3/08: Learning methods
    • G06Q 10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G16H 20/30: ICT specially adapted for therapies or health-improving plans relating to physical therapies or activities, e.g. physiotherapy, acupressure or exercising
    • A61H 2201/165: Wearable interfaces
    • A61H 2201/1659: Free spatial automatic movement of interface within a working area, e.g. robot
    • A61H 2201/5058: Sensors or detectors
    • A61H 2201/5069: Angle sensors


Abstract

The invention discloses a deep-reinforcement-learning-based method for optimizing the main assistance parameter of an exoskeleton. A compound sinusoidal exoskeleton assistance-curve equation determines the main assistance parameter, and the deep deterministic policy gradient method from deep reinforcement learning addresses the continuity-control problem of the flexible exoskeleton. A policy network and an evaluation network are built, and the hip-joint flexion angle of the exoskeleton wearer is collected and processed in real time to generate a parameter-training data set, with which the main assistance parameter is trained and optimized, realizing adaptive optimization of the exoskeleton's main assistance parameter.

Description

Exoskeleton main assistance parameter optimization method based on deep reinforcement learning
(I) Technical field:
The invention relates to the technical field of robots, and in particular to an exoskeleton main assistance parameter optimization method based on deep reinforcement learning.
(II) Background art:
Traditional lower-limb rehabilitation training is guided by a professional physician and completed with the assistance of nurses or family members; this approach is time-consuming, low in efficacy, and labor-intensive. To reduce the manpower burden and provide efficient rehabilitation services, gait-rehabilitation flexible exoskeletons have been widely applied.
The gait-rehabilitation flexible exoskeleton combines intelligent-robot technology with rehabilitation medicine; it can stand in for a professional physician and help a patient complete lower-limb rehabilitation training. Its appearance offers a new option for the rehabilitation of patients with lower-limb dysfunction and compensates for shortcomings in their clinical treatment.
During treatment with a gait-rehabilitation flexible exoskeleton, the patient's lower limb is fixed to the exoskeleton with flexible straps. The exoskeleton drives the patient's lower limbs through the prescribed rehabilitation training actions and stimulates the neural control system of the lower-limb joints and muscles, thereby restoring the patient's lower-limb motor function. The exoskeleton's target users require it to have good comfort and adaptability, so that it provides a better rehabilitation experience and suits lower-limb dysfunction patients of different populations. Optimizing the exoskeleton's assistance parameters, here through deep reinforcement learning, is therefore one of the core technologies behind the comfort and reliability of gait-rehabilitation flexible exoskeletons.
In the rehabilitation of patients with lower-limb dysfunction, a series of continuous training actions must be performed. Because patients' lower-limb conditions differ, the assistance must be accurate: if the assistance is too small, the action ends before the patient's leg reaches the specified posture, and the training effect is poor; if the assistance is too large, the patient's leg is overstretched, easily causing secondary and unnecessary injury.
In the conventional PID (proportional-integral-derivative) control method of assistance, the control quantity is computed from the system error as a linear combination of the proportional, integral, and derivative terms. Although the PID algorithm is widely used thanks to its simple principle and easy parameter tuning, it can produce unexpected results in some cases: if the desired value differs too much from the actual value, the motor is driven at an excessive speed to reach the desired value, often causing overshoot and oscillation, which is quite dangerous for a gait-rehabilitation flexible exoskeleton. In addition, this method cannot optimize the assistance parameters, so efficiency is low and the parameters retain large errors.
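For reference, a minimal discrete-time sketch of the conventional PID scheme described above; the gains, time step, and loop structure are illustrative assumptions, not taken from the patent:

```python
# Minimal discrete-time PID controller: the control amount is a linear
# combination of the proportional, integral, and derivative terms of the
# system error. Gains kp/ki/kd and time step dt are illustrative.

class PID:
    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, desired: float, actual: float) -> float:
        error = desired - actual
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# A large gap between desired and actual values produces a large control
# output at once, which is what causes the overshoot and oscillation
# noted above.
pid = PID(kp=2.0, ki=0.1, kd=0.05, dt=0.01)
u = pid.step(desired=30.0, actual=5.0)  # large error -> large output
```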
To address the defects of the prior art, an exoskeleton main assistance parameter optimization method based on deep reinforcement learning is needed to solve the continuous-assistance problem of the gait-rehabilitation flexible exoskeleton.
(III) Summary of the invention:
The invention aims to provide an exoskeleton main assistance parameter optimization method based on deep reinforcement learning. The method overcomes the defects of the prior art; it is a parameter optimization method that is simple in principle and easy to implement, and it copes with the continuous-assistance problem of the gait-rehabilitation flexible exoskeleton, thereby effectively solving the exoskeleton's personalized-matching problem.
The technical scheme of the invention is as follows: an exoskeleton main assistance parameter optimization method based on deep reinforcement learning, characterized by comprising the following steps:
(1) Determine the optimization parameters:
Determine the optimization parameters from the exoskeleton assistance-curve equation, which takes the compound sinusoidal form shown in equation (1):
[Equation (1): compound sinusoidal assistance curve F_assist(A, t*, T_b, α); the original equation image is not recoverable from this extraction]
where F_assist is the real-time assistance force, A is the swing-phase assistance amplitude, t* is the time elapsed from the assistance start time to the current time, T_b is the swing phase period of the current gait cycle, and α is the exoskeleton main assistance parameter, which serves as the waveform control parameter of equation (1) and shifts the position of the assistance peak; its value ranges from -1 to 1;
The swing-phase assistance amplitude A is determined by the rated output of the assistance component; under rated operation of the assistance component it is a known value and can be set manually.
The swing phase period T_b of the current gait cycle is obtained as follows. An MEMS (Micro-Electro-Mechanical System) attitude sensor collects the wearer's hip-joint flexion angle during walking to obtain the flexion-angle curve of the wearer's hip joint, and the swing phase period of the next gait is computed as the average of the previous three swing phase periods in that curve; this average is used as the swing phase period of the current gait cycle. The swing phase period of the current gait cycle is therefore a known value, obtained from equation (2):

T_b(k) = [T_b(k-1) + T_b(k-2) + T_b(k-3)] / 3    (2)

The swing phase period T_b of the current gait cycle is specifically calculated as follows:
MEMS attitude sensors are placed at the middle of the rear of the left and right thighs of the flexible-exoskeleton wearer, and the hip-joint flexion angle during normal walking is collected in real time to obtain the flexion-angle curve of the wearer's hip joint. The peak time is denoted t_peak and the trough time t_trough, and the hip-joint flexion angles corresponding to the peaks and troughs are recorded. The current gait cycle, equation (3), and the swing phase period of that gait cycle, equation (4), can then be calculated as:

T(k) = t_trough(k) - t_trough(k-1)    (3)
T_b(k) = t_peak(k) - t_trough(k)    (4)

Equation (3) shows that the current gait cycle T is calculated from the values of two adjacent trough points; equation (4) shows that the swing phase period of the gait cycle is calculated from the values of adjacent peak and trough points;
Correspondingly, the maximum hip-joint flexion angle θ_max(k) and the minimum hip-joint flexion angle θ_min(k) for the current gait cycle can be obtained; they are used for the exoskeleton state at the initial time in step (5) and the exoskeleton state at the next time in step (8).
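A minimal sketch of the gait-timing computations in equations (2) to (4), assuming the peak and trough times have already been extracted from the flexion-angle curve (all names and the sample values are illustrative):

```python
# Gait timing from hip-joint flexion-angle peak/trough times.
# troughs[k], peaks[k]: times (s) of the k-th trough and peak.

def gait_cycle(troughs: list[float], k: int) -> float:
    # Equation (3): current gait cycle from two adjacent trough points.
    return troughs[k] - troughs[k - 1]

def swing_phase_period(peaks: list[float], troughs: list[float], k: int) -> float:
    # Equation (4): swing phase period from adjacent peak and trough points.
    return peaks[k] - troughs[k]

def current_swing_phase_period(tb_history: list[float]) -> float:
    # Equation (2): average of the previous three swing phase periods,
    # used as the swing phase period of the current gait cycle.
    return sum(tb_history[-3:]) / 3.0

troughs = [0.0, 1.1, 2.2, 3.3]
peaks = [0.4, 1.5, 2.6, 3.7]
tb = [swing_phase_period(peaks, troughs, k) for k in range(1, 4)]
print(gait_cycle(troughs, 3), current_swing_phase_period(tb))
```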
The method for acquiring the flexion angle parameter curve of the hip joint of the wearer in the step (1) comprises the following steps:
(1-1) An MEMS attitude sensor acquires the hip-joint flexion angle signal of the flexible-exoskeleton wearer, converts it into a digital signal, and transmits it to a single-chip microcomputer, which forwards it to a PC (Personal Computer);
In step (1-1), the single-chip microcomputer transmits the data to the PC over a wireless link, through serial communication and a Bluetooth module.
(1-2) The hip-joint flexion angle signal is read through a serial interface in MATLAB installed on the PC, and a real-time curve of the flexion angle is drawn with the plot function.
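The patent reads the sensor stream with MATLAB's serial interface and plot function; the following Python sketch with pyserial and matplotlib is an assumed equivalent for illustration only (port name, baud rate, and one-angle-per-line format are assumptions):

```python
import serial                     # pyserial
import matplotlib.pyplot as plt

# Read hip-joint flexion angles forwarded by the single-chip
# microcomputer over the (Bluetooth) serial link.
ser = serial.Serial(port="COM3", baudrate=115200, timeout=1.0)

angles: list[float] = []
for _ in range(500):
    line = ser.readline().decode(errors="ignore").strip()
    if line:
        angles.append(float(line))  # assumed: one angle (deg) per line
ser.close()

plt.plot(angles)                  # curve of the flexion-angle parameter
plt.xlabel("sample")
plt.ylabel("hip flexion angle (deg)")
plt.show()
```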
(2) Setting parameters:
Set the walking time interval of the exoskeleton wearer to τ = 5-7 s for each walk; the interval may be increased appropriately, provided the wearer can walk at least 3 steps, so that the swing phase period of the current gait cycle can be acquired and the wearer can stand stably at the end of each walking interval; the assistance condition is judged anew after each advance. Preset the maximum number of episodes E, the batch sample size N, and the maximum number of time rounds per episode T_max;
Setting the maximum number of episodes E in step (2) means setting the number of convergences of the exoskeleton main assistance parameter α optimized with the deep reinforcement learning method: one episode corresponds to one convergence of the parameter. Setting the maximum number of time rounds per episode T_max sets the number of rounds performed in each episode, each round corresponding to one time interval: each convergence of the exoskeleton main assistance parameter α requires at most T_max rounds, and each round requires the exoskeleton wearer to walk for a time interval τ. A time index t is recorded at the start of each round: the first round starts at time t = 1, and so on, with the T_max-th round starting at time t = T_max.
(3) Design the standard configuration of the deep deterministic policy gradient method (Deep Deterministic Policy Gradient, DDPG), specifically the design of the policy network and the evaluation network. The policy network comprises an online policy network μ(s|α^μ) and a target policy network μ'(s|α^μ'); the evaluation network comprises an online evaluation network Q(s,a|α^Q) and a target evaluation network Q'(s,a|α^Q');
The design of the policy network and the evaluation network with the deep deterministic policy gradient method in step (3) specifically comprises the following steps:
(3-1) Initialize the online policy network μ(s|α^μ) and the online evaluation network Q(s,a|α^Q);
(3-2) Construct the target policy network μ'(s|α^μ') corresponding to the online policy network μ(s|α^μ) and the target evaluation network Q'(s,a|α^Q') corresponding to the online evaluation network Q(s,a|α^Q), and copy the parameters of the online policy network and the online evaluation network to the respective target network parameters, i.e. α^μ' ← α^μ and α^Q' ← α^Q. Here the exoskeleton main assistance parameter α is the parameter to be optimized with the deep reinforcement learning method, s denotes the exoskeleton state, and a denotes the exoskeleton action. Initialize the experience replay pool R.
The exoskeleton state s in step (3-2) comprises the swing-phase assistance amplitude A, the current gait cycle T, the swing phase period T_b of the current gait cycle, the hip-joint flexion angle θ of the exoskeleton wearer, the maximum hip-joint flexion angle θ_max in the current gait cycle, and the minimum hip-joint flexion angle θ_min in the current gait cycle. The exoskeleton action a is the assistance amount of the exoskeleton; the assistance direction is always positive, i.e. vertically upward.
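A sketch of the state and action structures defined in step (3-2); the field and type names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ExoState:
    A: float          # swing-phase assistance amplitude
    T: float          # current gait cycle (s)
    T_b: float        # swing phase period of the current gait cycle (s)
    theta: float      # wearer's hip-joint flexion angle
    theta_max: float  # max hip flexion angle in the current gait cycle
    theta_min: float  # min hip flexion angle in the current gait cycle

    def as_vector(self) -> list[float]:
        return [self.A, self.T, self.T_b,
                self.theta, self.theta_max, self.theta_min]

# The action a is a single scalar: the exoskeleton's assistance amount,
# always positive (directed vertically upward).
```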
(4) Enumerate the episode number e from 1 to E, i.e. perform E convergences of the exoskeleton main assistance parameter α; the exoskeleton state at the initial time is obtained at the start of each episode;
(5) Acquiring an initial state:
When each episode in step (4) begins, the exoskeleton wearer must walk normally without assistance for a time interval τ, and the resulting exoskeleton state is taken as the exoskeleton state s_1 at the initial time t = 1. It specifically comprises the initial swing-phase assistance amplitude A_1, the initial hip-joint flexion angle θ_1 of the exoskeleton wearer, the initial gait cycle T_1, the swing phase period T_b1 of the initial gait cycle, the maximum hip-joint flexion angle θ_max,1 in the initial gait cycle, and the minimum hip-joint flexion angle θ_min,1 in the initial gait cycle;
The exoskeleton state s_1 at the initial time in step (5) is obtained through the following steps:
(5-1) Let the exoskeleton wearer walk normally for τ without assistance; MEMS attitude sensors placed at the middle of the rear of the wearer's left and right thighs collect the hip-joint flexion angle in real time, and the flexion angle at the end of the walk is taken as the initial hip-joint flexion angle θ_1 of the exoskeleton wearer;
(5-2) Collect the hip-joint flexion angle during unassisted normal walking in real time and obtain the flexion-angle curve of the wearer's hip joint through steps (1-1) and (1-2); denote the peak time t_peak and the trough time t_trough, and record the hip-joint flexion angles corresponding to the peaks and troughs;
(5-3) The last trough time occurring before the end of the unassisted walking interval τ minus the previous trough time is taken as the initial gait cycle T_1;
(5-4) The last trough time occurring before the end of the walking interval τ minus the peak time preceding that trough is taken as swing phase period I of the initial gait cycle, denoted T_b1,1;
(5-5) The second-to-last trough time minus the peak time preceding it is taken as swing phase period II of the initial gait cycle, denoted T_b1,2;
(5-6) The third-to-last trough time minus the peak time preceding it is taken as swing phase period III of the initial gait cycle, denoted T_b1,3;
(5-7) Average the three swing phase periods obtained in steps (5-4), (5-5), and (5-6) to obtain the swing phase period of the next gait cycle, which is taken as the swing phase period of the initial gait cycle, i.e.:

T_b1 = (T_b1,1 + T_b1,2 + T_b1,3) / 3    (5)

(5-8) The hip-joint flexion angle at the last trough time is taken as the minimum hip-joint flexion angle θ_min,1 in the initial gait cycle, and the hip-joint flexion angle at the last peak time as the maximum hip-joint flexion angle θ_max,1;
(5-9) The initial swing-phase assistance amplitude A_1 equals the manually set swing-phase assistance amplitude A;
(6) Enumerate the time rounds from 1 to T_max, recording the time t at the start of each round. Enumerating the time rounds means performing steps (7) to (10) T_max times within each episode, so that in each episode the exoskeleton performs T_max actions selected by the online policy network; this generates a sufficient data set for parameter training and improves the reliability of the training result. T_max is usually large enough for the optimized parameter to converge.
(7) The online policy network selects the action of the exoskeleton at time t according to:

a_t = μ(s_t|α^μ) + Noise    (6)

The Noise term widens the range of values, so that the actions the exoskeleton can select at time t cover a larger range;
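A sketch of the action selection in equation (6). The patent does not specify the noise distribution, so zero-mean Gaussian exploration noise is an assumption here (Ornstein-Uhlenbeck noise is another common DDPG choice), as is clamping the action to keep it positive:

```python
import torch

def select_action(actor, s, noise_std: float = 0.1):
    # a_t = mu(s_t | alpha_mu) + Noise   (equation (6))
    with torch.no_grad():
        a = actor(torch.as_tensor(s, dtype=torch.float32))
    a = a + noise_std * torch.randn_like(a)  # exploration noise widens the range
    return a.clamp(min=0.0)  # assistance amount is always positive (assumed clamp)
```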
(8) The exoskeleton performs the action selected in step (7); the exoskeleton wearer walks under this action for a time interval τ, which yields the scalar reward r_t fed back by the flexible exoskeleton and the exoskeleton state s_{t+1} at the next time.
The scalar reward r_t fed back by the flexible exoskeleton in step (8) has the specific form:

[Equation (7): reward as a function of the walking ratio W and the target W_tv; the original equation image is not recoverable from this extraction]

where W is the walking ratio and W_tv is the preset walking ratio of healthy elderly people.
The walking ratio in step (8) is defined as the ratio of the step length to the step frequency, in the specific form of equation (8):

[Equation (8): walking ratio W computed from D_{t+1}, N, and T_{t+1}; the original equation image is not recoverable from this extraction]

where D_{t+1} is the step length at the next time in m, N is the step frequency in steps/s, and T_{t+1} is the gait cycle at the next time in s;
The step length at the next time can be obtained from:

D_{t+1} = l(θ_max,t+1 - θ_min,t+1)    (9)

where l is the leg length of the flexible-exoskeleton wearer, θ_max,t+1 is the maximum hip-joint flexion angle in the gait cycle at the next time, and θ_min,t+1 is the minimum hip-joint flexion angle in the gait cycle at the next time.
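A sketch of the step-length computation of equation (9) together with the walking ratio and reward. The exact forms of equations (7) and (8) are not recoverable from this extraction, so the walking ratio below is computed directly as step length over step frequency and the reward as the negative deviation from the target ratio W_tv; both forms, and all numeric values, are assumptions:

```python
import math

def step_length(l: float, theta_max: float, theta_min: float) -> float:
    # Equation (9): D_{t+1} = l * (theta_max - theta_min), angles in radians.
    return l * (theta_max - theta_min)

def walking_ratio(D: float, N: float) -> float:
    # Assumed form of equation (8): walking ratio = step length / step frequency.
    return D / N

def reward(W: float, W_tv: float) -> float:
    # Assumed form of equation (7): penalize deviation of the wearer's
    # walking ratio from the target ratio of healthy elderly people.
    return -abs(W - W_tv)

l = 0.45                            # wearer's leg length in m (illustrative)
D = step_length(l, math.radians(30.0), math.radians(-10.0))
N = 2 * 60.0 / 1.1                  # steps/min (assumed unit, so that W falls
                                    # in the 0.0044-0.0055 range cited below)
print(reward(walking_ratio(D, N), W_tv=0.005))
```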
The exoskeleton state s_{t+1} at the next time in step (8) comprises the swing-phase assistance amplitude A_{t+1} at the next time, the hip-joint flexion angle θ_{t+1} of the exoskeleton wearer at the next time, the gait cycle T_{t+1} at the next time, the swing phase period T_b,t+1 of the gait cycle at the next time, the maximum hip-joint flexion angle θ_max,t+1 in the gait cycle at the next time, and the minimum hip-joint flexion angle θ_min,t+1 in the gait cycle at the next time. The exoskeleton state s_{t+1} at the next time is obtained through the following steps:
(8-1) The exoskeleton performs the action selected in step (7) while the wearer walks for the interval τ; the MEMS attitude sensor collects the wearer's hip-joint flexion angle in real time, and the flexion angle at the moment walking ends is taken as the hip-joint flexion angle θ_{t+1} at the next time;
(8-2) Collect the hip-joint flexion angle of the exoskeleton wearer during walking in real time and obtain the flexion-angle curve through steps (1-1) and (1-2); denote the peak time t_peak and the trough time t_trough, and record the hip-joint flexion angles corresponding to the peaks and troughs;
(8-3) The last trough time occurring before the end of the walking interval τ minus the previous trough time is taken as the gait cycle T_{t+1} at the next time. Meanwhile, the last trough time minus the peak time preceding it is taken as swing phase period I of the gait cycle at the next time, denoted T_b,t+1,1; the second-to-last trough time minus the peak time preceding it as swing phase period II, denoted T_b,t+1,2; and the third-to-last trough time minus the peak time preceding it as swing phase period III, denoted T_b,t+1,3. Averaging the three swing phase periods, as shown in equation (10), gives the swing phase period of the next gait cycle, which is taken as the swing phase period of the gait cycle at the next time:

T_b,t+1 = (T_b,t+1,1 + T_b,t+1,2 + T_b,t+1,3) / 3    (10)

(8-4) The hip-joint flexion angle at the last peak time is taken as the maximum hip-joint flexion angle θ_max,t+1 in the gait cycle at the next time, and the hip-joint flexion angle at the last trough time as the minimum hip-joint flexion angle θ_min,t+1.
(8-5) The swing-phase assistance amplitude A_{t+1} at the next time equals the manually set swing-phase assistance amplitude A;
(9) State transition process:
Store the exoskeleton state s_t at time t, the exoskeleton action a_t at time t obtained in step (7), the exoskeleton state s_{t+1} at the time after t obtained in step (8), and the scalar reward r_t fed back by the flexible exoskeleton in the experience replay pool R as a training data set for parameter training;
The exoskeleton state s_t at time t is obtained as follows: the state at time t is the same as the next-time state obtained by executing step (8) in time round t-1 of the current episode.
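A minimal sketch of the experience replay pool R and the state-transition storage of step (9); the container design is illustrative:

```python
import random
from collections import deque

class ReplayPool:
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        # One state-transition process (s_t, a_t, r_t, s_{t+1}).
        self.buffer.append((s, a, r, s_next))

    def sample(self, n: int):
        # Step (10): randomly sample N transitions as batch training data.
        return random.sample(self.buffer, n)
```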
(10) Randomly sample N of the state transitions stored in step (9) as batch training data for parameter training;
The parameter training in step (10) specifically comprises the following steps:
(10-1) Compute the loss of the online evaluation network, defined in mean squared error (Mean Squared Error, MSE) form as shown in equation (11), and use it to update the online evaluation network parameters:

L(α^Q) = (1/N) Σ_{i=1}^{N} [y_i - Q(s_i, a_i|α^Q)]²    (11)

where L(α^Q) is the loss value of the online evaluation network, used for training and optimization; Q(s_i, a_i|α^Q) is the evaluation value (Q value) of the online evaluation network, whose inputs are the exoskeleton state and action of the i-th state transition; and y_i is the target of the Q value, namely:

y_i = r_i + γQ'(s_{i+1}, μ'(s_{i+1}|α^μ')|α^Q')    (12)

where r_i is the scalar reward of the i-th state transition, s_{i+1} is the next exoskeleton state of the i-th state transition, and γ ∈ [0,1] is the discount factor. The term Q'(s_{i+1}, μ'(s_{i+1}|α^μ')|α^Q') is a nesting of two functions: the outer function Q' is the Q function generated by the target evaluation network, whose inputs are the next exoskeleton state and action of the i-th state transition; the next exoskeleton action is generated by the inner function μ'(s_{i+1}|α^μ'), the target policy network, whose input is the next exoskeleton state of the i-th state transition;
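A PyTorch sketch of the critic update in equations (11) and (12); the network classes, signatures, and hyperparameters are assumptions:

```python
import torch
import torch.nn.functional as F

def critic_update(batch, actor_t, critic, critic_t, critic_opt, gamma=0.99):
    s, a, r, s_next = batch  # tensors sampled from the replay pool

    with torch.no_grad():
        # Inner function: target policy network mu'(s_{i+1} | alpha_mu').
        a_next = actor_t(s_next)
        # Outer function: target evaluation network Q'(s_{i+1}, a_next | alpha_Q').
        y = r + gamma * critic_t(s_next, a_next)  # equation (12)

    q = critic(s, a)          # online Q value Q(s_i, a_i | alpha_Q)
    loss = F.mse_loss(q, y)   # equation (11), MSE form
    critic_opt.zero_grad()
    loss.backward()
    critic_opt.step()
```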
(10-2) Update the online policy network parameters as shown in equation (13):

∇_{α^μ} J ≈ (1/N) Σ_{i=1}^{N} ∇_a Q(s, a|α^Q)|_{s=s_i, a=μ(s_i)} ∇_{α^μ} μ(s|α^μ)|_{s=s_i}    (13)

where ∇_{α^μ} J is the gradient value with respect to the online policy network parameters; ∇_a Q(s, a|α^Q) is the gradient of the online evaluation network's Q value with respect to the action a, where the action is generated by the online policy network μ(s_i|α^μ); and ∇_{α^μ} μ(s|α^μ) is the gradient of the online policy network with respect to its parameters. In equation (13) the two gradients are in a multiplication relationship;
(10-3) Update the target policy network parameters and the target evaluation network parameters as shown in equation (14):

α^μ' ← σα^μ + (1 - σ)α^μ'
α^Q' ← σα^Q + (1 - σ)α^Q'    (14)

where α^μ' are the target policy network parameters, α^μ the online policy network parameters, α^Q' the target evaluation network parameters, and α^Q the online evaluation network parameters; σ is the update scaling parameter and generally takes a small value.
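A PyTorch sketch of the actor update (equation (13)) and the soft target update (equation (14)); in autograd frameworks the actor gradient is realized by maximizing Q(s, μ(s)), i.e. minimizing its negative, which is equivalent to the chained gradients of equation (13):

```python
import torch

def actor_update(s, actor, critic, actor_opt):
    # Equation (13): grad_a Q(s, a) * grad_alpha mu(s); with autograd this
    # is done by minimizing -Q(s, mu(s)) w.r.t. the actor parameters.
    loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()

def soft_update(online: torch.nn.Module, target: torch.nn.Module, sigma=0.005):
    # Equation (14): alpha' <- sigma * alpha + (1 - sigma) * alpha'.
    for p, p_t in zip(online.parameters(), target.parameters()):
        p_t.data.mul_(1.0 - sigma).add_(sigma * p.data)
```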
In summary, after step (10) is completed, the network parameters in the policy network and the evaluation network are updated once, promoting the convergence of the parameters of every network in both. The network parameters in the policy network comprise the online policy network parameters α^μ and the target policy network parameters α^μ'; the network parameters in the evaluation network comprise the online evaluation network parameters α^Q and the target evaluation network parameters α^Q'. Convergence of the parameters of every network in the policy network and the evaluation network drives the walking ratio of the exoskeleton wearer toward the preset walking ratio of healthy elderly people, at which it finally stabilizes.
(11) After steps (7) to (10) are executed, one time round is completed; enumeration of that round ends, the time round is incremented by 1, and steps (7) to (10) are executed again until the parameters of every network in the policy network and the evaluation network converge. Since the exoskeleton main assistance parameter α to be optimized with the deep reinforcement learning method equals the target policy network parameter α^μ' of the target policy network, convergence of α^μ' means convergence of the exoskeleton main assistance parameter α optimized under the current episode, at which point the walking ratio of the exoskeleton wearer is stabilized at the preset walking ratio of healthy elderly people; the current episode then ends and the next episode is carried out;
(12) After steps (5) to (11) are executed, one episode e is completed; enumeration ends and e = e + 1 is set to continue executing steps (5) to (11), until, at the end of all episodes, the target policy network parameters α^μ' all converge to the same value, i.e. the exoskeleton main assistance parameter α converges to the same value. This value is regarded as the exoskeleton main assistance parameter α optimized with the deep reinforcement learning method; with it, optimal assistance of the exoskeleton is achieved, so that the walking ratio of the exoskeleton wearer always remains stabilized at the preset walking ratio of healthy elderly people, realizing the wearer's rehabilitation exercise.
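Putting the preceding steps together, a skeleton of the overall optimization loop of steps (4) to (12), reusing the helper sketches above (ReplayPool, select_action, critic_update, actor_update, soft_update); walk_interval and initial_state are assumed callbacks standing for the wearer walking for τ under the chosen assistance and for the unassisted initial walk:

```python
import torch

def collate(batch):
    # Stack sampled transitions (s, a, r, s_next) into float tensors.
    s, a, r, s_next = zip(*batch)
    to = lambda x: torch.as_tensor(x, dtype=torch.float32)
    return to(s), to(a).unsqueeze(-1), to(r).unsqueeze(-1), to(s_next)

def optimize_alpha(E, T_max, N, actor, actor_t, critic, critic_t,
                   actor_opt, critic_opt, walk_interval, initial_state):
    pool = ReplayPool()
    for episode in range(1, E + 1):                # step (4): episodes
        s = initial_state()                        # step (5): unassisted walk
        for t in range(1, T_max + 1):              # step (6): time rounds
            a = select_action(actor, s)            # step (7), equation (6)
            r, s_next = walk_interval(a)           # step (8): wearer walks tau
            pool.store(s, float(a), r, s_next)     # step (9)
            if len(pool.buffer) >= N:
                s_b, a_b, r_b, sn_b = collate(pool.sample(N))   # step (10)
                critic_update((s_b, a_b, r_b, sn_b),
                              actor_t, critic, critic_t, critic_opt)
                actor_update(s_b, actor, critic, actor_opt)
                soft_update(actor, actor_t)        # equation (14)
                soft_update(critic, critic_t)
            s = s_next                             # step (11): next round
    # Step (12): after all episodes, the converged target policy parameters
    # yield the optimized exoskeleton main assistance parameter alpha.
```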
The working principle of the invention is as follows: the exoskeleton main assistance parameter α is optimized with a deep-reinforcement-learning-based optimization method, namely the deep deterministic policy gradient method (Deep Deterministic Policy Gradient, DDPG). A policy network and an evaluation network are built to solve the continuity-control problem of the flexible exoskeleton, and the hip-joint flexion angle of the exoskeleton wearer is collected and processed in real time to generate a data set for parameter training, realizing adaptive optimization of the exoskeleton's main assistance parameter;
The advantages of the invention are as follows: applied to the rehabilitation of patients with lower-limb dysfunction, the method realizes continuous assistance of the gait-rehabilitation flexible exoskeleton. The exoskeleton main assistance parameter is determined by an exoskeleton assistance-curve equation; the deep deterministic policy gradient method in deep reinforcement learning solves the continuity-control problem of the flexible exoskeleton; and a data set for parameter training is generated by collecting and processing the wearer's hip-joint flexion angle in real time. The main assistance parameter of the flexible exoskeleton is thus adaptively optimized, which better ensures the safety of patients with lower-limb dysfunction during rehabilitation training.
(IV) description of the drawings:
Fig. 1 is a schematic diagram of the gait cycle in the deep-reinforcement-learning-based exoskeleton main assistance parameter optimization method of the invention.
Fig. 2 is a schematic flow chart of the deep-reinforcement-learning-based exoskeleton main assistance parameter optimization method.
(V) Specific embodiments:
a more detailed description will be given below in connection with specific embodiments. It is to be understood that these examples are illustrative of the present invention and are not intended to limit the scope of the present invention. After reading the specific steps and related matters taught by the present invention, those skilled in the relevant arts can make various modifications or applications of the present invention, which equivalent forms are also within the scope of the claims appended hereto.
(1) Determine the optimization parameters from the exoskeleton assistance-curve equation, which takes the compound sinusoidal form shown in equation (1):

[Equation (1): compound sinusoidal assistance curve F_assist(A, t*, T_b, α); the original equation image is not recoverable from this extraction]

where F_assist is the real-time assistance force, A is the swing-phase assistance amplitude, t* is the time elapsed from the assistance start time to the current time, T_b is the swing phase period of the current gait cycle, and α is the exoskeleton main assistance parameter, which serves as the waveform control parameter of equation (1) and shifts the position of the assistance peak; its value ranges from -1 to 1;
the swing phase power-assisted amplitude A can be regarded as a known value under the rated work of the power-assisted components and is determined by the rated output value of the power-assisted components, and the swing phase power-assisted amplitude is a known value and can be set manually under the rated work of the power-assisted components. For example: if the direct current motor is selected as the power assisting component, the rated output torque is T Force of force =9549×p/N, in N/m. Wherein P is the rated power of the motor, N is the rated rotation speed of the motor, and the rated output force of the motor can be obtained according to actual conditions, and the unit is N.
The swing phase period T_b of the current gait cycle is obtained by collecting the wearer's hip-joint flexion angle during walking with the MEMS attitude sensor to obtain the flexion-angle curve of the wearer's hip joint, and averaging the previous three swing phase periods to obtain the swing phase period of the next gait, which is used as the swing phase period of the current gait cycle. The swing phase period T_b of the current gait cycle can therefore be regarded as a known value, obtained from equation (2):

T_b(k) = [T_b(k-1) + T_b(k-2) + T_b(k-3)] / 3    (2)

The method is specifically as follows: MEMS attitude sensors are placed at the middle of the rear of the left and right thighs of the flexible-exoskeleton wearer, and the hip-joint flexion angle during normal walking is collected in real time to obtain the flexion-angle curve of the wearer's hip joint. As shown in Fig. 1, the peak time is denoted t_peak and the trough time t_trough, and the hip-joint flexion angles corresponding to the peaks and troughs are recorded. The current gait cycle, equation (3), and the swing phase period of that gait cycle, equation (4), can then be calculated as:

T(k) = t_trough(k) - t_trough(k-1)    (3)
T_b(k) = t_peak(k) - t_trough(k)    (4)

Equation (3) shows that the current gait cycle T is calculated from the values of two adjacent trough points; equation (4) shows that the swing phase period of the gait cycle is calculated from the values of adjacent peak and trough points;
Correspondingly, the maximum hip-joint flexion angle θ_max(k) and the minimum hip-joint flexion angle θ_min(k) for the current gait cycle in Fig. 2 can be obtained; they are used for the exoskeleton state at the initial time in step (5) and the exoskeleton state at the next time in step (8).
The method for acquiring the flexion-angle curve of the wearer's hip joint comprises the following steps:
(1-1) An MEMS attitude sensor acquires the hip-joint flexion angle signal of the flexible-exoskeleton wearer, converts it into a digital signal, and transmits it to a single-chip microcomputer, which forwards it to a PC (Personal Computer);
In step (1-1), the single-chip microcomputer transmits the data to the PC over a wireless link, through serial communication and a Bluetooth module.
(1-2) The hip-joint flexion angle signal is read through a serial interface in MATLAB installed on the PC, and a real-time curve of the flexion angle is drawn with the plot function.
In step (1), under rated operation of the assistance component, the swing-phase assistance amplitude A can be regarded as a known value. The hip-joint flexion angle of the wearer during walking is collected by the MEMS attitude sensor to obtain the flexion-angle curve of the wearer's hip joint, and the average of the previous three swing phase periods in that curve is used as the swing phase period of the current gait cycle, so the swing phase period T_b of the current gait cycle can also be regarded as a known value. The invention therefore optimizes only the waveform control parameter α, using the deep-reinforcement-learning-based method.
Training optimization of the exoskeleton main assistance parameter is realized with the deep deterministic policy gradient method (DDPG) in deep reinforcement learning, and the exoskeleton main assistance parameter is determined to be α. The specific flow of the deep-reinforcement-learning-based assistance-parameter optimization is shown in the flow chart of Fig. 2.
(2) Set the parameters. The walking time interval of the exoskeleton wearer is set to τ = 5-7 s for each walk; the interval may be increased appropriately, provided the wearer can walk at least 3 steps, so that the swing phase period of the current gait cycle can be acquired and the wearer can stand stably at the end of each walking interval; the exoskeleton judges the assistance condition anew after each advance. Preset the maximum number of episodes E, the batch sample size N, and the maximum number of time rounds per episode T_max. One episode corresponds to one convergence of the parameter; setting the maximum number of time rounds per episode T_max sets the number of rounds performed in each episode, each round corresponding to one time interval: each convergence of the exoskeleton main assistance parameter α requires at most T_max rounds, and each round requires the wearer to walk for a time interval τ. A time index is recorded at the start of each round: the first round starts at time t = 1, and so on, with the T_max-th round starting at time t = T_max.
(3) Design the standard configuration of the deep deterministic policy gradient method (DDPG), comprising the policy network and the evaluation network shown in Fig. 2. The policy network comprises an online policy network and a target policy network; the evaluation network comprises an online evaluation network and a target evaluation network. Initialize the online policy network μ(s|α^μ) and the online evaluation network Q(s,a|α^Q); construct the target policy network μ'(s|α^μ') and the target evaluation network Q'(s,a|α^Q'); and copy the parameters of the online policy network and the online evaluation network to the respective target network parameters, i.e. α^μ' ← α^μ and α^Q' ← α^Q. Here α is the parameter to be optimized with the deep reinforcement learning method, s denotes the exoskeleton state, and a denotes the exoskeleton action. Initialize the experience replay pool R;
Specifically, the exoskeleton state s comprises the swing-phase assistance amplitude A, the current gait cycle T, the swing phase period T_b of the current gait cycle, the hip-joint flexion angle θ of the exoskeleton wearer, the maximum hip-joint flexion angle θ_max in the current gait cycle, and the minimum hip-joint flexion angle θ_min in the current gait cycle. The exoskeleton action a is the assistance amount of the exoskeleton; the assistance direction is always positive, i.e. vertically upward. The swing-phase assistance amplitude is determined by the rated output of the assistance component and can be set manually.
(4) Enumerate the episode number e from 1 to E, i.e. perform E convergences of the exoskeleton main assistance parameter α; the exoskeleton state at the initial time is obtained at the start of each episode;
(5) An initial state is acquired.
When each episode in step (4) begins, the exoskeleton wearer must walk normally without assistance for a time interval τ, and the resulting exoskeleton state is taken as the exoskeleton state s_1 at the initial time t = 1. It specifically comprises the initial swing-phase assistance amplitude A_1, the initial hip-joint flexion angle θ_1 of the exoskeleton wearer, the initial gait cycle T_1, the swing phase period T_b1 of the initial gait cycle, the maximum hip-joint flexion angle θ_max,1 in the initial gait cycle, and the minimum hip-joint flexion angle θ_min,1 in the initial gait cycle;
The exoskeleton state s_1 at the initial time in step (5) is obtained through the following steps:
(5-1) Let the exoskeleton wearer walk normally for τ without assistance; MEMS attitude sensors placed at the middle of the rear of the wearer's left and right thighs collect the hip-joint flexion angle in real time, and the flexion angle at the end of the walk is taken as the initial hip-joint flexion angle θ_1 of the exoskeleton wearer;
(5-2) Collect the hip-joint flexion angle during unassisted normal walking in real time and obtain the flexion-angle curve of the wearer's hip joint through steps (1-1) and (1-2); as shown in Fig. 1, denote the peak time t_peak and the trough time t_trough, and record the hip-joint flexion angles corresponding to the peaks and troughs;
(5-3) The last trough time occurring before the end of the unassisted walking interval τ minus the previous trough time is taken as the initial gait cycle T_1;
(5-4) The last trough time occurring before the end of the walking interval τ minus the peak time preceding that trough is taken as swing phase period I of the initial gait cycle, denoted T_b1,1;
(5-5) The second-to-last trough time minus the peak time preceding it is taken as swing phase period II of the initial gait cycle, denoted T_b1,2;
(5-6) The third-to-last trough time minus the peak time preceding it is taken as swing phase period III of the initial gait cycle, denoted T_b1,3;
(5-7) Average the three swing phase periods obtained in steps (5-4), (5-5), and (5-6) to obtain the swing phase period of the next gait cycle, which is taken as the swing phase period of the initial gait cycle, i.e.:

T_b1 = (T_b1,1 + T_b1,2 + T_b1,3) / 3    (5)

(5-8) The hip-joint flexion angle at the last trough time is taken as the minimum hip-joint flexion angle θ_min,1 in the initial gait cycle, and the hip-joint flexion angle at the last peak time as the maximum hip-joint flexion angle θ_max,1;
(5-9) The initial swing-phase assistance amplitude A_1 equals the manually set swing-phase assistance amplitude A;
(6) Enumerate the time rounds from 1 to T_max, recording the time t at the start of each round.
(7) The online policy network selects the action of the exoskeleton at time t, as in Fig. 2, according to:

a_t = μ(s_t|α^μ) + Noise

The Noise term widens the range of values, so that the actions the exoskeleton can select at time t cover a larger range;
(8) The exoskeleton performs the action selected in step (7); the exoskeleton wearer walks under this action for a time interval τ, which yields the scalar reward r_t fed back by the flexible exoskeleton, as shown in Fig. 2, and the exoskeleton state s_{t+1} at the next time.
The scalar reward r_t fed back by the flexible exoskeleton has the specific form:

[Equation (7): reward as a function of the walking ratio W and the target W_tv; the original equation image is not recoverable from this extraction]

where W is the walking ratio and W_tv is the preset walking ratio of healthy elderly people.
Previous studies indicate that the walking ratio can be used to describe the gait pattern: for a particular subject it does not vary significantly with physical ability, walking stability, degree of concentration, and so on, and it does not differ significantly between different healthy individuals. The walking ratio of normal gait in elderly people over 60 is usually between 0.0044 and 0.0055.
The walking ratio is defined as the ratio of the step length to the step frequency, in the specific form:

[Equation (8): walking ratio W computed from D_{t+1}, N, and T_{t+1}; the original equation image is not recoverable from this extraction]

where D_{t+1} is the step length at the next time in m, N is the step frequency in steps/s, and T_{t+1} is the gait cycle at the next time in s;
The step length at the next time can be obtained from:

D_{t+1} = l(θ_max,t+1 - θ_min,t+1)

where l is the leg length of the flexible-exoskeleton wearer, θ_max,t+1 is the maximum hip-joint flexion angle in the gait cycle at the next time, and θ_min,t+1 is the minimum hip-joint flexion angle in the gait cycle at the next time.
Exoskeleton state s at next moment t+1 Comprises a swing phase power-assisted amplitude A at the next moment t+1 The next moment in time the angle of flexion theta of the hip joint of the exoskeleton wearer t+1 Gait at next momentPeriod T t+1 Swing phase period T of next moment gait period bt+1 Maximum flexion angle theta of hip joint in next moment gait cycle max T+1, minimum flexion angle θ of hip joint at next moment gait cycle min T+1; the exoskeleton state s at the next moment t+1 Obtained by the following steps:
(8-1) the exoskeleton performs the action selected in step (7), and the exoskeleton wearer walks for the time interval τ; the hip joint flexion angle parameters of the exoskeleton wearer during walking are acquired in real time by the MEMS attitude sensor, and the flexion angle of the hip joint at the moment walking ends is taken as the flexion angle θ_{t+1} of the hip joint of the exoskeleton wearer at the next moment;
(8-2) the hip joint flexion angle parameters of the exoskeleton wearer during walking are acquired in real time, and the flexion angle parameter curve of the hip joint is obtained through steps (1-1) and (1-2); the peak moments are recorded as t_peak and the trough moments as t_trough, together with the hip joint flexion angles corresponding to the peaks and troughs;
(8-3) the last trough moment occurring before the end of the walking interval τ minus the previous trough moment is taken as the gait cycle T_{t+1} at the next moment. Meanwhile, the last trough moment minus its previous peak moment is taken as swing phase period I of the gait cycle at the next moment, denoted T_{bt+1,1}; the next-to-last trough moment minus its previous peak moment is swing phase period II, denoted T_{bt+1,2}; and the third-to-last trough moment minus its previous peak moment is swing phase period III, denoted T_{bt+1,3}. Averaging the three swing phase periods, as shown in formula (10), gives the swing phase period of the next gait cycle:

T_{b,t+1} = (T_{bt+1,1} + T_{bt+1,2} + T_{bt+1,3}) / 3 (10)
(8-4) the hip joint flexion angle corresponding to the last peak moment is taken as the maximum flexion angle θ_{max,t+1} of the hip joint in the gait cycle at the next moment, and the hip joint flexion angle corresponding to the last trough moment as the minimum flexion angle θ_{min,t+1};
(8-5) the swing phase assistance amplitude A_{t+1} at the next moment is equal to the manually set swing phase assistance amplitude A;
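Steps (8-2) to (8-4) amount to peak/trough detection on the sampled hip-flexion curve. A sketch using scipy's find_peaks is shown below, assuming uniformly time-stamped samples covering at least three full strides; the prominence threshold is an illustrative assumption:

    import numpy as np
    from scipy.signal import find_peaks

    def gait_parameters(theta, t):
        """Extract T_{t+1}, T_{b,t+1}, theta_max,t+1, theta_min,t+1 from a hip-flexion curve."""
        peaks, _ = find_peaks(theta, prominence=1.0)     # wave-crest indices
        troughs, _ = find_peaks(-theta, prominence=1.0)  # wave-trough indices
        T = t[troughs[-1]] - t[troughs[-2]]              # gait cycle: last trough minus previous trough
        # swing phase periods I-III: each of the last three troughs minus its preceding peak
        Tb = np.mean([t[tr] - t[peaks[peaks < tr][-1]] for tr in troughs[-3:]])
        return T, Tb, theta[peaks[-1]], theta[troughs[-1]]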
(9) State transition process: the state s_t of the exoskeleton at moment t, the action a_t of the exoskeleton at moment t obtained in step (7), the state s_{t+1} of the exoskeleton at the moment after t obtained in step (8), and the scalar reward r_t fed back by the flexible exoskeleton are stored as a training data set in the experience replay pool R for parameter training;
the state s_t of the exoskeleton at moment t is obtained as follows: it is the same as the next-moment exoskeleton state obtained by executing step (8) in time round t−1 of the current episode. For example, the exoskeleton state s_t at moment t = 2 is the same as the next-moment exoskeleton state s_{t+1} obtained by step (8) in time round 1 of the current episode. The state s_t comprises the swing phase assistance amplitude A_t, the hip joint flexion angle θ_t of the exoskeleton wearer at moment t, the gait cycle T_t at moment t, the swing phase period T_{bt} of the gait cycle at moment t, and the maximum flexion angle θ_{max,t} and minimum flexion angle θ_{min,t} of the hip joint in the gait cycle at moment t. Specifically, at moment t = 1 the state of the exoskeleton is obtained by step (5).
(10) N state transition processes from step (9) are randomly sampled, as shown in FIG. 2, and the N tuples (s_i, a_i, r_i, s_{i+1}) are used as a batch of training data for parameter training; steps (10-1) to (10-3) below are the specific process of parameter training. Here s_i denotes the exoskeleton state of the i-th state transition process, a_i the exoskeleton action, r_i the scalar reward, and s_{i+1} the next exoskeleton state of the i-th state transition process.
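The experience replay pool R and this batch sampling can be sketched as a simple bounded buffer; the capacity is an illustrative assumption:

    import random
    from collections import deque

    class ReplayPool:
        """Experience replay pool R of (s_i, a_i, r_i, s_{i+1}) state transitions."""
        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)  # oldest transitions are discarded first

        def store(self, s, a, r, s_next):
            self.buffer.append((s, a, r, s_next))

        def sample(self, n):
            """Uniformly sample N transitions as one batch of training data."""
            return random.sample(self.buffer, n)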
(10-1) The loss of the online evaluation network is calculated; the loss is defined in mean square error (Mean Squared Error, MSE) form, as shown in equation (11), and is used to update the online evaluation network parameters:

L(α_Q) = (1/N) Σ_i [y_i − Q(s_i, a_i | α_Q)]² (11)

where L(α_Q) is the loss function value of the online evaluation network, used for training and optimization; Q(s_i, a_i | α_Q) is the evaluation value (Q value) of the online evaluation network, whose input is the state and action of the exoskeleton in the i-th state transition process; and y_i is the target of the Q value, namely:

y_i = r_i + γ Q'(s_{i+1}, μ'(s_{i+1} | α_{μ'}) | α_{Q'}) (12)

where r_i is the scalar reward of the i-th state transition process; s_{i+1} is the next exoskeleton state of the i-th state transition process; and γ is the discount factor, γ ∈ [0,1]. The term Q'(s_{i+1}, μ'(s_{i+1} | α_{μ'}) | α_{Q'}) is a nesting of two functions: the outer function Q' is the Q function generated by the target evaluation network, whose inputs are the next exoskeleton state and action of the i-th state transition process, and the inner function μ'(s_{i+1} | α_{μ'}) is generated by the target policy network and produces the next exoskeleton action from the next exoskeleton state of the i-th state transition process;
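A PyTorch sketch of this critic step in equations (11)-(12) follows; critic, critic_target, and actor_target stand for the online evaluation, target evaluation, and target policy networks, and the batch is assumed to hold stacked tensors:

    import torch
    import torch.nn.functional as F

    def critic_update(critic, critic_target, actor_target, opt, batch, gamma=0.99):
        """Minimize L(alpha_Q), eq. (11), with the target y_i of eq. (12)."""
        s, a, r, s_next = batch                              # each of shape (N, ...)
        with torch.no_grad():
            a_next = actor_target(s_next)                    # mu'(s_{i+1} | alpha_mu')
            y = r + gamma * critic_target(s_next, a_next)    # y_i = r_i + gamma * Q'(...)
        loss = F.mse_loss(critic(s, a), y)                   # mean square error over the batch
        opt.zero_grad()
        loss.backward()
        opt.step()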
(10-2) The online policy network parameters are updated as shown in equation (13):

∇_{α_μ} J ≈ (1/N) Σ_i ∇_a Q(s, a | α_Q)|_{s=s_i, a=μ(s_i)} · ∇_{α_μ} μ(s | α_μ)|_{s=s_i} (13)

where ∇_{α_μ} J is the gradient value with respect to the online policy network parameters; ∇_a Q(s, a | α_Q) is the gradient of the Q value of the online evaluation network with respect to the action a, which is generated by the online policy network as a = μ(s_i | α_μ); and ∇_{α_μ} μ(s | α_μ) is the gradient of the policy with respect to the online policy network parameters. The two gradient terms in equation (13) are multiplied together;
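In autograd form, the chained gradient of equation (13) is obtained by simply maximizing the critic's evaluation of the actor's own action, i.e., minimizing −Q (a sketch under the same assumptions as the critic step above):

    def actor_update(actor, critic, opt, states):
        """Eq. (13): autograd chains grad_a Q and grad_alpha_mu mu automatically."""
        loss = -critic(states, actor(states)).mean()  # ascend the Q value of mu(s_i)
        opt.zero_grad()
        loss.backward()
        opt.step()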
(10-3) See fig. 2 for the slow update of the target policy network parameters and target evaluation network parameters, as shown in equation (14):

α_{Q'} ← σ α_Q + (1 − σ) α_{Q'}
α_{μ'} ← σ α_μ + (1 − σ) α_{μ'} (14)

where α_{μ'} denotes the target policy network parameters; α_μ the online policy network parameters; α_{Q'} the target evaluation network parameters; and α_Q the online evaluation network parameters. σ is the update scale parameter and typically takes a very small value, e.g. 0.001; that is, updating the target policy network parameters and the target evaluation network parameters is a slow process that largely preserves their current values.
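The slow (soft) update of equation (14) can be sketched in a few lines, with sigma = 0.001 as suggested in the text:

    def soft_update(target_net, online_net, sigma=0.001):
        """Eq. (14): alpha' <- sigma * alpha + (1 - sigma) * alpha'."""
        for p_t, p in zip(target_net.parameters(), online_net.parameters()):
            p_t.data.mul_(1.0 - sigma).add_(sigma * p.data)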
In summary, after step (10) is completed, the network parameters in the policy network and the evaluation network have been updated once, promoting the convergence of the parameters of each network. The network parameters in the policy network comprise the online policy network parameters α_μ and the target policy network parameters α_{μ'}; the network parameters in the evaluation network comprise the online evaluation network parameters α_Q and the target evaluation network parameters α_{Q'}. Finally, parameter convergence of each network in the policy network and the evaluation network is achieved; that is, the walking ratio of the exoskeleton wearer is driven toward, and finally stabilized at, the set walking ratio of healthy elderly people.
(11) After steps (7) to (10) have been executed, one time round is completed and its enumeration ends; the time round is incremented by 1 and steps (7) to (10) are executed again, until the parameters of each network in the policy network and the evaluation network converge. Since the exoskeleton main assistance parameter α to be optimized by the deep reinforcement learning method equals the target policy network parameter α_{μ'} of the target policy network, convergence of α_{μ'} means convergence of α within this episode; the walking ratio of the exoskeleton wearer is then stabilized at the set walking ratio of healthy elderly people, the current episode ends, and the next episode begins;
(12) After steps (5) to (11) have been executed, one enumeration of the episode number e is completed; let e = e + 1 and continue executing steps (5) to (11). When all episodes have ended, the target policy network parameters α_{μ'} of the target policy network all converge to the same value, i.e., the exoskeleton main assistance parameter α converges to the same value. This value is then regarded as the exoskeleton main assistance parameter α optimized by the deep reinforcement learning method, and can be used to realize optimal assistance of the exoskeleton, so that the walking ratio of the exoskeleton wearer remains stabilized at the set walking ratio of healthy elderly people, realizing rehabilitation exercise of the exoskeleton wearer.
Within one episode, the exoskeleton main assistance parameter α to be optimized may converge early: when α converges to a certain value and no longer changes while the current time round number is still less than T_max, the enumeration of that episode ends and e = e + 1.
In a continuous action space, the action is a continuous floating-point number (e.g., the exoskeleton assistance controlled by the exoskeleton assistance curve equation lies in [0, A], and includes not only the magnitude of the force but also its direction). In step (7) the online policy network therefore employs a deterministic policy μ(s | α_μ) to select the action: the output of the deterministic policy is a specific floating-point number representing a specific action, which suits the continuous action space and can be used to solve the continuity control problem of the flexible exoskeleton.
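A deterministic actor suited to this bounded continuous action space can squash its output into [0, A] with a scaled sigmoid; the six-dimensional state matches the state described above, while the hidden layer sizes are illustrative:

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        """Deterministic policy mu(s | alpha_mu) with continuous output in [0, A]."""
        def __init__(self, state_dim=6, a_max=1.0, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1), nn.Sigmoid(),  # squash to (0, 1)
            )
            self.a_max = a_max

        def forward(self, s):
            return self.a_max * self.net(s)  # assistance magnitude in [0, A]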
Finally, when all episodes have ended, the target policy network parameters of the target policy network converge to the same value, i.e., the exoskeleton main assistance parameter converges to the same value; the main assistance parameter is thereby optimized by the deep reinforcement learning method, the optimal assistance of the exoskeleton can be realized with it, the walking ratio of the exoskeleton wearer is stabilized at the set walking ratio of healthy elderly people, and rehabilitation exercise of the exoskeleton wearer is realized.
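Putting steps (4) to (12) together, the outer training loop can be sketched as follows, reusing the helper sketches above; env (the wearer-plus-exoskeleton interaction), its reset/step interface, and the to_tensors batching helper are hypothetical stand-ins:

    def train(actor, critic, actor_t, critic_t, opt_a, opt_c, env, pool,
              episodes=50, t_max=200, batch_n=64):
        for e in range(episodes):                    # step (4): enumerate episodes
            s = env.reset()                          # step (5): unassisted walk gives s_1
            for t in range(t_max):                   # step (6): enumerate time rounds
                a = select_action(actor, s)          # step (7): noisy deterministic action
                s_next, r = env.step(a)              # step (8): walk, observe r_t and s_{t+1}
                pool.store(s, a, r, s_next)          # step (9): store state transition in R
                if len(pool.buffer) >= batch_n:      # step (10): parameter training
                    batch = to_tensors(pool.sample(batch_n))  # to_tensors: assumed helper
                    critic_update(critic, critic_t, actor_t, opt_c, batch)
                    actor_update(actor, critic, opt_a, batch[0])
                    soft_update(actor_t, actor)      # step (10-3): slow target updates
                    soft_update(critic_t, critic)
                s = s_next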

Claims (10)

1. An exoskeleton main assistance parameter optimization method based on deep reinforcement learning, characterized by comprising the following steps:
(1) Determining optimization parameters;
determining the optimization parameter according to the exoskeleton assistance curve equation, the curve equation being of the compound sinusoidal form shown in formula (1):

[Formula (1): compound sinusoidal assistance curve giving the real-time assistance F_assist as a function of A, t*, T_b, and α; the original image of this formula is not reproduced in the text.]

where F_assist is the real-time assistance, A is the swing phase assistance amplitude, t* is the time from the moment assistance starts to the current moment, T_b is the swing phase period of the current gait cycle, and α is the exoskeleton main assistance parameter, acting as the waveform control parameter of formula (1) to change the assistance peak position, with value range −1 to 1;
(2) Setting parameters:

the walking time interval of the exoskeleton wearer is set to T = 5–7 s; the time interval may be increased appropriately to ensure that the exoskeleton wearer walks at least 3 steps, so that the swing phase period of the current gait cycle can be acquired, and the exoskeleton wearer should stand stably when the walking time interval ends, the assistance condition being judged again after each advance; the maximum episode number E, the batch sampling number N, and the maximum time rounds T_max of each episode are preset;
(3) The standard configuration in the deep deterministic policy gradient method is designed, specifically including the design of a policy network and an evaluation network; the policy network comprises an online policy network μ(s | α_μ) and a target policy network μ(s | α_{μ'}); the evaluation network comprises an online evaluation network Q(s, a | α_Q) and a target evaluation network Q(s, a | α_{Q'});
(4) The episode number e is enumerated from 1 to E, i.e., the exoskeleton main assistance parameter α is converged E times; at the beginning of each episode, the state of the exoskeleton at the initial moment is obtained;
(5) Acquiring the initial state:

when each episode in step (4) begins, the exoskeleton wearer is made to walk normally without assistance for a time interval of T, and the state of the exoskeleton is obtained as the state s_1 of the exoskeleton at the initial moment t = 1; it specifically comprises the swing phase assistance amplitude A_1 at the initial moment, the hip joint flexion angle θ_1 of the exoskeleton wearer at the initial moment, the gait cycle T_1 at the initial moment, the swing phase period T_{b1} of the gait cycle at the initial moment, and the maximum flexion angle θ_{max,1} and minimum flexion angle θ_{min,1} of the hip joint in the gait cycle at the initial moment;
(6) The time rounds are enumerated from 1 to T_max, and the moment t is recorded at the beginning of each time round; enumerating the time rounds means performing steps (7) to (10) T_max times within each episode, with the aim of having the exoskeleton perform T_max exoskeleton actions selected by the online policy network in each episode, thereby generating a data set for parameter training and improving the reliability of the training result; the larger the value of T_max, the more enumerations are performed and thus the more data are generated, enabling the optimized parameter to converge;
(7) The online policy network selects the action of the exoskeleton at moment t according to:

a_t = μ(s_t | α_μ) + Noise (6)

the Noise term expands the value range of the policy output, so that the exoskeleton can select from a wider range of actions at moment t;
(8) The exoskeleton performs the action selected in step (7), and the exoskeleton wearer walks for a time interval of T; based on the action performed by the exoskeleton, the scalar reward r_t fed back by the flexible exoskeleton and the exoskeleton state s_{t+1} at the next moment are obtained;
(9) State transition process:

the state s_t of the exoskeleton at moment t, the action a_t of the exoskeleton at moment t obtained in step (7), the state s_{t+1} of the exoskeleton at the moment after t obtained in step (8), and the scalar reward r_t fed back by the flexible exoskeleton are stored as a training data set in the experience replay pool R for parameter training;
(10) N state transition processes from step (9) are randomly sampled as batch training data for parameter training;
(11) After steps (7) to (10) have been executed, one time round is completed and its enumeration ends; the time round is incremented by 1 and steps (7) to (10) are executed again, until the parameters of each network in the policy network and the evaluation network converge; since the exoskeleton main assistance parameter α to be optimized by the deep reinforcement learning method equals the target policy network parameter α_{μ'} of the target policy network, convergence of α_{μ'} means convergence of α within this episode; the walking ratio of the exoskeleton wearer is then stabilized at the set walking ratio of healthy elderly people, the current episode ends, and the next episode begins;
(12) After steps (5) to (11) have been executed, one enumeration of the episode number e is completed; let e = e + 1 and continue executing steps (5) to (11); when all episodes have ended, the target policy network parameters α_{μ'} of the target policy network all converge to the same value, i.e., the exoskeleton main assistance parameter α converges to the same value; this value is then regarded as the exoskeleton main assistance parameter α optimized by the deep reinforcement learning method, and can be used to realize optimal assistance of the exoskeleton, so that the walking ratio of the exoskeleton wearer remains stabilized at the set walking ratio of healthy elderly people, realizing rehabilitation exercise of the exoskeleton wearer.
2. The method for optimizing the exoskeleton main assistance parameter based on deep reinforcement learning according to claim 1, wherein the swing phase assistance amplitude A is determined by the rated output value of the assistance component and, under rated operation of the assistance component, is a known value that can be set manually; the swing phase period T_b of the current gait cycle is obtained by collecting the hip joint flexion angle parameters of the wearer during walking with a MEMS attitude sensor to obtain the flexion angle parameter curve of the wearer's hip joint, and averaging the first three swing phase periods in the flexion angle parameter curve to obtain the swing phase period of the next gait, which is taken as the swing phase period of the current gait cycle; the swing phase period of the current gait cycle is thus a known value, obtained by equation (2):

T_b = (T_b(1) + T_b(2) + T_b(3)) / 3 (2)
3. The method for optimizing the exoskeleton main assistance parameter based on deep reinforcement learning according to claim 2, wherein the swing phase period T_b of the current gait cycle is specifically calculated as follows:

the MEMS attitude sensors are placed at the middle positions of the rear of the left and right thighs of the wearer of the flexible exoskeleton robot, and the hip joint flexion angle parameters during the wearer's normal walking are acquired in real time to obtain the flexion angle parameter curve of the wearer's hip joint; the peak moments are recorded as t_peak and the trough moments as t_trough, together with the hip joint flexion angles corresponding to the peaks and troughs; the current gait cycle shown in formula (3) and the swing phase period of the gait cycle shown in formula (4) can then be calculated as:

T(k) = t_trough(k) − t_trough(k−1) (3)
T_b(k) = t_peak(k) − t_trough(k) (4)

where formula (3) indicates that the current gait cycle T is calculated from the values of two adjacent trough points, and formula (4) that the swing phase period of the gait cycle is calculated from the values of adjacent peak and trough points; the maximum hip flexion angle θ_max(k) and minimum hip flexion angle θ_min(k) corresponding to the current gait cycle can further be obtained.
4. The method for optimizing the exoskeleton main assistance parameter based on deep reinforcement learning according to claim 2, wherein the method for acquiring the flexion angle parameter curve of the wearer's hip joint comprises the following steps:

(1-1) the hip joint flexion angle parameter signals of the wearer of the flexible exoskeleton robot are acquired with the MEMS attitude sensor, converted into digital signals, and transmitted to a single-chip microcomputer, which transmits them to the PC side; the data transmission between the single-chip microcomputer and the PC side uses serial communication, the single-chip microcomputer transmitting to the PC side through a Bluetooth module over a wireless network;

(1-2) the hip joint flexion angle parameter signal is obtained through a serial interface in MATLAB installed on the PC side, and the real-time curve of the hip joint flexion angle parameter is drawn with the plot function.
5. The method for optimizing the exoskeleton main assistance parameter based on deep reinforcement learning according to claim 1, wherein setting the maximum episode number E in step (2) means setting the number of convergences of the exoskeleton main assistance parameter α optimized by the deep reinforcement learning method, i.e., one episode corresponds to one convergence of the parameter; setting the maximum time rounds T_max of each episode means setting the number of rounds performed in each episode, each round corresponding to one time interval, i.e., each convergence of the exoskeleton main assistance parameter α completes at most T_max rounds, and each round requires the exoskeleton wearer to walk for a time interval of T; furthermore, at the beginning of each round one moment is recorded, the starting moment of a round being defined as moment t, so that the starting moment of the first round corresponds to t = 1, and so on, the starting moment of round T_max corresponding to t = T_max.
6. The method for optimizing the exoskeleton main assistance parameter based on deep reinforcement learning according to claim 1, wherein the design of the policy network and the evaluation network with the deep deterministic policy gradient method in step (3) specifically comprises the following steps:

(3-1) the online policy network μ(s | α_μ) and the online evaluation network Q(s, a | α_Q) are initialized;

(3-2) a target policy network μ(s | α_{μ'}) is constructed with the same structure as the online policy network μ(s | α_μ), and a target evaluation network Q(s, a | α_{Q'}) with the same structure as the online evaluation network Q(s, a | α_Q); the parameters of the online policy network and the online evaluation network are copied to the respective target network parameters, i.e., α_{μ'} ← α_μ and α_{Q'} ← α_Q, where the exoskeleton main assistance parameter α serves as the parameter to be optimized by the deep reinforcement learning method, s denotes the state of the exoskeleton, and a the action of the exoskeleton; the experience replay pool R is initialized;

the exoskeleton state s in step (3-2) comprises the swing phase assistance amplitude A, the current gait cycle T, the swing phase period T_b of the current gait cycle, the flexion angle θ of the hip joint of the exoskeleton wearer, and the maximum flexion angle θ_max and minimum flexion angle θ_min of the hip joint in the current gait cycle; the action a of the exoskeleton is the assistance amount of the exoskeleton, and the assistance direction of the exoskeleton is always positive, i.e., vertically upward.
7. The method for optimizing the exoskeleton main assistance parameter based on deep reinforcement learning according to claim 1, wherein the state s_1 of the exoskeleton at the initial moment in step (5) is specifically obtained by the following steps:

(5-1) the exoskeleton wearer is made to walk normally without assistance for a time interval of T; MEMS attitude sensors are placed at the middle positions of the rear of the left and right thighs of the wearer of the flexible exoskeleton robot, the hip joint flexion angle parameters during normal walking are collected in real time, and the flexion angle of the hip joint at the moment walking ends is taken as the flexion angle θ_1 of the hip joint of the exoskeleton wearer at the initial moment;

(5-2) the hip joint flexion angle parameters during normal unassisted walking are acquired in real time, the flexion angle parameter curve of the wearer's hip joint is obtained through steps (1-1) and (1-2), and the peak moments are recorded as t_peak and the trough moments as t_trough, together with the hip joint flexion angles corresponding to the peaks and troughs;

(5-3) the last trough moment occurring before the end of the unassisted walking interval T minus the previous trough moment is taken as the gait cycle T_1 at the initial moment;

(5-4) the last trough moment minus its previous peak moment is taken as swing phase period I of the gait cycle at the initial moment, denoted T_{b1,1};

(5-5) the next-to-last trough moment minus its previous peak moment is taken as swing phase period II of the gait cycle at the initial moment, denoted T_{b1,2};

(5-6) the third-to-last trough moment minus its previous peak moment is taken as swing phase period III of the gait cycle at the initial moment, denoted T_{b1,3};

(5-7) averaging the three swing phase periods obtained in steps (5-4) to (5-6) gives the swing phase period of the next gait cycle, which is taken as the swing phase period of the gait cycle at the initial moment, namely:

T_{b1} = (T_{b1,1} + T_{b1,2} + T_{b1,3}) / 3

(5-8) the hip joint flexion angle corresponding to the last trough moment is taken as the minimum flexion angle θ_{min,1} of the hip joint in the gait cycle at the initial moment, and the hip joint flexion angle corresponding to the last peak moment as the maximum flexion angle θ_{max,1};

(5-9) the swing phase assistance amplitude A_1 at the initial moment is equal to the manually set swing phase assistance amplitude A.
8. The method for optimizing the exoskeleton main assistance parameter based on deep reinforcement learning according to claim 1, wherein the scalar reward r_t fed back by the flexible exoskeleton in step (8) has the specific form:

[Formula (7): scalar reward r_t, a function of the walking ratio W and the set walking ratio W_tv; the original image of this formula is not reproduced in the text.]

where W is the walking ratio and W_tv is the set walking ratio of healthy elderly people;

the value of the walking ratio in step (8) is defined as the ratio of step length to step frequency, in the specific form shown in formula (8):

W = D_{t+1} / N (8)

where D_{t+1} is the step length at the next moment, in m; N is the step frequency, in steps/s; and T_{t+1} is the gait cycle at the next moment, in s;

the step length at the next moment can be obtained by:

D_{t+1} = l(θ_{max,t+1} − θ_{min,t+1}) (9)

where l is the leg length of the wearer of the flexible exoskeleton robot, θ_{max,t+1} is the maximum flexion angle of the hip joint in the gait cycle at the next moment, and θ_{min,t+1} is the minimum flexion angle of the hip joint in the gait cycle at the next moment.
9. The method for optimizing the exoskeleton main assistance parameter based on deep reinforcement learning according to claim 1, wherein the exoskeleton state s_{t+1} at the next moment in step (8) comprises the swing phase assistance amplitude A_{t+1} at the next moment, the hip joint flexion angle θ_{t+1} of the exoskeleton wearer at the next moment, the gait cycle T_{t+1} at the next moment, the swing phase period T_{b,t+1} of the gait cycle at the next moment, and the maximum flexion angle θ_{max,t+1} and minimum flexion angle θ_{min,t+1} of the hip joint in the gait cycle at the next moment; the exoskeleton state s_{t+1} at the next moment is obtained by the following steps:

(8-1) the exoskeleton performs the action selected in step (7), and the exoskeleton wearer walks for the time interval T; the hip joint flexion angle parameters of the exoskeleton wearer during walking are acquired in real time by the MEMS attitude sensor, and the flexion angle of the hip joint at the moment walking ends is taken as the flexion angle θ_{t+1} of the hip joint of the exoskeleton wearer at the next moment;

(8-2) the hip joint flexion angle parameters of the exoskeleton wearer during walking are acquired in real time, the flexion angle parameter curve of the hip joint is obtained through steps (1-1) and (1-2), and the peak moments are recorded as t_peak and the trough moments as t_trough, together with the hip joint flexion angles corresponding to the peaks and troughs;

(8-3) the last trough moment occurring before the end of the walking interval T minus the previous trough moment is taken as the gait cycle T_{t+1} at the next moment; meanwhile, the last trough moment minus its previous peak moment is taken as swing phase period I of the gait cycle at the next moment, denoted T_{bt+1,1}; the next-to-last trough moment minus its previous peak moment is swing phase period II, denoted T_{bt+1,2}; and the third-to-last trough moment minus its previous peak moment is swing phase period III, denoted T_{bt+1,3}; averaging the three swing phase periods, as shown in formula (10), gives the swing phase period of the next gait cycle:

T_{b,t+1} = (T_{bt+1,1} + T_{bt+1,2} + T_{bt+1,3}) / 3 (10)

(8-4) the hip joint flexion angle corresponding to the last peak moment is taken as the maximum flexion angle θ_{max,t+1} of the hip joint in the gait cycle at the next moment, and the hip joint flexion angle corresponding to the last trough moment as the minimum flexion angle θ_{min,t+1};

(8-5) the swing phase assistance amplitude A_{t+1} at the next moment is equal to the manually set swing phase assistance amplitude A;

the state s_t of the exoskeleton at moment t in step (9) is the same as the next-moment exoskeleton state obtained by executing step (8) in time round t−1 of the current episode.
10. The method for optimizing the exoskeleton main assistance parameter based on deep reinforcement learning according to claim 1, wherein the parameter training in step (10) specifically comprises the following steps:

(10-1) the loss of the online evaluation network is calculated; the loss is defined in mean square error form, as shown in equation (11), and is used to update the online evaluation network parameters:

L(α_Q) = (1/N) Σ_i [y_i − Q(s_i, a_i | α_Q)]² (11)

where L(α_Q) is the loss function value of the online evaluation network, used for training and optimization; Q(s_i, a_i | α_Q) is the evaluation value (Q value) of the online evaluation network, whose input is the state and action of the exoskeleton in the i-th state transition process; and y_i is the target of the Q value, namely:

y_i = r_i + γ Q'(s_{i+1}, μ'(s_{i+1} | α_{μ'}) | α_{Q'}) (12)

where r_i is the scalar reward of the i-th state transition process; s_{i+1} is the next exoskeleton state of the i-th state transition process; and γ is the discount factor, γ ∈ [0,1]; the term Q'(s_{i+1}, μ'(s_{i+1} | α_{μ'}) | α_{Q'}) is a nesting of two functions: the outer function Q' is the Q function generated by the target evaluation network, whose inputs are the next exoskeleton state and action of the i-th state transition process, and the inner function μ'(s_{i+1} | α_{μ'}) is generated by the target policy network and produces the next exoskeleton action from the next exoskeleton state of the i-th state transition process;

(10-2) the online policy network parameters are updated as shown in equation (13):

∇_{α_μ} J ≈ (1/N) Σ_i ∇_a Q(s, a | α_Q)|_{s=s_i, a=μ(s_i)} · ∇_{α_μ} μ(s | α_μ)|_{s=s_i} (13)

where ∇_{α_μ} J is the gradient value with respect to the online policy network parameters; ∇_a Q(s, a | α_Q) is the gradient of the Q value of the online evaluation network with respect to the action a, which is generated by the online policy network as a = μ(s_i | α_μ); and ∇_{α_μ} μ(s | α_μ) is the gradient of the policy with respect to the online policy network parameters; the two gradient terms in equation (13) are multiplied together;

(10-3) the target policy network parameters and the target evaluation network parameters are updated as shown in equation (14):

α_{Q'} ← σ α_Q + (1 − σ) α_{Q'}
α_{μ'} ← σ α_μ + (1 − σ) α_{μ'} (14)

where α_{μ'} denotes the target policy network parameters; α_μ the online policy network parameters; α_{Q'} the target evaluation network parameters; and α_Q the online evaluation network parameters; σ is the update scale parameter, whose small value means that updating the target policy network parameters and the target evaluation network parameters is a slow process; its value is related to the walking ratio of the exoskeleton wearer and takes a small value;

in summary, after step (10) is completed, the network parameters in the policy network and the evaluation network have been updated once, promoting the convergence of the parameters of each network, where the network parameters in the policy network comprise the online policy network parameters α_μ and the target policy network parameters α_{μ'}, and the network parameters in the evaluation network comprise the online evaluation network parameters α_Q and the target evaluation network parameters α_{Q'}; finally, parameter convergence of each network in the policy network and the evaluation network is achieved, i.e., the walking ratio of the exoskeleton wearer is driven toward, and finally stabilized at, the set walking ratio of healthy elderly people.
CN202011383180.8A 2020-12-01 2020-12-01 Exoskeleton main assistance parameter optimization method based on deep reinforcement learning Active CN112494282B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011383180.8A CN112494282B (en) 2020-12-01 2020-12-01 Exoskeleton main assistance parameter optimization method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN112494282A CN112494282A (en) 2021-03-16
CN112494282B (en) 2023-05-02


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115016561B (en) * 2022-06-07 2024-04-19 深圳市英汉思动力科技有限公司 Walking auxiliary device control method and related equipment thereof
CN115256340B (en) * 2022-06-09 2024-06-25 天津理工大学 Double-power-assisted flexible lower limb exoskeleton system and control method
CN116898583B (en) * 2023-06-21 2024-04-26 北京长木谷医疗科技股份有限公司 Deep learning-based intelligent rasping control method and device for orthopedic operation robot
CN116807839B (en) * 2023-08-30 2023-11-28 山东泽普医疗科技有限公司 Exoskeleton rehabilitation robot gait algorithm and control system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108451748A (en) * 2018-05-30 2018-08-28 中国工程物理研究院总体工程研究所 A kind of direct-drive type rehabilitation ectoskeleton and training method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9943459B2 (en) * 2013-11-20 2018-04-17 University Of Maryland, Baltimore Method and apparatus for providing deficit-adjusted adaptive assistance during movement phases of an impaired joint
WO2016180073A1 (en) * 2015-05-11 2016-11-17 The Hong Kong Polytechnic University Exoskeleton ankle robot
CN106821680A (en) * 2017-02-27 2017-06-13 浙江工业大学 A kind of upper limb healing ectoskeleton control method based on lower limb gait
EP3881986A4 (en) * 2018-11-16 2022-01-26 NEC Corporation Load reduction device, load reduction method, and storage medium storing program
CN109549821B (en) * 2018-12-30 2021-07-09 南京航空航天大学 Exoskeleton robot power-assisted control system and method based on myoelectricity and inertial navigation signal fusion
CN111631923A (en) * 2020-06-02 2020-09-08 中国科学技术大学先进技术研究院 Neural network control system of exoskeleton robot based on intention recognition




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant