CN112757275A - Method, system and device for controlling musculoskeletal system based on speed precision balance - Google Patents

Publication number
CN112757275A
Authority
CN
China
Prior art keywords
time
activation signal
muscle activation
musculoskeletal system
moment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011610884.4A
Other languages
Chinese (zh)
Other versions
CN112757275B (en)
Inventor
周俊杰
钟汕林
乔红
吴伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202011610884.4A
Publication of CN112757275A
Application granted
Publication of CN112757275B
Legal status: Active
Anticipated expiration

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/10 Programme-controlled manipulators characterised by positioning means for manipulator elements
    • B25J9/1075 Programme-controlled manipulators characterised by positioning means for manipulator elements with muscles or tendons
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1628 Programme controls characterised by the control loop
    • B25J9/1633 Programme controls characterised by the control loop compliant, force, torque control, e.g. combined with position control

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Orthopedic Medicine & Surgery (AREA)
  • Rheumatology (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention belongs to the technical field of control, and particularly relates to a musculoskeletal system control method, system and device based on speed precision balance, aiming to solve the problem that existing control methods for human-like musculoskeletal robots cannot adequately control the cooperative contraction of antagonistic muscles. The invention comprises: obtaining the estimated motion precision of a musculoskeletal system through Fitts' law; calculating a supervision term moment through a striatum-inspired speed modulation strategy based on the estimated motion precision; calculating a muscle activation signal vector through a muscle activation signal network; calculating an action reward based on the muscle activation signal vector and the supervision term moment, and from it a loss function; adjusting the parameters of the muscle activation signal network based on the loss function so that the action reward increases; and iterating repeatedly to obtain the muscle activation signal sequence required for control. The invention uses the structural information of the musculoskeletal system to construct a general control strategy for antagonistic muscle cooperative contraction and ensures smooth movement.

Description

Method, system and device for controlling musculoskeletal system based on speed precision balance
Technical Field
The invention belongs to the technical field of control, and particularly relates to a method, a system and a device for controlling a musculoskeletal system based on speed precision balance.
Background
The adaptability of living beings gives them the flexibility to adjust and execute behaviors, so that learned skilled movements can vary with the environment and task requirements. One typical strategy for achieving motion variability is the speed-accuracy trade-off, which reflects the balance between the rapidity and the accuracy of a movement. How to implement such a flexible behavior strategy in a human-like musculoskeletal robot, so that the robot can adapt generally to environments and tasks, is an attractive challenge. On the other hand, in a human-like musculoskeletal robot system the number of muscles is generally far greater than the number of joints, and the redundant muscles make both motor learning and the generation of new movements difficult. How to construct a general control strategy for antagonistic muscle cooperative contraction using the structural information of the musculoskeletal system, especially when some muscles are damaged, is also a challenge.
Disclosure of Invention
To solve the above problem in the prior art, namely that existing control methods for human-like musculoskeletal robots cannot adequately control the cooperative contraction of antagonistic muscles, the present invention provides a musculoskeletal system control method based on speed precision balance, the method comprising:
setting the training count k = 1;
Step S100, obtaining the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
Step S200, based on the estimated motion precision W, calculating the supervision term moment (Figure BDA0002874547930000011) through a striatum-inspired speed modulation strategy;
Step S300, based on the supervision term moment (Figure BDA0002874547930000012), calculating the muscle activation signal vector u_t through a muscle activation signal network;
Step S400, based on the muscle activation signal vector u_t and the supervision term moment (Figure BDA0002874547930000013), calculating an action reward R_t and then a preset loss function L, and adjusting the parameters of the muscle activation signal network based on the preset loss function L so that the action reward R_t increases; letting k = k + 1 and repeating steps S100 to S400 until k = K, where K is the preset maximum number of training iterations, to obtain the muscle activation signal sequence required for control.
In some preferred embodiments, step S100 includes:
presetting an accumulation time T;
Step S110, obtaining, through a cortex model, the perceptual evidence x_i(t_1) ~ N(μ_i, σ^2) at time t_1, and from it the accumulated perceptual evidence Y_i(T):
Figure BDA0002874547930000021
Step S120, inputting the accumulated perceptual evidence Y_i(T) into the basal ganglia model to obtain the output OUT_i of the basal ganglia model;
Step S130, comparing the output OUT_i of the basal ganglia model with a preset decision threshold -ln P_i(T) to obtain a first effective accumulation time (Figure BDA0002874547930000022) and a second effective accumulation time (Figure BDA0002874547930000023); if T > 0, letting T = T - 1 and repeating steps S110 to S130;
wherein, when the output of the basal ganglia model first satisfies OUT_i ≥ -ln P_i(T), the accumulation time T corresponding to that OUT_i is set as the first effective accumulation time (Figure BDA0002874547930000024); whenever the output of the basal ganglia model satisfies OUT_i < -ln P_i(T), the accumulation time T corresponding to that OUT_i is set as the second effective accumulation time (Figure BDA0002874547930000025); each new second effective accumulation time (Figure BDA0002874547930000026) generated during the iteration overwrites the previously generated second effective accumulation time (Figure BDA0002874547930000027); -ln P_i(T) is the decision threshold;
Step S140, obtaining the final decision time from the first effective accumulation time (Figure BDA0002874547930000028) and the second effective accumulation time (Figure BDA0002874547930000029):
Figure BDA00028745479300000210
Step S150, estimating the motion precision W through Fitts' law based on the final decision time T_out. Calculating an appropriate final decision time adjusts the precision of the subsequent muscle control.
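The scan over accumulation times in steps S110 to S130 can be illustrated with a minimal Python sketch. The formula images for Y_i(T) and OUT_i are not reproduced in this text, so the sketch makes two loudly labeled assumptions: the accumulated evidence is a plain sum of Gaussian samples, and the accumulated evidence itself stands in for the basal ganglia output. The function name `scan_accumulation_times` and its parameters are hypothetical, and the combination of the two times into T_out (step S140) is deliberately left out because its formula is not available here.

```python
import random


def scan_accumulation_times(mu, sigma, t_max, threshold, seed=0):
    """Scan T = t_max, t_max - 1, ..., 1 and record the two effective times.

    ASSUMPTIONS (not from the patent text): Y(T) is a plain sum of the first
    T evidence samples, and OUT equals Y(T) instead of a basal ganglia model.
    """
    rng = random.Random(seed)
    # Perceptual evidence x(t1) ~ N(mu, sigma^2), one sample per time step.
    samples = [rng.gauss(mu, sigma) for _ in range(t_max)]

    first_time = None   # set once, the first time OUT >= threshold
    second_time = None  # overwritten every time OUT < threshold
    T = t_max
    while T > 0:
        accumulated = sum(samples[:T])  # assumed form of Y(T)
        out = accumulated               # stand-in for the basal ganglia output
        if out >= threshold:
            if first_time is None:
                first_time = T
        else:
            second_time = T
        T -= 1
    return first_time, second_time


if __name__ == "__main__":
    print(scan_accumulation_times(mu=0.1, sigma=0.0, t_max=10, threshold=0.35))
```

With sigma set to zero the evidence grows linearly, which makes it easy to see which accumulation times fall on either side of the threshold.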
In some preferred embodiments, the striatum-inspired speed modulation strategy is:
Figure BDA00028745479300000211
where the symbol shown in Figure BDA00028745479300000212 denotes the joint angle of the end position calculated from the estimated motion precision, q_s denotes the joint angle of the initial position, t_S is the starting time of the movement, V_M(λ, t) is a bell-shaped velocity modulation model, t denotes the time being modulated, and T_out denotes the decision time;
the bell-shaped velocity modulation model V_M(λ, t) is:
Figure BDA00028745479300000213
where λ is a parameter of the modulation model and t denotes time t;
the desired joint angular velocity is (Figure BDA00028745479300000214);
the supervision term moment (Figure BDA0002874547930000031) is:
Figure BDA0002874547930000032
where q_t is the joint angle at time t, (Figure BDA0002874547930000033) is the desired joint angular velocity, (Figure BDA0002874547930000034) is the desired joint angular acceleration, M(q_t) is the inertia matrix of the musculoskeletal system, (Figure BDA0002874547930000035) is the centripetal and Coriolis force, and G(q_t) is the gravity matrix of the musculoskeletal system;
the inertia matrix M(q_t) of the musculoskeletal system is:
Figure BDA0002874547930000036
the centripetal and Coriolis force (Figure BDA0002874547930000037) is:
Figure BDA0002874547930000038
the gravity matrix G(q_t) of the musculoskeletal system is:
Figure BDA0002874547930000039
where m_1 denotes the mass of the first link of the robotic arm, m_2 denotes the mass of the second link, d_1 denotes the length of the first link, d_2 denotes the length of the second link,
Figure BDA00028745479300000310
q_{1,t} denotes the angle of the first joint of the robotic arm, q_{2,t} denotes the angle of the second joint, and (Figure BDA00028745479300000311) and (Figure BDA00028745479300000312) are the angular velocities of the first and second joints of the robotic arm.
In some preferred embodiments, the muscle activation signal vector u_t is calculated as:
Figure BDA00028745479300000313
where u_{t-1} is the muscle activation signal at time t-1, τ_{t-1} is the joint moment at time t-1, and the policy network μ(· | θ^μ) is a neural network for solving for the muscle activation signals.
In some preferred embodiments, step S400 includes:
Step S410, based on the muscle activation signal vector u_t and the supervision term moment (Figure BDA00028745479300000314), calculating the action reward R_t, which is to be made as large as possible:
Figure BDA00028745479300000315
where γ is a discount factor, (Figure BDA00028745479300000316) is the moment generated by the flexor muscle at time t, (Figure BDA00028745479300000317) is the force produced by the flexor muscle at time t, (Figure BDA00028745479300000318) is the muscle activation signal generated for the flexor muscle at time t, (Figure BDA00028745479300000319) is the moment generated by the extensor muscle at time t, (Figure BDA00028745479300000320) is the force produced by the extensor muscle at time t, (Figure BDA0002874547930000041) is the muscle activation signal generated for the extensor muscle at time t, p is the number of flexor muscles, q is the number of extensor muscles, and ω_1 and ω_2 are proportional parameters;
Step S420, based on the action reward R_t, calculating the loss function L of the evaluation network Q^μ:
Figure BDA0002874547930000042
where θ^Q denotes the parameters of the evaluation network, Δu_{t+1} is the muscle activation signal at time t+1, and Δu_t is the muscle activation signal at time t;
Step S430, based on the loss function L, updating the parameters θ^Q of the evaluation network Q^μ:
Figure BDA0002874547930000043
where η_1 denotes the update step size;
based on the evaluation network Q^μ, updating the parameters θ^μ of the policy network μ(· | θ^μ):
Figure BDA0002874547930000044
where η_2 denotes the update step size and (Figure BDA0002874547930000045) is the gradient of the policy network μ(· | θ^μ):
Figure BDA0002874547930000046
Step S440, if t ≠ T_out, letting t = t + 1 and repeating steps S200 to S400 until t = T_out;
Step S450, if k ≠ K, letting k = k + 1 and repeating steps S100 to S400 until k = K, at which point the muscle activation signals u_t (t ∈ [1, T]) form the muscle activation signal sequence required for control.
In some preferred embodiments, the output OUT_i of the basal ganglia model is:
Figure BDA0002874547930000047
where (Figure BDA0002874547930000048) is a gain factor, and i and j are the indices of the perceptual evidence.
In some preferred embodiments, the motion precision W estimated by Fitts' law is:
Figure BDA0002874547930000049
where a and b are two constant parameters and D is the distance moved by the joint end.
In another aspect of the invention, a musculoskeletal system control system based on speed precision balance is provided, the system comprising a precision estimation module, an expected moment calculation module, an activation signal calculation module and a speed precision balance module;
the training count is set to k = 1;
the precision estimation module is configured to obtain the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
the expected moment calculation module is configured to calculate, based on the estimated motion precision W, the supervision term moment (Figure BDA0002874547930000051) through a striatum-inspired speed modulation strategy;
the activation signal calculation module is configured to calculate, based on the supervision term moment (Figure BDA0002874547930000052), the muscle activation signal vector u_t through a muscle activation signal network;
the speed precision balance module is configured to calculate, based on the muscle activation signal vector u_t and the supervision term moment (Figure BDA0002874547930000053), an action reward R_t and then a preset loss function L, to adjust the parameters of the muscle activation signal network based on the preset loss function L so that the action reward R_t increases, and to let k = k + 1 and repeat the functions of the precision estimation module through the speed precision balance module until k = K, where K is the preset maximum number of training iterations, thereby obtaining the muscle activation signal sequence required for control.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-mentioned musculoskeletal system control method based on speed accuracy trade-off.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; the processor is suitable for executing various programs; the storage device is suitable for storing a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the aforementioned musculoskeletal system control method based on speed accuracy trade-offs.
The invention has the beneficial effects that:
(1) By combining Fitts' law with the speed modulation strategy of the FSI-SPN neuron circuit within the striatum, the musculoskeletal system control method based on speed precision balance designs an antagonistic muscle cooperative contraction strategy, realizes a general redundant-muscle control algorithm, and improves the neurally inspired motion adaptability of the musculoskeletal system.
(2) The musculoskeletal system control method based on speed precision balance constructs a general control strategy for antagonistic muscle cooperative contraction by designing an action reward for the cooperative contraction strategy and updating the parameters of the policy network using the action reward together with an evaluation network, which benefits the motor learning and control of the redundant muscles of a human-like musculoskeletal robot.
(3) The musculoskeletal system control method based on speed precision balance constructs a supervised Markov decision process algorithm by introducing a supervision term into the Markov decision process, and divides the control process into two stages, motion planning and motion execution, as the basis for realizing motion variability, thereby achieving efficient training and control of a musculoskeletal robot system.
(4) The musculoskeletal system control method based on speed precision balance calculates an appropriate movement execution time by combining the antagonistic muscle cooperative contraction strategy with an active speed-accuracy trade-off model, which in turn affects the precision of muscle control and realizes neurally inspired motion adaptability on a musculoskeletal system.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow diagram of a musculoskeletal system control method based on speed accuracy trade-off in accordance with an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention provides a musculoskeletal system control method based on speed precision balance.
The musculoskeletal system control method based on speed precision balance of the invention comprises the following steps:
setting the training count k = 1; because musculoskeletal robotic systems have a redundant number of joints, a single motor task can be accomplished in several ways;
Step S100, obtaining the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
Step S200, based on the estimated motion precision W, calculating the supervision term moment (Figure BDA0002874547930000061) through a striatum-inspired speed modulation strategy;
Step S300, based on the supervision term moment (Figure BDA0002874547930000062), calculating the muscle activation signal vector u_t through a muscle activation signal network;
Step S400, based on the muscle activation signal vector u_t and the supervision term moment (Figure BDA0002874547930000063), calculating an action reward R_t, calculating a loss value through a preset loss function L, and adjusting the parameters of the muscle activation signal network based on that loss value so that the action reward R_t increases; if k < K, letting k = k + 1 and repeating steps S100 to S400 until k = K, where K is the preset maximum number of training iterations, to obtain the muscle activation signal sequence required for control.
In order to more clearly describe the musculoskeletal system control method based on speed precision balance, the following describes an embodiment of the present invention in detail with reference to fig. 1.
The musculoskeletal system control method based on speed precision balance of this embodiment comprises steps S100 to S400, which are described in detail as follows:
setting the training count k = 1;
Step S100, obtaining the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
In the present embodiment, step S100 includes:
presetting an accumulation time T; the accumulation time T set in the first iteration must be the maximum time that can be set, so that the subsequent iterations can proceed, and preferably T = 30 s.
Step S110, obtaining, through a cortex model, the perceptual evidence x_i(t_1) ~ N(μ_i, σ^2) at time t_1, and from it the accumulated perceptual evidence Y_i(T), as shown in equation (1):
Figure BDA0002874547930000071
Step S120, inputting the accumulated perceptual evidence Y_i(T) into the basal ganglia model to obtain the output OUT_i of the basal ganglia model; the accumulated perceptual evidence is passed into the striatum of the basal ganglia model and, after travelling through the direct and indirect pathways, converges in the substantia nigra and the globus pallidus to give the output OUT_i of the basal ganglia model;
Step S130, comparing the output OUT_i of the basal ganglia model with a preset decision threshold -ln P_i(T) to obtain a first effective accumulation time (Figure BDA0002874547930000072) and a second effective accumulation time (Figure BDA0002874547930000073); if T > 0, letting T = T - 1 and repeating steps S110 to S130;
wherein, when the output of the basal ganglia model first satisfies OUT_i ≥ -ln P_i(T), the accumulation time T corresponding to that OUT_i is set as the first effective accumulation time (Figure BDA0002874547930000074); whenever the output of the basal ganglia model satisfies OUT_i < -ln P_i(T), the accumulation time T corresponding to that OUT_i is set as the second effective accumulation time (Figure BDA0002874547930000075); each new second effective accumulation time (Figure BDA0002874547930000076) generated during the iteration overwrites the previously generated second effective accumulation time (Figure BDA0002874547930000077); -ln P_i(T) is the decision threshold;
In the present embodiment, the decision threshold -ln P_i(T) determines whether the evidence is sufficient to make a decision, where P_i(T) denotes the accuracy of the decision; for example, P_i(T) = 0.8 means that the decision is correct with a probability of 80%.
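As a small worked example of the threshold described above, the following Python sketch evaluates -ln P for a target accuracy and applies the comparison OUT_i ≥ -ln P_i(T) from step S130. The function names are hypothetical and chosen only for illustration.

```python
import math


def decision_threshold(p_correct: float) -> float:
    """Return the evidence threshold -ln(P) for a target decision accuracy P."""
    if not 0.0 < p_correct <= 1.0:
        raise ValueError("p_correct must lie in (0, 1]")
    return -math.log(p_correct)


def evidence_sufficient(out_i: float, p_correct: float) -> bool:
    """True when the basal ganglia output reaches the threshold -ln(P)."""
    return out_i >= decision_threshold(p_correct)


if __name__ == "__main__":
    # For P = 0.8 the threshold is -ln(0.8), roughly 0.2231.
    print(round(decision_threshold(0.8), 4))
    print(evidence_sufficient(0.30, 0.8))
```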
In this embodiment, the output OUT_i of the basal ganglia model is shown in equation (2):
Figure BDA0002874547930000081
where (Figure BDA0002874547930000082) is a gain factor.
Step S140, obtaining the final decision time from the first effective accumulation time (Figure BDA0002874547930000083) and the second effective accumulation time (Figure BDA0002874547930000084):
Figure BDA0002874547930000085
Step S150, estimating the motion precision W through Fitts' law based on the final decision time T_out.
In this embodiment, the motion precision W estimated by Fitts' law is shown in equation (3):
Figure BDA0002874547930000086
where a and b are two constant parameters and D is the distance moved by the joint end.
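The image of equation (3) is not reproduced in this text. As a hedged illustration only, the sketch below assumes the classic form of Fitts' law, T = a + b·log2(2D / W), and solves it for the precision W given the decision time T_out. Whether the patent uses this exact variant is an assumption; the function names are hypothetical.

```python
import math


def fitts_movement_time(a: float, b: float, distance: float, width: float) -> float:
    """Classic Fitts' law (ASSUMED form): T = a + b * log2(2 * D / W)."""
    return a + b * math.log2(2.0 * distance / width)


def fitts_precision(a: float, b: float, distance: float, t_out: float) -> float:
    """Invert the assumed law for W: W = 2 * D / 2 ** ((T_out - a) / b)."""
    return 2.0 * distance / (2.0 ** ((t_out - a) / b))


if __name__ == "__main__":
    w_fast = fitts_precision(a=0.1, b=0.2, distance=0.5, t_out=0.9)
    w_slow = fitts_precision(a=0.1, b=0.2, distance=0.5, t_out=1.1)
    # A longer decision time yields a smaller W, that is, a tighter precision.
    print(w_fast, w_slow)
```

Under this assumed form, a longer T_out maps to a smaller W, which matches the speed-accuracy trade-off the method relies on.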
Step S200, based on the estimated motion precision W, calculating the supervision term moment (Figure BDA0002874547930000087) through a striatum-inspired speed modulation strategy.
In this embodiment, the striatum-inspired speed modulation strategy is shown in equation (4):
Figure BDA0002874547930000088
where the symbol shown in Figure BDA0002874547930000089 denotes the joint angle of the end position calculated from the estimated motion precision, q_s denotes the joint angle of the initial position, t_S is the starting time of the movement, V_M(λ, t) is a bell-shaped velocity modulation model, t denotes the time being modulated, and T_out denotes the decision time;
the bell-shaped velocity modulation model V_M(λ, t) is shown in equation (5):
Figure BDA00028745479300000810
where λ is a parameter of the modulation model and t denotes time t;
the desired joint angular velocity is (Figure BDA00028745479300000811);
the supervision term moment (Figure BDA00028745479300000812) is shown in equation (6):
Figure BDA00028745479300000813
where q_t is the joint angle at time t, (Figure BDA0002874547930000091) is the desired joint angular velocity, (Figure BDA0002874547930000092) is the desired joint angular acceleration, M(q_t) is the inertia matrix of the musculoskeletal system, (Figure BDA0002874547930000093) is the centripetal and Coriolis force, and G(q_t) is the gravity matrix of the musculoskeletal system;
the inertia matrix M(q_t) of the musculoskeletal system is shown in equation (7):
Figure BDA0002874547930000094
the centripetal and Coriolis force (Figure BDA0002874547930000095) is shown in equation (8):
Figure BDA0002874547930000096
the gravity matrix G(q_t) of the musculoskeletal system is shown in equation (9):
Figure BDA0002874547930000097
where m_1 denotes the mass of the first link of the robotic arm, m_2 denotes the mass of the second link, d_1 denotes the length of the first link, d_2 denotes the length of the second link,
Figure BDA0002874547930000098
q_{1,t} denotes the angle of the first joint of the robotic arm, q_{2,t} denotes the angle of the second joint, and (Figure BDA0002874547930000099) and (Figure BDA00028745479300000910) are the angular velocities of the first and second joints of the robotic arm.
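The images of equations (6) to (9) are not reproduced here. To illustrate how a supervision term moment can be computed from M, the centripetal and Coriolis term, and G, the sketch below uses the common textbook model of a planar two-link arm with a point mass at the end of each link. That model, the gravity constant, the angle convention (measured from the horizontal) and all function names are assumptions, not the patent's own formulas.

```python
import math

G_ACCEL = 9.81  # gravitational acceleration in m/s^2 (assumed)


def inertia_matrix(q2, m1, m2, d1, d2):
    """Inertia matrix M(q) of a planar two-link arm with point masses (assumed model)."""
    c2 = math.cos(q2)
    m11 = (m1 + m2) * d1 ** 2 + m2 * d2 ** 2 + 2.0 * m2 * d1 * d2 * c2
    m12 = m2 * d2 ** 2 + m2 * d1 * d2 * c2
    m22 = m2 * d2 ** 2
    return [[m11, m12], [m12, m22]]


def coriolis_vector(q2, dq1, dq2, m2, d1, d2):
    """Centripetal and Coriolis torque vector for the same assumed model."""
    h = m2 * d1 * d2 * math.sin(q2)
    return [-h * dq2 ** 2 - 2.0 * h * dq1 * dq2, h * dq1 ** 2]


def gravity_vector(q1, q2, m1, m2, d1, d2):
    """Gravity torque vector G(q), joint angles measured from the horizontal."""
    c1 = math.cos(q1)
    c12 = math.cos(q1 + q2)
    g1 = (m1 + m2) * G_ACCEL * d1 * c1 + m2 * G_ACCEL * d2 * c12
    g2 = m2 * G_ACCEL * d2 * c12
    return [g1, g2]


def supervision_torque(q, dq_des, ddq_des, m1, m2, d1, d2):
    """Inverse dynamics: tau = M(q) * ddq_des + C(q, dq_des) + G(q)."""
    M = inertia_matrix(q[1], m1, m2, d1, d2)
    C = coriolis_vector(q[1], dq_des[0], dq_des[1], m2, d1, d2)
    G = gravity_vector(q[0], q[1], m1, m2, d1, d2)
    return [M[i][0] * ddq_des[0] + M[i][1] * ddq_des[1] + C[i] + G[i]
            for i in range(2)]


if __name__ == "__main__":
    # Arm held horizontally at rest: only the gravity terms contribute.
    print(supervision_torque((0.0, 0.0), (0.0, 0.0), (0.0, 0.0), 1.0, 1.0, 1.0, 1.0))
```

Evaluating the torque along the desired trajectory, as in `supervision_torque`, gives a feedforward target that the learned muscle activations are then rewarded for matching.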
Step S300, based on the supervision term moment (Figure BDA00028745479300000911), calculating the muscle activation signal vector u_t through a muscle activation signal network.
In this embodiment, the muscle activation signal vector u_t is calculated as shown in equation (10):
Figure BDA00028745479300000912
where u_{t-1} is the muscle activation signal at time t-1, τ_{t-1} is the joint moment at time t-1, and the policy network μ(· | θ^μ) is a neural network for solving for the muscle activation signals. The policy network is preferably that of the classical DDPG method.
Step S400, based on the muscle activation signal vector u_t and the supervision term moment (Figure BDA00028745479300000913), calculating an action reward R_t, calculating a loss value through a preset loss function L, and adjusting the parameters of the muscle activation signal network based on that loss value so that the action reward R_t increases; if k < K, letting k = k + 1 and repeating steps S100 to S400 until k = K, where K is the preset maximum number of training iterations, to obtain the muscle activation signal sequence required for control.
In this embodiment, step S400 includes:
Step S410, based on the muscle activation signal vector u_t and the supervision term moment (Figure BDA0002874547930000101), calculating the action reward R_t, which is to be made as large as possible, as shown in equation (11):
Figure BDA0002874547930000102
where γ is a discount factor, (Figure BDA0002874547930000103) is the moment generated by the flexor muscle at time t, (Figure BDA0002874547930000104) is the force produced by the flexor muscle at time t, (Figure BDA0002874547930000105) is the muscle activation signal generated for the flexor muscle at time t, (Figure BDA0002874547930000106) is the moment generated by the extensor muscle at time t, (Figure BDA0002874547930000107) is the force produced by the extensor muscle at time t, (Figure BDA0002874547930000108) is the muscle activation signal generated for the extensor muscle at time t, p is the number of flexor muscles, q is the number of extensor muscles, and ω_1 and ω_2 are proportional parameters; this step realizes fine control of the redundant muscles.
In this embodiment, a higher action reward indicates a more accurate output. The aim is to bring the resultant moment closer to the supervision term while minimizing the change in the muscle activation signals of the flexors and extensors, so that a stable cooperative contraction forms between the flexors and extensors.
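The image of equation (11) is not reproduced here, so the following Python sketch is only one possible reading of the stated goal: penalize the gap between the resultant muscle moment and the supervision term, and penalize changes in flexor and extensor activation. The absolute-value penalties, the sign convention (extensor moments passed in as negative values), the default weights and the function name are all assumptions, and the discount factor γ from the original formula is omitted from this single-step sketch.

```python
def action_reward(tau_flexors, tau_extensors, tau_supervision,
                  du_flexors, du_extensors, w1=1.0, w2=0.1):
    """Single-step reward sketch (ASSUMED form, not the patent's equation 11).

    tau_flexors / tau_extensors: moments produced by each flexor / extensor.
    tau_supervision: the supervision term moment for the joint.
    du_flexors / du_extensors: changes in activation for each muscle.
    w1, w2: proportional parameters weighting the two penalties.
    """
    resultant = sum(tau_flexors) + sum(tau_extensors)
    tracking_error = abs(resultant - tau_supervision)
    activation_change = (sum(abs(d) for d in du_flexors)
                         + sum(abs(d) for d in du_extensors))
    # Larger reward means closer tracking and smoother activation.
    return -w1 * tracking_error - w2 * activation_change


if __name__ == "__main__":
    print(action_reward([2.0, 1.0], [-1.5], 1.0, [0.1], [-0.1]))
```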
Step S420, based on the action reward R_t, calculating the loss function L of the evaluation network Q^μ, as shown in equation (12):
Figure BDA0002874547930000109
where θ^Q denotes the parameters of the evaluation network, Δu_{t+1} is the muscle activation signal at time t+1, and Δu_t is the muscle activation signal at time t; the muscle activation signal Δu_t here includes the flexor muscle activation signals and the extensor muscle activation signals; the evaluation network Q^μ represents a state-action value function;
Step S430, based on the loss function L, updating the parameters θ^Q of the evaluation network Q^μ, as shown in equation (13):
Figure BDA00028745479300001010
where η_1 denotes the update step size;
based on the evaluation network Q^μ, updating the parameters θ^μ of the policy network μ(· | θ^μ), as shown in equation (14):
Figure BDA0002874547930000111
where η_2 denotes the update step size and (Figure BDA0002874547930000112) is the gradient of the policy network μ(· | θ^μ), as shown in equation (15):
Figure BDA0002874547930000113
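Steps S420 and S430 follow the evaluation network (critic) and policy network (actor) pattern of DDPG, which this embodiment names as the preferred policy network. Because the images of equations (12) to (15) are not reproduced here, the sketch below assumes the standard DDPG temporal-difference loss and plain gradient steps; the function names and the batch layout are hypothetical, and the Q values are passed in as numbers rather than produced by a real network.

```python
def td_target(reward, gamma, q_next):
    """Standard DDPG target (ASSUMED form): y = R_t + gamma * Q(s_{t+1}, a_{t+1})."""
    return reward + gamma * q_next


def critic_loss(batch, gamma):
    """Mean squared TD error over a batch of (reward, q_current, q_next) tuples."""
    errors = [td_target(r, gamma, q_next) - q_cur for r, q_cur, q_next in batch]
    return sum(e * e for e in errors) / len(errors)


def descent_step(params, grads, step_size):
    """Critic update: move the parameters against the loss gradient."""
    return [p - step_size * g for p, g in zip(params, grads)]


def ascent_step(params, grads, step_size):
    """Actor update: move the parameters along the policy gradient."""
    return [p + step_size * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    batch = [(1.0, 0.5, 2.0), (0.0, 1.0, 1.0)]
    print(critic_loss(batch, gamma=0.9))
    print(descent_step([1.0, 2.0], [0.5, -1.0], step_size=0.1))
```

The two step functions differ only in sign: the evaluation network minimizes its loss, while the policy network is moved in the direction that increases the estimated value.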
Step S440, if t ≠ T_out, letting t = t + 1 and repeating steps S200 to S400 until t = T_out;
Step S450, if k ≠ K, letting k = k + 1 and repeating steps S100 to S400 until k = K, at which point the muscle activation signals u_t (t ∈ [1, T]) form the muscle activation signal sequence required for control. The calculation proceeds as follows: taking the initial activation signal to be u_{t-1}, the model calculates the change at each time step (Figure BDA0002874547930000114, Figure BDA0002874547930000115); at the next time step the signal input to the muscle becomes (Figure BDA0002874547930000116); this is repeated until t = T. To distinguish the sequence from a single value, u_t (t ∈ [1, T]) denotes the sequence and u_t denotes a single value.
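The calculation process just described, in which the model outputs a change at each time step and that change is applied to the previous activation, can be sketched in Python as follows. The additive update u_t = u_{t-1} + Δu_t is inferred from the surrounding text because the formula images are missing, and clipping the activations to the range [0, 1] is an added assumption; the function name is hypothetical.

```python
def rollout_activations(u_init, deltas, lower=0.0, upper=1.0):
    """Build the activation sequence u_1..u_T from an initial value and per-step changes.

    ASSUMPTIONS: the update is additive, and activations are clipped to
    [lower, upper]; neither detail is confirmed by the patent text.
    """
    sequence = []
    u = list(u_init)
    for delta in deltas:
        # Apply the change for every muscle, then clip to the allowed range.
        u = [min(upper, max(lower, ui + di)) for ui, di in zip(u, delta)]
        sequence.append(list(u))
    return sequence


if __name__ == "__main__":
    seq = rollout_activations([0.2, 0.5], [[0.1, -0.2], [0.9, 0.1]])
    for step, u in enumerate(seq, start=1):
        print(step, u)
```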
For highly redundant, highly coupled musculoskeletal robot systems, the invention on the one hand draws on the neural mechanism of the speed-accuracy trade-off in living beings and, by simulating the cortico-basal ganglia neural circuit, provides a biologically plausible computational decision model of the basal ganglia. At the same time, by combining Fitts' law with the speed modulation strategy of the FSI-SPN neuron circuit within the striatum, it provides an active speed-accuracy trade-off model that makes it possible to flexibly adjust skilled motor performance according to environmental information and task-related parameters. On the other hand, to train and control the musculoskeletal robot system efficiently, a supervision term is introduced into the Markov decision process (MDP) to construct a supervised MDP algorithm, and the control process is divided into two stages, motion planning and motion execution, as the basis for realizing motion variability. In the motion execution stage, an antagonistic muscle cooperative contraction strategy is designed to explore the cooperative relationship among antagonistic muscles, realizing a general redundant-muscle control algorithm. Finally, the algorithm is combined with the active speed-accuracy trade-off model to realize neurally inspired motion adaptability on a musculoskeletal system.
The musculoskeletal system control system based on speed precision balance comprises a precision estimation module, an expected moment calculation module, an activation signal calculation module and a speed precision balance module;
setting the training count k to 1;
the precision estimation module is configured to obtain the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
the expected torque calculation module is configured to calculate, based on the estimated motion precision W and through a striatum-inspired speed modulation strategy, the supervision term moment
Figure BDA0002874547930000121
the activation signal calculation module is configured to, based on the supervision term moment
Figure BDA0002874547930000122
compute the muscle activation signal vector u_t through a muscle activation signal network;
the speed-accuracy trade-off module is configured to, based on the muscle activation signal vector u_t and the supervision term moment
Figure BDA0002874547930000123
calculate an action reward R_t, then calculate a preset loss function L and adjust the parameters of the muscle activation signal network based on the preset loss function L so that the action reward R_t increases; and to let k = k + 1 and repeat the functions of the precision estimation module through the speed-accuracy trade-off module until k = K, where K is the preset maximum number of training iterations, thereby obtaining the muscle activation signal sequence required for control.
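As an illustration of how a precision estimation module might apply Fitts' law, the Python sketch below uses the classic formulation T = a + b·log2(2D/W) and inverts it to obtain W from a decision time. The formula actually used by the invention is given in an equation image that is not reproduced here, so this form, the function names and the example values are all assumptions; a and b stand for the constant parameters and D for the movement distance.

```python
import math


def movement_time(D: float, W: float, a: float, b: float) -> float:
    """Classic Fitts' law (assumed form): T = a + b * log2(2D / W)."""
    if D <= 0 or W <= 0:
        raise ValueError("D and W must be positive")
    return a + b * math.log2(2.0 * D / W)


def estimated_precision(T_out: float, D: float, a: float, b: float) -> float:
    """Invert the assumed Fitts' law to get the precision W reachable in T_out.

    Solving T_out = a + b * log2(2D / W) for W gives
    W = 2D / 2 ** ((T_out - a) / b).
    """
    if D <= 0 or b == 0:
        raise ValueError("D must be positive and b must be non-zero")
    return 2.0 * D / (2.0 ** ((T_out - a) / b))


if __name__ == "__main__":
    # Hypothetical constants; a longer decision time yields a smaller W,
    # that is, a tighter target width for the same distance.
    for T_out in (0.6, 1.0, 1.4):
        print(T_out, estimated_precision(T_out, D=0.4, a=0.2, b=0.2))
```

Under this assumed form, a longer decision time maps to a smaller W for the same distance, which is the trade-off between speed and precision that the module exploits.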
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that the musculoskeletal system control system based on speed precision tradeoff provided in the above embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the above embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the above described functions. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device according to a third embodiment of the invention has stored therein a plurality of programs adapted to be loaded and executed by a processor to implement the method of musculoskeletal system control based on speed accuracy trade-off described above.
A processing apparatus according to a fourth embodiment of the present invention includes a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to implement the aforementioned musculoskeletal system control method based on speed accuracy trade-off.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative modules, method steps, and modules described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules, method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Thus far, the technical solutions of the present invention have been described with reference to the preferred embodiments shown in the drawings; however, those skilled in the art will readily understand that the scope of protection of the present invention is not limited to these specific embodiments. Those skilled in the art can make equivalent changes or substitutions to the related technical features without departing from the principle of the invention, and the technical solutions resulting from such changes or substitutions will fall within the scope of protection of the invention.

Claims (10)

1. A musculoskeletal system control method based on speed accuracy trade-off, the control method comprising:
setting the training count k to 1;
Step S100, obtaining the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
Step S200, calculating, based on the estimated motion precision W and through a striatum-inspired speed modulation strategy, the supervision term moment
Figure FDA0002874547920000017
Step S300, based on the supervision term moment
Figure FDA0002874547920000016
computing the muscle activation signal vector u_t through a muscle activation signal network;
Step S400, based on the muscle activation signal vector u_t and the supervision term moment
Figure FDA0002874547920000018
calculating an action reward R_t, calculating a loss value through a preset loss function L, and adjusting the parameters of the muscle activation signal network based on the loss value so that the action reward R_t increases; if k < K, letting k = k + 1 and repeating steps S100 to S400 until k = K, where K is the preset maximum number of training iterations, thereby obtaining the muscle activation signal sequence required for control.
2. The method for controlling a musculoskeletal system based on a speed accuracy tradeoff according to claim 1, wherein the step S100 comprises:
presetting an accumulation time T;
Step S110, obtaining perceptual evidence x_i(t_1) ~ N(μ_i, σ²) at time t_1 through a cortex model, and from it the accumulated perceptual evidence Y_i(T):
Figure FDA0002874547920000011
Step S120, inputting the accumulated perceptual evidence Y_i(T) into the basal ganglia model to obtain the output OUT_i of the basal ganglia model;
Step S130, based on the output OUT_i of the basal ganglia model and a preset decision threshold -ln P_i(T), obtaining a first effective accumulation time
Figure FDA0002874547920000012
and a second effective accumulation time
Figure FDA0002874547920000013
if T > 0, letting T = T - 1 and repeating steps S110 to S130;
wherein, when the output of the basal ganglia model first satisfies OUT_i ≥ -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the first effective accumulation time
Figure FDA0002874547920000014
and when the output of the basal ganglia model satisfies OUT_i < -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the second effective accumulation time
Figure FDA0002874547920000021
each new second effective accumulation time generated during the iteration
Figure FDA0002874547920000022
overwrites the previously generated second effective accumulation time
Figure FDA0002874547920000023
where -ln P_i(T) is the decision threshold;
Step S140, from the first effective accumulation time
Figure FDA0002874547920000024
and the second effective accumulation time
Figure FDA0002874547920000025
obtaining the final decision time
Figure FDA0002874547920000026
Step S150, based on the final decision time T_out, estimating the motion precision W through Fitts' law.
3. The method of musculoskeletal system control based on speed accuracy trade-off of claim 2, wherein the striatal inspired speed modulation strategy is:
Figure FDA0002874547920000027
wherein,
Figure FDA0002874547920000028
denotes the joint angle of the end position calculated from the estimated motion precision, q_s denotes the joint angle of the initial position, t_s is the starting time of the movement, V_M(λ, t) is the bell-shaped velocity modulation model, t denotes the modulation time, and T_out denotes the decision time;
the bell-shaped velocity modulation model V_M(λ, t) is:
Figure FDA0002874547920000029
wherein λ is a parameter of the modulation model and t denotes time t;
the desired angular velocity of the joint is
Figure FDA00028745479200000210
the supervision term moment
Figure FDA00028745479200000211
is:
Figure FDA00028745479200000212
wherein q_t is the joint angle at time t,
Figure FDA00028745479200000213
is the desired angular velocity of the joint,
Figure FDA00028745479200000214
is the desired angular acceleration of the joint, M(q_t) is the inertia matrix of the musculoskeletal system,
Figure FDA00028745479200000215
is the centripetal and Coriolis force, and G(q_t) is the gravity matrix of the musculoskeletal system;
the inertia matrix M(q_t) of the musculoskeletal system is:
Figure FDA00028745479200000216
the centripetal and Coriolis force
Figure FDA00028745479200000217
is:
Figure FDA00028745479200000218
the gravity matrix G(q_t) of the musculoskeletal system is:
Figure FDA0002874547920000031
wherein m_1 denotes the mass of the first link of the robot arm, m_2 denotes the mass of the second link of the robot arm, d_1 denotes the length of the first link of the robot arm, d_2 denotes the length of the second link of the robot arm,
Figure FDA0002874547920000032
q_{1,t} denotes the angle of the first joint of the robot arm, q_{2,t} denotes the angle of the second joint of the robot arm, and
Figure FDA0002874547920000033
and
Figure FDA0002874547920000034
are the angular velocities of the first joint and the second joint of the robot arm.
4. The musculoskeletal system control method based on speed accuracy trade-off of claim 3, wherein the muscle activation signal vector u_t is calculated as:
Figure FDA0002874547920000035
wherein u_{t-1} is the muscle activation signal at time t-1, τ_{t-1} is the joint moment at time t-1, and the policy network μ(·|θ^μ) is a neural network for solving for the muscle activation signal.
5. The method for controlling a musculoskeletal system based on a speed accuracy tradeoff according to claim 4, wherein the step S400 comprises:
Step S410, based on the muscle activation signal vector u_t and the supervision term moment
Figure FDA00028745479200000315
calculating the action reward R_t such that R_t is as large as possible:
Figure FDA0002874547920000036
where γ is a discount factor,
Figure FDA0002874547920000037
is the moment generated by the flexor muscle at time t,
Figure FDA0002874547920000038
is the force generated by the flexor muscle at time t,
Figure FDA0002874547920000039
is the muscle activation signal generated for the flexor muscle at time t,
Figure FDA00028745479200000310
is the moment generated by the extensor muscle at time t,
Figure FDA00028745479200000311
is the force generated by the extensor muscle at time t,
Figure FDA00028745479200000312
is the muscle activation signal generated for the extensor muscle at time t, p is the number of flexor muscles, q is the number of extensor muscles, and ω_1 and ω_2 are proportional parameters;
Step S420, based on the action reward R_t, computing the loss function L of the evaluation network Q^μ:
Figure FDA00028745479200000313
wherein θ^Q denotes the parameters of the evaluation network, Δu_{t+1} is the muscle activation signal at time t+1, and Δu_t is the muscle activation signal at time t;
Step S430, updating the parameters θ^Q of the evaluation network Q^μ based on the loss function L:
Figure FDA00028745479200000314
wherein η_1 denotes the update step size;
updating the parameters θ^μ of the policy network μ(·|θ^μ) based on the evaluation network Q^μ:
Figure FDA0002874547920000041
wherein η_2 denotes the update step size, and
Figure FDA0002874547920000042
is the gradient of the policy network μ(·|θ^μ):
Figure FDA0002874547920000043
Step S440, if t ≠ T_out, letting t = t + 1 and repeating steps S200 to S400 until t = T_out;
Step S450, if k ≠ K, letting k = k + 1 and repeating steps S100 to S400 until k = K, at which point the muscle activation signals u_t (t ∈ [1, T]) form the muscle activation signal sequence required for control.
6. The method of claim 2, wherein the output OUT_i of the basal ganglia model is:
Figure FDA0002874547920000044
wherein
Figure FDA0002874547920000045
is the gain factor, and i and j are indices of the perceptual evidence.
7. The method for controlling a musculoskeletal system based on a speed accuracy tradeoff according to claim 2, wherein the motion precision W is estimated through Fitts' law as:
Figure FDA0002874547920000046
where a and b are two constant parameters and D is the distance moved by the joint tip.
8. A musculoskeletal system control system based on speed accuracy trade-off, the system comprising: a precision estimation module, an expected torque calculation module, an activation signal calculation module and a speed-accuracy trade-off module;
setting the training count k to 1;
the precision estimation module is configured to obtain the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
the expected torque calculation module is configured to calculate, based on the estimated motion precision W and through a striatum-inspired speed modulation strategy, the supervision term moment
Figure FDA0002874547920000047
the activation signal calculation module is configured to, based on the supervision term moment
Figure FDA0002874547920000048
compute the muscle activation signal vector u_t through a muscle activation signal network;
the speed-accuracy trade-off module is configured to, based on the muscle activation signal vector u_t and the supervision term moment
Figure FDA0002874547920000049
calculate an action reward R_t, then calculate a preset loss function L and adjust the parameters of the muscle activation signal network based on the preset loss function L so that the action reward R_t increases; and to let k = k + 1 and repeat the functions of the precision estimation module through the speed-accuracy trade-off module until k = K, where K is the preset maximum number of training iterations, thereby obtaining the muscle activation signal sequence required for control.
9. A storage device having stored therein a plurality of programs, wherein said programs are adapted to be loaded and executed by a processor to implement the method of musculoskeletal system control based on speed accuracy trade-offs of any one of claims 1-7.
10. A processing apparatus comprising a processor adapted to execute programs; and a storage device adapted to store a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the method of musculoskeletal system control based on speed accuracy trade-off of any of claims 1-7.
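The supervision term moment of claim 3 combines an inertia matrix, a centripetal and Coriolis term and a gravity term for a two-link arm. The Python sketch below shows one common textbook form of these terms. It assumes point masses m1 and m2 at the distal ends of links of length d1 and d2, a first joint angle measured from the horizontal, a second joint angle measured relative to the first link, and gravity acting downward. The matrices claimed in the patent are given in equation images that are not reproduced here and may differ; all function names are hypothetical.

```python
import math

G_ACC = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def inertia_matrix(q, m1, m2, d1, d2):
    """Inertia matrix M(q) of a planar two-link arm with point masses."""
    c2 = math.cos(q[1])
    m11 = (m1 + m2) * d1 ** 2 + m2 * d2 ** 2 + 2.0 * m2 * d1 * d2 * c2
    m12 = m2 * d2 ** 2 + m2 * d1 * d2 * c2
    m22 = m2 * d2 ** 2
    return [[m11, m12], [m12, m22]]


def coriolis_vector(q, dq, m2, d1, d2):
    """Centripetal and Coriolis force vector C(q, dq)."""
    h = m2 * d1 * d2 * math.sin(q[1])
    return [-h * (2.0 * dq[0] * dq[1] + dq[1] ** 2), h * dq[0] ** 2]


def gravity_vector(q, m1, m2, d1, d2):
    """Gravity vector G(q); angles are measured from the horizontal."""
    g2 = m2 * G_ACC * d2 * math.cos(q[0] + q[1])
    g1 = (m1 + m2) * G_ACC * d1 * math.cos(q[0]) + g2
    return [g1, g2]


def supervision_moment(q, dq_des, ddq_des, m1, m2, d1, d2):
    """Inverse dynamics: tau = M(q) * ddq_des + C(q, dq_des) + G(q)."""
    M = inertia_matrix(q, m1, m2, d1, d2)
    C = coriolis_vector(q, dq_des, m2, d1, d2)
    G = gravity_vector(q, m1, m2, d1, d2)
    return [
        M[0][0] * ddq_des[0] + M[0][1] * ddq_des[1] + C[0] + G[0],
        M[1][0] * ddq_des[0] + M[1][1] * ddq_des[1] + C[1] + G[1],
    ]


if __name__ == "__main__":
    # Arm held horizontally at rest: only the gravity term contributes.
    print(supervision_moment([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], 1.0, 1.0, 1.0, 1.0))
```

With the desired angular velocity and acceleration set to zero, the result reduces to the gravity term alone, which gives a quick sanity check on the sign and angle conventions chosen in the sketch.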
CN202011610884.4A 2020-12-30 2020-12-30 Method, system and device for controlling musculoskeletal system based on speed precision balance Active CN112757275B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011610884.4A CN112757275B (en) 2020-12-30 2020-12-30 Method, system and device for controlling musculoskeletal system based on speed precision balance

Publications (2)

Publication Number Publication Date
CN112757275A true CN112757275A (en) 2021-05-07
CN112757275B CN112757275B (en) 2022-02-25

Family

ID=75695918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011610884.4A Active CN112757275B (en) 2020-12-30 2020-12-30 Method, system and device for controlling musculoskeletal system based on speed precision balance

Country Status (1)

Country Link
CN (1) CN112757275B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040229198A1 (en) * 2003-05-15 2004-11-18 Cns Vital Signs, Llc Methods and systems for computer-based neurocognitive testing
CN107199569A (en) * 2017-06-22 2017-09-26 华中科技大学 A kind of articulated robot method for planning track distributed based on joint balancing energy
CN108115681A (en) * 2017-11-14 2018-06-05 深圳先进技术研究院 Learning by imitation method, apparatus, robot and the storage medium of robot
CN108724191A (en) * 2018-06-27 2018-11-02 芜湖市越泽机器人科技有限公司 A kind of robot motion's method for controlling trajectory
JP2020031508A (en) * 2018-08-24 2020-02-27 株式会社日立産機システム Control device of ac motor and control method thereof
CN111515929A (en) * 2020-04-15 2020-08-11 深圳航天科技创新研究院 Human motion state estimation method, device, terminal and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
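李俊佑: "Research on the Speed-Accuracy Trade-off Based on Time Constraints and Force Feedback", China Master's Theses Full-text Database, Information Science and Technology *
郭小军: "Speed, Accuracy and Their Trade-off: Evaluation and Modeling of Subjects' Response States", China Doctoral Dissertations Full-text Database, Philosophy and Humanities *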

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113199460A (en) * 2021-05-24 2021-08-03 中国科学院自动化研究所 Nonlinear musculoskeletal robot control method, system and equipment
CN114918914A (en) * 2022-04-26 2022-08-19 中国科学院自动化研究所 Human body musculoskeletal simulation control system and simulation device
CN114918914B (en) * 2022-04-26 2024-03-22 中国科学院自动化研究所 Simulation control system and simulation device for human musculature
CN115070760A (en) * 2022-06-16 2022-09-20 中国科学院自动化研究所 Method and device for controlling musculoskeletal mechanical arm

Also Published As

Publication number Publication date
CN112757275B (en) 2022-02-25

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant