CN112757275B - Method, system and device for controlling musculoskeletal system based on speed precision balance - Google Patents

Publication number
CN112757275B
Authority
CN
China
Prior art keywords
time
activation signal
muscle activation
moment
musculoskeletal system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011610884.4A
Other languages
Chinese (zh)
Other versions
CN112757275A (en)
Inventor
周俊杰
钟汕林
乔红
吴伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202011610884.4A
Publication of CN112757275A
Application granted
Publication of CN112757275B
Legal status: Active
Anticipated expiration

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/10: Programme-controlled manipulators characterised by positioning means for manipulator elements
    • B25J9/1075: Programme-controlled manipulators characterised by positioning means for manipulator elements with muscles or tendons
    • B25J9/16: Programme controls
    • B25J9/1628: Programme controls characterised by the control loop
    • B25J9/1633: Programme controls characterised by the control loop compliant, force, torque control, e.g. combined with position control

Abstract

The invention belongs to the technical field of control, and in particular relates to a musculoskeletal system control method, system and device based on speed precision balance, aiming to solve the problem that existing control methods for human-like musculoskeletal robots cannot adequately control the cooperative contraction of antagonistic muscles. The method comprises: obtaining the estimated motion precision of the musculoskeletal system through Fitts' law; calculating a supervision term moment from the estimated motion precision through a striatum-inspired speed modulation strategy; calculating a muscle activation signal vector through a muscle activation signal network; calculating an action reward from the muscle activation signal vector and the supervision term moment, and from it a loss function; adjusting the parameters of the muscle activation signal network based on the loss function so that the action reward increases; and iterating to obtain the muscle activation signal sequence required for control. The invention uses the structural information of the musculoskeletal system to construct a general antagonistic muscle cooperative contraction control strategy and ensures smooth movement.

Description

Method, system and device for controlling musculoskeletal system based on speed precision balance
Technical Field
The invention belongs to the technical field of control, and particularly relates to a method, a system and a device for controlling a musculoskeletal system based on speed precision balance.
Background
The adaptability of living beings allows them to adjust and execute behaviors flexibly, so that learned, skilled movements can vary with the environment and the task requirements. One typical strategy for achieving such motion variability is the speed-accuracy trade-off, which reflects the balance between the rapidity and the accuracy of a movement. Implementing such a flexible behavior strategy in a human-like musculoskeletal robot, so that the robot adapts generally to environments and tasks, is an attractive challenge. In addition, in a human-like musculoskeletal robot system the number of muscles is generally far greater than the number of joints, and the redundant muscles make both motor learning and the generation of new movements difficult. Constructing a general antagonistic muscle cooperative contraction control strategy from the structural information of the musculoskeletal system, especially when some muscles are damaged, is a further challenge.
Disclosure of Invention
In order to solve the above problem in the prior art, namely that existing control methods for human-like musculoskeletal robots cannot adequately control the cooperative contraction of antagonistic muscles, the present invention provides a musculoskeletal system control method based on speed precision balance, the method comprising:
setting the training count k = 1;
Step S100, obtaining the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
Step S200, based on the estimated motion precision W, calculating the supervision term moment (Figure GDA0003471163200000011) through a striatum-inspired speed modulation strategy;
Step S300, based on the supervision term moment (Figure GDA0003471163200000012), calculating the muscle activation signal vector u_t through a muscle activation signal network;
Step S400, based on the muscle activation signal vector u_t and the supervision term moment (Figure GDA0003471163200000013), calculating an action reward R_t and then a preset loss function L(θ_Q); adjusting the parameters of the muscle activation signal network based on the preset loss function L(θ_Q) so that the action reward R_t increases; and setting k = k + 1 and repeating steps S100 to S400 until k = K, where K is the preset maximum number of training iterations, to obtain the muscle activation signal sequence required for control.
In some preferred embodiments, step S100 includes:
presetting an accumulation time T;
Step S110, obtaining, through a cortex model, the perceptual evidence x_i(t_1) ~ N(μ_i, σ²) at time t_1, and from it the accumulated perceptual evidence Y_i(T):
Figure GDA0003471163200000021
Step S120, inputting the accumulated perceptual evidence Y_i(T) into the basal ganglia model to obtain the output OUT_i of the basal ganglia model;
Step S130, based on the output OUT_i of the basal ganglia model and a preset decision threshold -ln P_i(T), obtaining a first effective accumulation time (Figure GDA0003471163200000022) and a second effective accumulation time (Figure GDA0003471163200000023);
if T > 0, setting T = T - 1 and repeating steps S110 to S130;
wherein, the first time the output of the basal ganglia model satisfies OUT_i ≥ -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the first effective accumulation time (Figure GDA0003471163200000024); whenever the output of the basal ganglia model satisfies OUT_i < -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the second effective accumulation time (Figure GDA0003471163200000025); each second effective accumulation time newly generated during the iteration (Figure GDA0003471163200000026) overwrites the previously generated second effective accumulation time (Figure GDA0003471163200000027); -ln P_i(T) is the decision threshold;
Step S140, obtaining the final decision time from the first effective accumulation time (Figure GDA0003471163200000028) and the second effective accumulation time (Figure GDA0003471163200000029):
Figure GDA00034711632000000210
Step S150, estimating the motion precision W through Fitts' law based on the final decision time T_out. Calculating an appropriate final decision time in this way adjusts the precision of the subsequent muscle control.
In some preferred embodiments, the striatum-inspired speed modulation strategy is:
Figure GDA00034711632000000211
wherein (Figure GDA00034711632000000212) represents the joint angle of the end position calculated from the estimated motion precision, q_s represents the joint angle of the initial position, t_S is the starting moment of the movement, V_M(λ, t) is a bell-shaped velocity modulation model, t denotes the modulation time, and T_out represents the decision time;
the bell-shaped velocity modulation model V_M(λ, t) is:
Figure GDA00034711632000000213
wherein λ is a parameter of the modulation model and t denotes time t;
the desired angular velocity of the joint is (Figure GDA0003471163200000031);
the supervision term moment (Figure GDA0003471163200000032) is:
Figure GDA0003471163200000033
wherein q_t is the joint angle at time t, (Figure GDA0003471163200000034) is the desired angular velocity of the joint, (Figure GDA0003471163200000035) is the desired angular acceleration of the joint, M(q_t) is the inertia matrix of the musculoskeletal system, (Figure GDA0003471163200000036) is the centripetal and Coriolis force, and G(q_t) is the gravity matrix of the musculoskeletal system;
the inertia matrix M(q_t) of the musculoskeletal system is:
Figure GDA0003471163200000037
the centripetal and Coriolis force (Figure GDA0003471163200000038) is:
Figure GDA0003471163200000039
the gravity matrix G(q_t) of the musculoskeletal system is:
Figure GDA00034711632000000310
wherein m_1 represents the mass of the first link of the mechanical arm, m_2 represents the mass of the second link, d_1 represents the length of the first link, and d_2 represents the length of the second link,
Figure GDA00034711632000000311
q_{1,t} represents the angle of the first joint of the mechanical arm, q_{2,t} represents the angle of the second joint, and (Figure GDA00034711632000000312) and (Figure GDA00034711632000000313) are the angular velocities of the first and second joints of the mechanical arm.
In some preferred embodiments, the muscle activation signal vector u_t is calculated as:
Figure GDA00034711632000000314
wherein u_{t-1} is the muscle activation signal at time t-1, τ_{t-1} is the joint moment at time t-1, and the policy network μ(·|θ_μ) is a neural network for solving for the muscle activation signals.
In some preferred embodiments, step S400 includes:
Step S410, based on the muscle activation signal vector u_t and the supervision term moment (Figure GDA00034711632000000315), calculating the action reward R_t and making R_t as large as possible:
Figure GDA00034711632000000316
wherein γ is a discount factor, (Figure GDA00034711632000000317) is the moment generated by the flexor muscle at time t, (Figure GDA00034711632000000318) is the force produced by the flexor muscle at time t, (Figure GDA0003471163200000041) is the muscle activation signal of the flexor muscle at time t, (Figure GDA0003471163200000042) is the moment generated by the extensor muscle at time t, (Figure GDA0003471163200000043) is the force produced by the extensor muscle at time t, (Figure GDA0003471163200000044) is the muscle activation signal of the extensor muscle at time t, p is the number of flexors, q is the number of extensors, and ω_1 and ω_2 are proportional parameters;
Step S420, based on the action reward R_t, calculating the loss function L(θ_Q) of the evaluation network Q_μ:
Figure GDA0003471163200000045
wherein θ_Q denotes the parameters of the evaluation network, Δu_{t+1} is the muscle activation signal at time t+1, and Δu_t is the muscle activation signal at time t;
Step S430, based on the loss function L(θ_Q), updating the parameter θ_Q of the evaluation network Q_μ:
Figure GDA0003471163200000046
wherein η_1 represents the update step size;
based on the evaluation network Q_μ, updating the parameter θ_μ of the policy network μ(·|θ_μ):
Figure GDA0003471163200000047
wherein η_2 represents the update step size and (Figure GDA0003471163200000048) is the gradient of the policy network μ(·|θ_μ):
Figure GDA0003471163200000049
Step S440, if t ≠ T_out, setting t = t + 1 and repeating steps S200 to S400 until t = T_out;
Step S450, if k ≠ K, setting k = k + 1 and repeating steps S100 to S400 until k = K, at which point the muscle activation signals u_t (t ∈ [1, T]) form the muscle activation signal sequence required for control.
In some preferred embodiments, the output OUT_i of the basal ganglia model is:
Figure GDA00034711632000000410
wherein (Figure GDA00034711632000000411) is a gain factor, and i and j are the indices of the perceptual evidence.
In some preferred embodiments, the motion precision W estimated by Fitts' law is:
Figure GDA00034711632000000412
where a and b are two constant parameters and D is the distance moved by the joint end.
In another aspect of the invention, a musculoskeletal system control system based on speed precision balance is provided, the system comprising a precision estimation module, an expected moment calculation module, an activation signal calculation module and a speed precision balance module;
the training count is set to k = 1;
the precision estimation module is configured to obtain the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
the expected moment calculation module is configured to calculate, based on the estimated motion precision W, the supervision term moment (Figure GDA0003471163200000051) through a striatum-inspired speed modulation strategy;
the activation signal calculation module is configured to calculate, based on the supervision term moment (Figure GDA0003471163200000052), the muscle activation signal vector u_t through a muscle activation signal network;
the speed precision balance module is configured to calculate, based on the muscle activation signal vector u_t and the supervision term moment (Figure GDA0003471163200000053), an action reward R_t and then a preset loss function L(θ_Q); to adjust the parameters of the muscle activation signal network based on the preset loss function L(θ_Q) so that the action reward R_t increases; and to set k = k + 1 and repeat the functions of the precision estimation module through the speed precision balance module until k = K, where K is the preset maximum number of training iterations, to obtain the muscle activation signal sequence required for control.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-mentioned musculoskeletal system control method based on speed accuracy trade-off.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; the processor is suitable for executing various programs; the storage device is suitable for storing a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the aforementioned musculoskeletal system control method based on speed accuracy trade-offs.
The invention has the beneficial effects that:
(1) The musculoskeletal system control method based on speed precision balance of the invention designs an antagonistic muscle cooperative contraction strategy by combining Fitts' law with the speed modulation strategy of the FSI-SPN neuron circuit in the striatum, which realizes a general redundant muscle control algorithm and improves the neurally inspired motion adaptability of the musculoskeletal system.
(2) The musculoskeletal system control method based on speed precision balance of the invention constructs a general antagonistic muscle cooperative contraction control strategy by designing an action reward for the antagonistic muscle cooperative contraction strategy and updating the parameters of the policy network using the action reward together with the evaluation network, which benefits the motor learning and control of the redundant muscles of a human-like musculoskeletal robot.
(3) The musculoskeletal system control method based on speed precision balance of the invention constructs a supervised Markov decision process algorithm by introducing a supervision term into the Markov decision process, and divides the control process into two stages, motion planning and motion execution, which serve as the basis for realizing motion variability, thereby achieving efficient training and control of the musculoskeletal robot system.
(4) The musculoskeletal system control method based on speed precision balance of the invention calculates an appropriate motion execution time by combining the antagonistic muscle cooperative contraction strategy with an active speed precision balance model, which in turn influences the precision of muscle control and realizes neurally inspired motion adaptability on the musculoskeletal system.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow diagram of a musculoskeletal system control method based on speed accuracy trade-off in accordance with an embodiment of the present invention;
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention provides a musculoskeletal system control method based on speed precision balance.
The musculoskeletal system control method based on speed precision balance of the invention comprises the following steps:
setting the training count k = 1; because a musculoskeletal robot system has a redundant number of joints, a single motor task can be accomplished in several ways.
Step S100, obtaining the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
Step S200, based on the estimated motion precision W, calculating the supervision term moment (Figure GDA0003471163200000061) through a striatum-inspired speed modulation strategy;
Step S300, based on the supervision term moment (Figure GDA0003471163200000062), calculating the muscle activation signal vector u_t through a muscle activation signal network;
Step S400, based on the muscle activation signal vector u_t and the supervision term moment (Figure GDA0003471163200000063), calculating an action reward R_t and a loss value through a preset loss function L(θ_Q); adjusting the parameters of the muscle activation signal network based on the loss value so that the action reward R_t increases; and, if k < K, setting k = k + 1 and repeating steps S100 to S400 until k = K, where K is the preset maximum number of training iterations, to obtain the muscle activation signal sequence required for control.
In order to more clearly describe the musculoskeletal system control method based on speed precision balance, the following describes an embodiment of the present invention in detail with reference to fig. 1.
The musculoskeletal system control method based on speed precision balance of this embodiment comprises steps S100 to S400, which are described in detail as follows:
setting the training count k = 1;
Step S100, obtaining the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
In the present embodiment, step S100 includes:
presetting an accumulation time T; the accumulation time T set for the first iteration must be the maximum allowed value so that the subsequent iterations can proceed, and preferably T = 30 s.
Step S110, obtaining, through a cortex model, the perceptual evidence x_i(t_1) ~ N(μ_i, σ²) at time t_1, and from it the accumulated perceptual evidence Y_i(T), as shown in formula (1):
Figure GDA0003471163200000071
Step S120, inputting the accumulated perceptual evidence Y_i(T) into the basal ganglia model to obtain the output OUT_i of the basal ganglia model; the accumulated perceptual evidence is passed into the striatum of the basal ganglia model and, after travelling through the direct pathway and the indirect pathway, converges in the substantia nigra and the globus pallidus to give the output OUT_i of the basal ganglia model;
Step S130, based on the output OUT_i of the basal ganglia model and a preset decision threshold -ln P_i(T), obtaining a first effective accumulation time (Figure GDA0003471163200000072) and a second effective accumulation time (Figure GDA0003471163200000073);
if T > 0, setting T = T - 1 and repeating steps S110 to S130;
wherein, the first time the output of the basal ganglia model satisfies OUT_i ≥ -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the first effective accumulation time (Figure GDA0003471163200000074); whenever the output of the basal ganglia model satisfies OUT_i < -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the second effective accumulation time (Figure GDA0003471163200000075); each second effective accumulation time newly generated during the iteration (Figure GDA0003471163200000076) overwrites the previously generated second effective accumulation time (Figure GDA0003471163200000077); -ln P_i(T) is the decision threshold;
In the present embodiment, the decision threshold -ln P_i(T) is used to determine whether the evidence is sufficient to make a decision, where P_i(T) indicates the accuracy of the decision; for example, P_i(T) = 0.8 means that the probability of the decision being correct is 80%.
In this embodiment, the output OUT_i of the basal ganglia model is shown in formula (2):
Figure GDA0003471163200000081
wherein (Figure GDA0003471163200000082) is a gain factor.
Step S140, obtaining the final decision time from the first effective accumulation time (Figure GDA0003471163200000083) and the second effective accumulation time (Figure GDA0003471163200000084):
Figure GDA0003471163200000085
Step S150, estimating the motion precision W through Fitts' law based on the final decision time T_out.
In this embodiment, the motion precision W estimated by Fitts' law is shown in formula (3):
Figure GDA0003471163200000086
where a and b are two constant parameters and D is the distance moved by the joint end.
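As a hedged illustration of step S150, the sketch below assumes the classic form of Fitts' law, T = a + b · log2(2D / W), and inverts it to obtain W from a given decision time. The patent's formula (3) is not recoverable from this text and may differ; the function names and parameter values are hypothetical.

```python
import math

def fitts_movement_time(a, b, distance, width):
    """Classic Fitts' law (assumed form): T = a + b * log2(2 * D / W)."""
    return a + b * math.log2(2.0 * distance / width)

def fitts_precision(a, b, distance, t_out):
    """Invert the assumed law for the target width W given the time T_out."""
    return 2.0 * distance / (2.0 ** ((t_out - a) / b))

# Hypothetical constants: a = 0.1 s, b = 0.2 s/bit, D = 0.5 m, T_out = 0.9 s.
w = fitts_precision(a=0.1, b=0.2, distance=0.5, t_out=0.9)
```

Under this assumed form, a longer decision time yields a smaller W, that is, a tighter precision requirement, which is the speed-accuracy relationship the method relies on.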
Step S200, based on the estimated motion precision W, calculating the supervision term moment (Figure GDA0003471163200000087) through a striatum-inspired speed modulation strategy.
In this embodiment, the striatum-inspired speed modulation strategy is shown in formula (4):
Figure GDA0003471163200000088
wherein (Figure GDA0003471163200000089) represents the joint angle of the end position calculated from the estimated motion precision, q_s represents the joint angle of the initial position, t_S is the starting moment of the movement, V_M(λ, t) is a bell-shaped velocity modulation model, t denotes the modulation time, and T_out represents the decision time;
the bell-shaped velocity modulation model V_M(λ, t) is shown in formula (5):
Figure GDA00034711632000000810
wherein λ is a parameter of the modulation model and t denotes time t;
the desired angular velocity of the joint is
Figure GDA00034711632000000811
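To give a feel for what a bell-shaped velocity profile does, the sketch below uses the well-known minimum-jerk profile as a stand-in. It is not the patent's V_M(λ, t), whose formula (5) is not recoverable here, and it has no λ parameter. Treating T_out as the movement duration is also an assumption, as is the function name.

```python
def bell_velocity(t, t_start, t_out, q_start, q_end):
    """Minimum-jerk angular velocity at time t (stand-in for V_M, an assumption).

    The profile is zero outside the movement, peaks at mid-movement, and
    integrates to q_end - q_start over the duration t_out.
    """
    s = (t - t_start) / t_out  # normalized time in [0, 1]
    if s <= 0.0 or s >= 1.0:
        return 0.0
    return (q_end - q_start) / t_out * 30.0 * s ** 2 * (1.0 - s) ** 2

# Hypothetical movement: from 0.2 rad to 1.2 rad in 1 s, starting at t = 0.
dt = 0.001
travelled = sum(bell_velocity((k + 0.5) * dt, 0.0, 1.0, 0.2, 1.2) * dt for k in range(1000))
```

Integrating the desired velocity recovers the full joint displacement, so a profile of this kind moves the joint from its initial angle to the end angle within the decision time while starting and stopping smoothly.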
The supervision term moment (Figure GDA00034711632000000812) is shown in formula (6):
Figure GDA00034711632000000813
wherein q_t is the joint angle at time t, (Figure GDA0003471163200000091) is the desired angular velocity of the joint, (Figure GDA0003471163200000092) is the desired angular acceleration of the joint, M(q_t) is the inertia matrix of the musculoskeletal system, (Figure GDA0003471163200000093) is the centripetal and Coriolis force, and G(q_t) is the gravity matrix of the musculoskeletal system;
the inertia matrix M(q_t) of the musculoskeletal system is shown in formula (7):
Figure GDA0003471163200000094
the centripetal and Coriolis force (Figure GDA0003471163200000095) is shown in formula (8):
Figure GDA0003471163200000096
the gravity matrix G(q_t) of the musculoskeletal system is shown in formula (9):
Figure GDA0003471163200000097
wherein m_1 represents the mass of the first link of the mechanical arm, m_2 represents the mass of the second link, d_1 represents the length of the first link, and d_2 represents the length of the second link,
Figure GDA0003471163200000098
q_{1,t} represents the angle of the first joint of the mechanical arm, q_{2,t} represents the angle of the second joint, and (Figure GDA0003471163200000099) and (Figure GDA00034711632000000910) are the angular velocities of the first and second joints of the mechanical arm.
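The supervision term moment has the structure of an inverse-dynamics computation: inertia times desired acceleration, plus centripetal and Coriolis terms, plus gravity. The sketch below uses the textbook planar two-link model with point masses at the distal end of each link and joint angles measured from the horizontal. These modelling choices, the gravity constant, and the function name are assumptions; the patent's formulas (7) to (9) may use a different convention.

```python
import math

G_ACCEL = 9.81  # gravitational acceleration in m/s^2 (assumption)

def two_link_inverse_dynamics(q, dq, ddq, m1, m2, d1, d2):
    """Joint moments tau = M(q) * ddq + C(q, dq) + G(q) for a planar two-link arm."""
    q1, q2 = q
    dq1, dq2 = dq
    ddq1, ddq2 = ddq
    c2, s2 = math.cos(q2), math.sin(q2)

    # Inertia matrix M(q) (symmetric, so m21 == m12).
    m11 = (m1 + m2) * d1 ** 2 + m2 * d2 ** 2 + 2.0 * m2 * d1 * d2 * c2
    m12 = m2 * d2 ** 2 + m2 * d1 * d2 * c2
    m22 = m2 * d2 ** 2

    # Centripetal and Coriolis terms C(q, dq).
    h = m2 * d1 * d2 * s2
    c_1 = -h * (2.0 * dq1 * dq2 + dq2 ** 2)
    c_2 = h * dq1 ** 2

    # Gravity terms G(q).
    g_1 = (m1 + m2) * G_ACCEL * d1 * math.cos(q1) + m2 * G_ACCEL * d2 * math.cos(q1 + q2)
    g_2 = m2 * G_ACCEL * d2 * math.cos(q1 + q2)

    tau1 = m11 * ddq1 + m12 * ddq2 + c_1 + g_1
    tau2 = m12 * ddq1 + m22 * ddq2 + c_2 + g_2
    return tau1, tau2

# Static check: arm held horizontally, unit masses and unit link lengths.
tau_static = two_link_inverse_dynamics((0.0, 0.0), (0.0, 0.0), (0.0, 0.0), 1.0, 1.0, 1.0, 1.0)
```

In the static horizontal pose only gravity contributes, so the shoulder moment is (m_1 + m_2) g d_1 + m_2 g d_2 and the elbow moment is m_2 g d_2, which is a quick sanity check on the model.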
Step S300, based on the supervision term moment (Figure GDA00034711632000000911), calculating the muscle activation signal vector u_t through a muscle activation signal network.
In this embodiment, the muscle activation signal vector u_t is calculated as shown in formula (10):
Figure GDA00034711632000000912
wherein u_{t-1} is the muscle activation signal at time t-1, τ_{t-1} is the joint moment at time t-1, and the policy network μ(·|θ_μ) is a neural network for solving for the muscle activation signals. Preferably, the policy network may be that of the classical DDPG method.
Step S400, based on the muscle activation signal vector u_t and the supervision term moment (Figure GDA00034711632000000913), calculating an action reward R_t and a loss value through a preset loss function L(θ_Q); adjusting the parameters of the muscle activation signal network based on the loss value so that the action reward R_t increases; and, if k < K, setting k = k + 1 and repeating steps S100 to S400 until k = K, where K is the preset maximum number of training iterations, to obtain the muscle activation signal sequence required for control.
In this embodiment, step S400 includes:
Step S410, based on the muscle activation signal vector u_t and the supervision term moment (Figure GDA0003471163200000101), calculating the action reward R_t and making R_t as large as possible, as shown in formula (11):
Figure GDA0003471163200000102
wherein γ is a discount factor, (Figure GDA0003471163200000103) is the moment generated by the flexor muscle at time t, (Figure GDA0003471163200000104) is the force produced by the flexor muscle at time t, (Figure GDA0003471163200000105) is the muscle activation signal of the flexor muscle at time t, (Figure GDA0003471163200000106) is the moment generated by the extensor muscle at time t, (Figure GDA0003471163200000107) is the force produced by the extensor muscle at time t, (Figure GDA0003471163200000108) is the muscle activation signal of the extensor muscle at time t, p is the number of flexors, q is the number of extensors, and ω_1 and ω_2 are proportional parameters. Fine control of the redundant muscles is achieved through this step.
In this embodiment, a higher action reward indicates a more accurate output. The aim is to bring the resultant moment as close as possible to the supervision term while minimizing the change in the muscle activation signals of the flexors and extensors, so that a stable cooperative contraction forms between the flexors and the extensors.
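The stated aim of the reward can be sketched as follows. This is not formula (11), which is not recoverable here: the quadratic penalties, the single-joint scalar moments, the sign convention (flexor moments positive, extensor moments negative), the omission of the discount factor γ, and the default weights are all assumptions for illustration.

```python
def action_reward(flexor_moments, extensor_moments, supervision_moment,
                  delta_u_flexors, delta_u_extensors, w1=1.0, w2=0.1):
    """Per-step reward: penalize moment tracking error and activation change.

    w1 and w2 play the role of the proportional parameters (values assumed).
    """
    resultant = sum(flexor_moments) + sum(extensor_moments)
    tracking_error = (resultant - supervision_moment) ** 2
    activation_change = (sum(d * d for d in delta_u_flexors)
                         + sum(d * d for d in delta_u_extensors))
    return -w1 * tracking_error - w2 * activation_change

# Two flexors and one extensor whose resultant moment matches the supervision term.
r_steady = action_reward([2.0, 1.0], [-1.5], 1.5, [0.0, 0.0], [0.0])
r_jumpy = action_reward([2.0, 1.0], [-1.5], 1.5, [0.1, 0.1], [0.2])
```

Both calls track the supervision moment exactly, but the second changes the activations between steps and therefore scores lower, which is the behavior that favors stable co-contraction.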
Step S420, based on the action reward R_t, calculating the loss function L(θ_Q) of the evaluation network Q_μ, as shown in formula (12):
Figure GDA0003471163200000109
wherein θ_Q denotes the parameters of the evaluation network, Δu_{t+1} is the muscle activation signal at time t+1, and Δu_t is the muscle activation signal at time t; the muscle activation signal Δu_t here includes both the flexor and the extensor muscle activation signals; the evaluation network Q_μ represents a state-action value function;
Step S430, based on the loss function L(θ_Q), updating the parameter θ_Q of the evaluation network Q_μ, as shown in formula (13):
Figure GDA00034711632000001010
wherein η_1 represents the update step size;
based on the evaluation network Q_μ, updating the parameter θ_μ of the policy network μ(·|θ_μ), as shown in formula (14):
Figure GDA0003471163200000111
wherein η_2 represents the update step size and (Figure GDA0003471163200000112) is the gradient of the policy network μ(·|θ_μ), as shown in formula (15):
Figure GDA0003471163200000113
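Since the text names DDPG as a preferred choice, the updates in steps S420 and S430 can be sketched in the usual actor-critic shape. The sketch assumes a squared temporal-difference loss and a linear critic so that the gradient can be written by hand. The patent's formulas (12) to (15) are not recoverable here, and all function names are hypothetical.

```python
def td_target(reward, gamma, q_next):
    """Bootstrapped target y = R_t + gamma * Q(next state, next action)."""
    return reward + gamma * q_next

def critic_loss(q_value, target):
    """Squared temporal-difference error (assumed form of the loss)."""
    return (target - q_value) ** 2

def linear_q(theta_q, features):
    """Toy linear critic: Q = theta_q . features (stand-in for a network)."""
    return sum(w * x for w, x in zip(theta_q, features))

def critic_step(theta_q, features, target, eta1):
    """One gradient-descent step on the critic loss with step size eta1."""
    error = target - linear_q(theta_q, features)
    grads = [-2.0 * error * x for x in features]
    return [w - eta1 * g for w, g in zip(theta_q, grads)]

def actor_step(theta_mu, policy_grads, eta2):
    """One gradient-ascent step on the policy parameters with step size eta2."""
    return [w + eta2 * g for w, g in zip(theta_mu, policy_grads)]

features = [1.0, 2.0]
theta_q = [0.0, 0.0]
y = td_target(reward=1.0, gamma=0.9, q_next=0.0)
loss_before = critic_loss(linear_q(theta_q, features), y)
theta_q = critic_step(theta_q, features, y, eta1=0.1)
loss_after = critic_loss(linear_q(theta_q, features), y)
```

The critic moves down the gradient of its loss while the actor moves up the gradient supplied by the critic, which is why the two update steps carry opposite signs.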
Step S440, if t ≠ T_out, setting t = t + 1 and repeating steps S200 to S400 until t = T_out;
Step S450, if k ≠ K, setting k = k + 1 and repeating steps S100 to S400 until k = K, at which point the muscle activation signals u_t (t ∈ [1, T]) form the muscle activation signal sequence required for control. The calculation proceeds as follows: the initial activation signal is taken to be u_{t-1}, and the model calculates the variation at each moment
Figure GDA0003471163200000114
Figure GDA0003471163200000115
At the next moment, the signal input to the muscle becomes
Figure GDA0003471163200000116
This is repeated until t = T. To distinguish the sequence from a single value, u_t (t ∈ [1, T]) denotes the sequence and u_t denotes a single value.
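The incremental rollout described above can be sketched as follows, assuming the update u_t = u_{t-1} + Δu_t (the formula images are not recoverable here) and clipping each activation to [0, 1], which is an added assumption. `delta_fn` is a hypothetical stand-in for the trained policy network.

```python
def rollout_activations(u_initial, delta_fn, n_steps):
    """Build the activation sequence u_1 .. u_T from per-step increments.

    delta_fn(t, u_prev) returns the increment for every muscle at step t.
    """
    sequence = []
    u_prev = list(u_initial)
    for t in range(1, n_steps + 1):
        delta = delta_fn(t, u_prev)
        # Clip to [0, 1] so activations stay in a valid range (assumption).
        u_t = [min(1.0, max(0.0, u + d)) for u, d in zip(u_prev, delta)]
        sequence.append(u_t)
        u_prev = u_t
    return sequence

# Toy increments: the first muscle ramps up while the second ramps down.
constant_delta = lambda t, u_prev: [0.3, -0.2]
activations = rollout_activations([0.2, 0.5], constant_delta, n_steps=3)
```

Here `activations` is the sequence u_t (t ∈ [1, T]) and each element is a single u_t, matching the distinction drawn in the text.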
For highly redundant and highly coupled musculoskeletal robot systems, the invention on the one hand draws on the neural mechanism of the speed-accuracy trade-off in living beings and proposes a biologically plausible, computable basal ganglia decision model by simulating the cortex-basal ganglia neural loop. By combining Fitts' law with the speed modulation strategy of the FSI-SPN neuron circuit in the striatum, it also proposes an active speed precision balance model, which gives the system the adaptability to adjust skilled motor performance flexibly according to environmental information and task-related parameters. On the other hand, to train and control the musculoskeletal robot system efficiently, a supervision term is introduced into the Markov decision process (MDP) to construct a supervised MDP algorithm, and the control process is divided into two stages, motion planning and motion execution, which serve as the basis for realizing motion variability. In the motion execution stage, an antagonistic muscle cooperative contraction strategy is designed to explore the cooperative relationship among antagonistic muscles, realizing a general redundant muscle control algorithm. Finally, this algorithm is combined with the active speed precision balance model to realize neurally inspired motion adaptability on the musculoskeletal system.
The musculoskeletal system control system based on speed precision balance of a second embodiment of the invention comprises a precision estimation module, an expected moment calculation module, an activation signal calculation module and a speed precision balance module;
the training iteration count k is initialized to 1;
the precision estimation module is configured to obtain the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
the expected moment calculation module is configured, based on the estimated motion precision W, to calculate through a striatum-inspired speed modulation strategy the supervision term moment
Figure GDA0003471163200000121
The activation signal calculation module is configured, based on the supervision term moment
Figure GDA0003471163200000122
to compute a muscle activation signal vector u_t through a muscle activation signal network;
the speed-accuracy trade-off module is configured, based on the muscle activation signal vector u_t and the supervision term moment
Figure GDA0003471163200000123
to calculate an action reward R_t and then a preset loss function L(θ^Q), and to adjust the parameters of the muscle activation signal network based on the loss function L(θ^Q) so that the action reward R_t increases; while k < K, k is set to k + 1 and the functions of the precision estimation module through the speed-accuracy trade-off module are repeated until k = K, where K is the preset maximum number of training iterations, thereby obtaining the muscle activation signal sequence required for control.
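By way of illustration only, the interplay of the four modules over the training iterations k = 1..K can be sketched in Python as follows. The function name `train_controller` and its four callable parameters are hypothetical stand-ins for the modules; the sketch shows only the order of calls and the iteration count, not the internal computation of any module.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def train_controller(
    max_iterations: int,
    estimate_precision: Callable[[], float],
    supervision_moment: Callable[[float], Vector],
    activation_network: Callable[[Vector], Vector],
    tradeoff_update: Callable[[Vector, Vector], float],
) -> Tuple[Vector, List[float]]:
    """Run the four modules in order for k = 1..K and collect the rewards."""
    if max_iterations < 1:
        raise ValueError("max_iterations (K) must be at least 1")
    rewards: List[float] = []
    activation: Vector = []
    for _k in range(1, max_iterations + 1):
        width = estimate_precision()              # precision estimation module
        moment = supervision_moment(width)        # expected moment calculation module
        activation = activation_network(moment)   # activation signal calculation module
        # The trade-off module returns the reward after adjusting the network.
        rewards.append(tradeoff_update(activation, moment))
    return activation, rewards
```

Passing the modules as callables keeps the loop independent of how each module is realized, which is consistent with the statement that the modules may be further combined or split.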
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that the musculoskeletal system control system based on speed precision tradeoff provided in the above embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the above embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the above described functions. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device according to a third embodiment of the invention has stored therein a plurality of programs adapted to be loaded and executed by a processor to implement the method of musculoskeletal system control based on speed accuracy trade-off described above.
A processing apparatus according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the aforementioned musculoskeletal system control method based on speed accuracy trade-offs.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those skilled in the art will appreciate that the illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both, and that programs corresponding to the software modules and method steps may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, the illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A musculoskeletal system control method based on speed accuracy trade-off, the control method comprising:
initializing the training iteration count k to 1;
Step S100: obtaining the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
Step S200: based on the estimated motion precision W, calculating through a striatum-inspired speed modulation strategy the supervision term moment
Figure FDA0003471163190000011
Step S300: based on the supervision term moment
Figure FDA0003471163190000012
computing a muscle activation signal vector u_t through a muscle activation signal network;
Step S400: based on the muscle activation signal vector u_t and the supervision term moment
Figure FDA0003471163190000013
calculating an action reward R_t, computing a loss value through a preset loss function L(θ^Q), and adjusting the parameters of the muscle activation signal network according to the loss value so that the action reward R_t increases; if k < K, letting k = k + 1 and repeating steps S100 to S400 until k = K, where K is the preset maximum number of training iterations, thereby obtaining the muscle activation signal sequence required for control.
2. The method for controlling a musculoskeletal system based on a speed accuracy tradeoff according to claim 1, wherein the step S100 comprises:
presetting an accumulation time T;
Step S110: obtaining, through a cortex model, the perceptual evidence x_i(t_1) ~ N(μ_i, σ²) at time t_1, and from it the accumulated perceptual evidence Y_i(T):
Figure FDA0003471163190000014
Step S120: inputting the accumulated perceptual evidence Y_i(T) into the basal ganglia model to obtain the output OUT_i of the basal ganglia model;
Step S130: based on the output OUT_i of the basal ganglia model and a preset decision threshold -ln P_i(T), obtaining a first effective accumulation time
Figure FDA0003471163190000015
And a second effective accumulation time
Figure FDA0003471163190000016
if T > 0, letting T = T - 1 and repeating steps S110 to S130;
wherein, the first time the output of the basal ganglia model satisfies OUT_i ≥ -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the first effective accumulation time
Figure FDA0003471163190000017
when the output of the basal ganglia model satisfies OUT_i < -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the second effective accumulation time
Figure FDA0003471163190000021
each new second effective accumulation time generated during the iteration
Figure FDA0003471163190000022
overwrites the previously generated second effective accumulation time
Figure FDA0003471163190000023
where -ln P_i(T) is the decision threshold;
Step S140: from the first effective accumulation time
Figure FDA0003471163190000024
And a second effective accumulation time
Figure FDA0003471163190000025
Obtaining a final decision time
Figure FDA0003471163190000026
Step S150: based on the final decision time T_out, estimating the motion precision W through Fitts' law.
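As a simplified, non-limiting illustration of the evidence accumulation in steps S110 to S130, the following Python sketch draws Gaussian perceptual evidence, accumulates it per channel, and reports the first step at which one channel becomes sufficiently probable. The softmax posterior, the single threshold crossing, and all function names are assumptions made for illustration; the claimed expressions for OUT_i and for the two effective accumulation times are given in the figures and are not reproduced here.

```python
import math
import random
from typing import List, Optional, Sequence


def sample_evidence(means: Sequence[float], sigma: float, steps: int,
                    rng: random.Random) -> List[List[float]]:
    """Draw x_i(t) ~ N(mu_i, sigma^2) for every channel i and time step t."""
    return [[rng.gauss(mu, sigma) for mu in means] for _ in range(steps)]


def log_posteriors(accumulated: Sequence[float], gain: float) -> List[float]:
    """Log-softmax of the gain-scaled accumulated evidence (assumed form)."""
    scaled = [gain * y for y in accumulated]
    peak = max(scaled)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def first_decision_time(samples: Sequence[Sequence[float]], gain: float,
                        min_posterior: float) -> Optional[int]:
    """Return the first step at which some channel's posterior reaches the bound.

    Returns None when no channel reaches min_posterior within the samples.
    """
    if not samples:
        return None
    accumulated = [0.0] * len(samples[0])
    bound = math.log(min_posterior)
    for step, sample in enumerate(samples, start=1):
        # Y_i(T) is the running sum of the evidence x_i(t) up to this step.
        accumulated = [y + x for y, x in zip(accumulated, sample)]
        if max(log_posteriors(accumulated, gain)) >= bound:
            return step
    return None
```

A larger gap between the channel means or a lower `min_posterior` leads to an earlier decision, which is the speed side of the trade-off; a stricter bound lengthens the accumulation.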
3. The method of musculoskeletal system control based on speed accuracy trade-off of claim 2, wherein the striatal inspired speed modulation strategy is:
Figure FDA0003471163190000027
wherein,
Figure FDA0003471163190000028
denotes the joint angle of the end position calculated from the estimated motion precision, q_s denotes the joint angle of the initial position, t_S is the starting time of the movement, V_M(λ, t) is a bell-shaped velocity modulation model, t denotes the time being modulated, and T_out denotes the decision time;
the bell-shaped velocity modulation model V_M(λ, t) is:
Figure FDA0003471163190000029
wherein λ is a parameter of the modulation model and t denotes time t;
the desired angular velocity of the joint is
Figure FDA00034711631900000210
The supervision term moment
Figure FDA00034711631900000211
is:
Figure FDA00034711631900000212
wherein q_t is the joint angle at time t,
Figure FDA00034711631900000213
is the desired angular velocity of the joint,
Figure FDA00034711631900000214
is the desired angular acceleration of the joint, M(q_t) is the inertia matrix of the musculoskeletal system,
Figure FDA00034711631900000215
is the centripetal Coriolis force, and G(q_t) is the gravity matrix of the musculoskeletal system;
the inertia matrix M(q_t) of the musculoskeletal system is:
Figure FDA00034711631900000216
the centripetal Coriolis force
Figure FDA00034711631900000217
is:
Figure FDA00034711631900000218
the gravity matrix G(q_t) of the musculoskeletal system is:
Figure FDA0003471163190000031
wherein m_1 denotes the mass of the first link of the robot arm, m_2 denotes the mass of the second link of the robot arm, d_1 denotes the length of the first link of the robot arm, d_2 denotes the length of the second link of the robot arm,
Figure FDA0003471163190000032
q_{1,t} denotes the angle of the first joint of the robot arm, q_{2,t} denotes the angle of the second joint of the robot arm,
Figure FDA0003471163190000033
and
Figure FDA0003471163190000034
are the angular velocities of the first joint and the second joint of the robot arm.
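As a non-limiting illustration of how a bell-shaped desired velocity and the matrices M, C and G combine into a supervision moment, the following Python sketch may be considered. It makes two labeled assumptions: a minimum-jerk profile is used as a stand-in for V_M(λ, t), whose claimed expression appears only in the figures, and the arm is modeled as a planar two-link arm with point masses at the link ends and joint angles measured from the horizontal. The function names are hypothetical, and the claimed matrices may differ from this textbook model.

```python
import math
from typing import Tuple

GRAVITY = 9.81  # m/s^2


def min_jerk_velocity_accel(q_start: float, q_end: float, t: float,
                            duration: float) -> Tuple[float, float]:
    """Bell-shaped velocity and its derivative for a minimum-jerk motion.

    Stand-in for V_M(lambda, t); duration plays the role of T_out.
    """
    s = min(max(t / duration, 0.0), 1.0)  # normalized time in [0, 1]
    span = q_end - q_start
    velocity = span / duration * 30.0 * s**2 * (1.0 - s) ** 2
    accel = span / duration**2 * (60.0 * s - 180.0 * s**2 + 120.0 * s**3)
    return velocity, accel


def two_link_inverse_dynamics(
    q: Tuple[float, float],
    qd: Tuple[float, float],
    qdd: Tuple[float, float],
    m1: float, m2: float, d1: float, d2: float,
    g: float = GRAVITY,
) -> Tuple[float, float]:
    """Joint moments tau = M(q) qdd + C(q, qd) + G(q) for the assumed arm model."""
    q1, q2 = q
    qd1, qd2 = qd
    qdd1, qdd2 = qdd
    cos2, sin2 = math.cos(q2), math.sin(q2)
    # Inertia matrix M(q) (symmetric, so m21 == m12).
    m11 = (m1 + m2) * d1**2 + m2 * d2**2 + 2.0 * m2 * d1 * d2 * cos2
    m12 = m2 * d2**2 + m2 * d1 * d2 * cos2
    m22 = m2 * d2**2
    # Centripetal and Coriolis terms C(q, qd).
    c1 = -m2 * d1 * d2 * sin2 * (2.0 * qd1 * qd2 + qd2**2)
    c2 = m2 * d1 * d2 * sin2 * qd1**2
    # Gravity terms G(q).
    g1 = (m1 + m2) * g * d1 * math.cos(q1) + m2 * g * d2 * math.cos(q1 + q2)
    g2 = m2 * g * d2 * math.cos(q1 + q2)
    tau1 = m11 * qdd1 + m12 * qdd2 + c1 + g1
    tau2 = m12 * qdd1 + m22 * qdd2 + c2 + g2
    return tau1, tau2
```

In use, the desired velocity and acceleration of each joint would be evaluated at time t and passed as `qd` and `qdd`, together with the measured joint angles q_t, to obtain the moment that supervises the muscle activation network.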
4. The musculoskeletal system control method based on speed accuracy trade-off according to claim 3, wherein the muscle activation signal vector u_t is calculated as:
Figure FDA0003471163190000035
wherein u_{t-1} is the muscle activation signal at time t-1, τ_{t-1} is the joint moment at time t-1, and the policy network μ(·|θ^μ) is a neural network for solving for the muscle activation signal.
5. The method for controlling a musculoskeletal system based on a speed accuracy tradeoff according to claim 4, wherein the step S400 comprises:
Step S410: based on the muscle activation signal vector u_t and the supervision term moment
Figure FDA0003471163190000036
calculating an action reward R_t, with R_t to be made as large as possible:
Figure FDA0003471163190000037
where γ is a discount factor,
Figure FDA0003471163190000038
is the moment generated by the flexors at time t,
Figure FDA0003471163190000039
is the force generated by the flexors at time t,
Figure FDA00034711631900000310
is the muscle activation signal of the flexors at time t,
Figure FDA00034711631900000311
is the moment generated by the extensors at time t,
Figure FDA00034711631900000312
is the force generated by the extensors at time t,
Figure FDA00034711631900000313
is the muscle activation signal of the extensors at time t, p is the number of flexors, q is the number of extensors, and ω_1 and ω_2 are proportional parameters;
Step S420: based on the action reward R_t, computing the loss function L(θ^Q) of the evaluation network Q^μ:
Figure FDA00034711631900000314
wherein θ^Q denotes the parameters of the evaluation network, Δu_{t+1} is the muscle activation signal at time t+1, and Δu_t is the muscle activation signal at time t;
Step S430: based on the loss function L(θ^Q), updating the parameters θ^Q of the evaluation network Q^μ:
Figure FDA00034711631900000315
wherein η_1 denotes the update step size;
updating the parameters θ^μ of the policy network μ(·|θ^μ) based on the evaluation network Q^μ:
Figure FDA0003471163190000041
wherein η_2 denotes the update step size,
Figure FDA0003471163190000042
is the gradient of the policy network μ(·|θ^μ):
Figure FDA0003471163190000043
Step S440: if t ≠ T_out, letting t = t + 1 and repeating steps S200 to S400 until t = T_out;
Step S450: if k ≠ K, letting k = k + 1 and repeating steps S100 to S400 until k = K, at which point the muscle activation signals u_t (t ∈ [1, T]) form the sequence of muscle activation signals required to accomplish control.
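As a non-limiting illustration of the evaluation-network update in steps S420 and S430, the following Python sketch performs one semi-gradient temporal-difference step on a linear critic. The linear critic, the TD(0) target R_t + γ·Q(next), and the function names are assumptions chosen for brevity; they stand in for the evaluation network Q^μ and the loss L(θ^Q), whose claimed expressions appear only in the figures.

```python
from typing import List, Sequence, Tuple


def q_value(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear critic: Q = w . phi, where phi encodes a state-action pair."""
    return sum(w * f for w, f in zip(weights, features))


def critic_td_update(
    weights: Sequence[float],
    features: Sequence[float],
    reward: float,
    next_features: Sequence[float],
    gamma: float,
    step_size: float,
) -> Tuple[List[float], float]:
    """One semi-gradient TD(0) step on the linear critic.

    Returns the updated weights and the squared TD error before the update.
    """
    target = reward + gamma * q_value(weights, next_features)
    td_error = target - q_value(weights, features)
    # Move the weights along the features in proportion to the TD error.
    new_weights = [w + step_size * td_error * f for w, f in zip(weights, features)]
    return new_weights, td_error**2
```

Here `step_size` plays the role of η_1 and `gamma` the role of the discount factor γ; the policy parameters would then be updated from the critic in a separate step with its own step size.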
6. The method of claim 2, wherein the output OUT_i of the basal ganglia model is:
Figure FDA0003471163190000044
wherein,
Figure FDA0003471163190000045
is the gain factor, and i and j are the indices of the perceptual evidence.
7. The method for controlling a musculoskeletal system based on a speed accuracy tradeoff according to claim 2, wherein the motion precision W is estimated through Fitts' law as:
Figure FDA0003471163190000046
where a and b are two constant parameters and D is the distance moved by the joint tip.
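As a non-limiting illustration of estimating W from a decision time, the following Python sketch uses the classic Fitts formulation T = a + b·log2(2D/W) and its inverse. Whether the claimed expression takes exactly this form is an assumption, since the expression appears only in the figure; the function names are hypothetical.

```python
import math


def fitts_movement_time(a: float, b: float, distance: float, width: float) -> float:
    """Movement time predicted by the classic Fitts formulation."""
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return a + b * math.log2(2.0 * distance / width)


def fitts_target_width(a: float, b: float, distance: float,
                       movement_time: float) -> float:
    """Invert the relation above: the width W achievable within movement_time."""
    if distance <= 0 or b <= 0:
        raise ValueError("distance and b must be positive")
    return 2.0 * distance / 2.0 ** ((movement_time - a) / b)
```

Under this form, a longer decision time yields a smaller W, that is, a more precise movement, while a shorter decision time yields a larger W.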
8. A musculoskeletal system control system based on speed accuracy trade-off, the system comprising: a precision estimation module, an expected moment calculation module, an activation signal calculation module and a speed-accuracy trade-off module;
the training iteration count k is initialized to 1;
the precision estimation module is configured to obtain the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
the expected moment calculation module is configured, based on the estimated motion precision W, to calculate through a striatum-inspired speed modulation strategy the supervision term moment
Figure FDA0003471163190000047
the activation signal calculation module is configured, based on the supervision term moment
Figure FDA0003471163190000048
to compute a muscle activation signal vector u_t through a muscle activation signal network;
the speed-accuracy trade-off module is configured, based on the muscle activation signal vector u_t and the supervision term moment
Figure FDA0003471163190000049
to calculate an action reward R_t and then a preset loss function L(θ^Q), and to adjust the parameters of the muscle activation signal network based on the loss function L(θ^Q) so that the action reward R_t increases; while k < K, k is set to k + 1 and the functions of the precision estimation module through the speed-accuracy trade-off module are repeated until k = K, where K is the preset maximum number of training iterations, thereby obtaining the muscle activation signal sequence required for control.
9. A storage device having stored therein a plurality of programs, wherein said programs are adapted to be loaded and executed by a processor to implement the method of musculoskeletal system control based on speed accuracy trade-offs of any one of claims 1-7.
10. A processing apparatus comprising a processor adapted to execute programs; and a storage device adapted to store a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the method of musculoskeletal system control based on speed accuracy trade-off of any of claims 1-7.
CN202011610884.4A 2020-12-30 2020-12-30 Method, system and device for controlling musculoskeletal system based on speed precision balance Active CN112757275B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011610884.4A CN112757275B (en) 2020-12-30 2020-12-30 Method, system and device for controlling musculoskeletal system based on speed precision balance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011610884.4A CN112757275B (en) 2020-12-30 2020-12-30 Method, system and device for controlling musculoskeletal system based on speed precision balance

Publications (2)

Publication Number Publication Date
CN112757275A CN112757275A (en) 2021-05-07
CN112757275B true CN112757275B (en) 2022-02-25

Family

ID=75695918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011610884.4A Active CN112757275B (en) 2020-12-30 2020-12-30 Method, system and device for controlling musculoskeletal system based on speed precision balance

Country Status (1)

Country Link
CN (1) CN112757275B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113199460B (en) * 2021-05-24 2022-09-02 中国科学院自动化研究所 Nonlinear musculoskeletal robot control method, system and device
CN114918914B (en) * 2022-04-26 2024-03-22 中国科学院自动化研究所 Simulation control system and simulation device for human musculature

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040229198A1 (en) * 2003-05-15 2004-11-18 Cns Vital Signs, Llc Methods and systems for computer-based neurocognitive testing
CN107199569B (en) * 2017-06-22 2020-01-21 华中科技大学 Joint robot trajectory planning method based on joint energy balanced distribution
CN108115681B (en) * 2017-11-14 2020-04-07 深圳先进技术研究院 Simulation learning method and device for robot, robot and storage medium
CN108724191A (en) * 2018-06-27 2018-11-02 芜湖市越泽机器人科技有限公司 A kind of robot motion's method for controlling trajectory
JP7045962B2 (en) * 2018-08-24 2022-04-01 株式会社日立産機システム AC motor control device and its control method
CN111515929A (en) * 2020-04-15 2020-08-11 深圳航天科技创新研究院 Human motion state estimation method, device, terminal and computer readable storage medium

Also Published As

Publication number Publication date
CN112757275A (en) 2021-05-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant