CN112757275B - Method, system and device for controlling musculoskeletal system based on speed precision balance - Google Patents
- Publication number: CN112757275B
- Application number: CN202011610884.4A
- Authority: CN (China)
- Prior art keywords: time, activation signal, muscle activation, moment, musculoskeletal system
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/10—Programme-controlled manipulators characterised by positioning means for manipulator elements
- B25J9/1075—Programme-controlled manipulators characterised by positioning means for manipulator elements with muscles or tendons
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1628—Programme controls characterised by the control loop
- B25J9/1633—Programme controls characterised by the control loop compliant, force, torque control, e.g. combined with position control
Abstract
The invention belongs to the technical field of control, and in particular relates to a musculoskeletal system control method, system and device based on the speed-accuracy trade-off, aiming to solve the problem that existing control methods for human-like musculoskeletal robots cannot adequately control the co-contraction of antagonistic muscles. The method comprises: obtaining the estimated motion precision of the musculoskeletal system through Fitts' law; calculating a supervision-term moment from the estimated motion precision through a striatum-inspired speed modulation strategy; calculating a muscle activation signal vector through a muscle activation signal network; calculating an action reward from the muscle activation signal vector and the supervision-term moment, and from it a loss function; adjusting the parameters of the muscle activation signal network based on the loss function so that the action reward increases; and iterating to obtain the muscle activation signal sequence required for control. The invention uses the structural information of the musculoskeletal system to construct a general antagonistic-muscle co-contraction control strategy and ensures smooth movement.
Description
Technical Field
The invention belongs to the technical field of control, and in particular relates to a method, system and device for controlling a musculoskeletal system based on the speed-accuracy trade-off.
Background
The adaptability of living beings gives them the flexibility to adjust and execute behaviors, so that learned, skilled movements can vary with the environment and with task requirements. One typical strategy for achieving such motion variability is the speed-accuracy trade-off, which reflects the balance between the rapidity and the accuracy of a movement. Implementing this flexible behavioral strategy in a human-like musculoskeletal robot, so that the robot adapts generally to its environment and tasks, is an attractive challenge. In addition, in a human-like musculoskeletal robot system the number of muscles is generally far greater than the number of joints, and the redundant muscles make both motor learning and the generation of new movements difficult. Constructing a general antagonistic-muscle co-contraction control strategy from the structural information of the musculoskeletal system is therefore also a challenge, especially when partial muscle damage is considered.
Disclosure of Invention
To solve the above problem in the prior art, namely that existing control methods for human-like musculoskeletal robots cannot adequately control the co-contraction of antagonistic muscles, the present invention provides a musculoskeletal system control method based on the speed-accuracy trade-off, the method comprising:
setting the training count k = 1;
Step S100, obtaining the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
Step S200, calculating a supervision-term moment from the estimated motion precision W through a striatum-inspired speed modulation strategy;
Step S300, calculating a muscle activation signal vector u_t from the supervision-term moment through a muscle activation signal network;
Step S400, calculating an action reward R_t from the muscle activation signal vector u_t and the supervision-term moment, then calculating a preset loss function L(θ_Q), and adjusting the parameters of the muscle activation signal network based on L(θ_Q) so that the action reward R_t increases; setting k = k + 1 and repeating steps S100 to S400 until k = K, where K is the preset maximum number of training iterations, to obtain the muscle activation signal sequence required for control.
In some preferred embodiments, step S100 includes:
presetting an accumulation time T;
Step S110, obtaining the perceptual evidence x_i(t_1) ~ N(μ_i, σ²) at time t_1 through a cortex model, and from it the accumulated perceptual evidence Y_i(T):
Step S120, inputting the accumulated perceptual evidence Y_i(T) into the basal ganglia model to obtain the output OUT_i of the basal ganglia model;
Step S130, comparing the output OUT_i of the basal ganglia model with the preset decision threshold -ln P_i(T) to obtain a first effective accumulation time and a second effective accumulation time; if T > 0, setting T = T - 1 and repeating steps S110 to S130;
wherein, the first time the output of the basal ganglia model satisfies OUT_i ≥ -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the first effective accumulation time; whenever the output satisfies OUT_i < -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the second effective accumulation time; each second effective accumulation time newly generated during the iteration overwrites the previously generated one; -ln P_i(T) is the decision threshold;
Step S140, obtaining the final decision time T_out from the first effective accumulation time and the second effective accumulation time;
Step S150, estimating the motion precision W through Fitts' law based on the final decision time T_out. Calculating an appropriate final decision time in this way adjusts the precision of the subsequent muscle control.
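The threshold scan of steps S110 to S130 can be sketched as follows. The formulas for Y_i(T) and OUT_i are not reproduced in this text, so the sketch assumes a plain running sum for the accumulated evidence and a log-sum-exp form for the basal ganglia output; both are assumptions, as are the function name, the `gain` parameter and the data layout. How the two recorded times are combined into T_out in step S140 is left open.

```python
import math

def effective_accumulation_times(evidence, channel, p_correct, gain=1.0):
    """Scan T = T_max, ..., 1 and record the two effective accumulation times.

    evidence  : list of per-step samples, each a list with one value per channel
    channel   : index i of the channel being evaluated
    p_correct : required decision accuracy P_i, giving the threshold -ln P_i
    gain      : hypothetical gain factor used in the assumed output formula
    """
    threshold = -math.log(p_correct)
    n_channels = len(evidence[0])
    first_time = None   # set once, the first time OUT_i >= threshold occurs
    second_time = None  # overwritten every time OUT_i < threshold occurs

    T = len(evidence)
    while T > 0:
        # Assumed accumulation: Y_j(T) is the sum of the first T samples.
        Y = [sum(step[j] for step in evidence[:T]) for j in range(n_channels)]
        # Assumed basal ganglia output (log-sum-exp form, not from the source).
        peak = max(gain * y for y in Y)
        log_sum = peak + math.log(sum(math.exp(gain * y - peak) for y in Y))
        out_i = -gain * Y[channel] + log_sum

        if out_i >= threshold:
            if first_time is None:
                first_time = T
        else:
            second_time = T
        T -= 1
    return first_time, second_time
```

With a higher required accuracy P_i the threshold -ln P_i is lower, so under these assumptions the first threshold crossing in the downward scan occurs at a larger T.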
In some preferred embodiments, the striatum-inspired speed modulation strategy is:
where the joint angle of the end position is calculated from the estimated motion precision, q_s is the joint angle of the initial position, t_S is the start time of the movement, V_M(λ, t) is a bell-shaped velocity modulation model, t is the modulation time, and T_out is the decision time;
the bell-shaped velocity modulation model V_M(λ, t) is:
where λ is a parameter of the modulation model and t is the time;
where, in the dynamics that yield the supervision-term moment, q_t is the joint angle at time t, the desired joint angular velocity and the desired joint angular acceleration describe the planned motion, M(q_t) is the inertia matrix of the musculoskeletal system, the centripetal and Coriolis force term accounts for velocity-dependent effects, and G(q_t) is the gravity matrix of the musculoskeletal system;
the inertia matrix M(q_t) of the musculoskeletal system is:
the gravity matrix G(q_t) of the musculoskeletal system is:
where m_1 is the mass of the first link of the robot arm, m_2 is the mass of the second link, d_1 is the length of the first link, d_2 is the length of the second link, q_{1,t} is the angle of the first joint, q_{2,t} is the angle of the second joint, and the remaining two quantities are the angular velocities of the first and second joints.
In some preferred embodiments, the muscle activation signal vector u_t is calculated as:
where u_{t-1} is the muscle activation signal at time t-1, τ_{t-1} is the joint moment at time t-1, and the policy network μ(·|θ_μ) is a neural network that solves for the muscle activation signal.
In some preferred embodiments, step S400 includes:
Step S410, calculating the action reward R_t from the muscle activation signal vector u_t and the supervision-term moment, and making R_t as large as possible:
where γ is a discount factor; for the flexors, the formula uses the moment, the force and the muscle activation signal generated at time t; for the extensors, it likewise uses the moment, the force and the muscle activation signal generated at time t; p is the number of flexors, q is the number of extensors, and ω_1 and ω_2 are proportional parameters;
Step S420, calculating the loss function L(θ_Q) of the evaluation network Q^μ based on the action reward R_t:
where θ_Q denotes the parameters of the evaluation network, Δu_{t+1} is the muscle activation signal at time t+1, and Δu_t is the muscle activation signal at time t;
Step S430, updating the parameter θ_Q of the evaluation network Q^μ based on the loss function L(θ_Q):
where η_1 is the update step size;
and updating the parameter θ_μ of the policy network μ(·|θ_μ) based on the evaluation network Q^μ:
where η_2 is the update step size, and the gradient of the policy network μ(·|θ_μ) is:
Step S440, if t ≠ T_out, setting t = t + 1 and repeating steps S200 to S400 until t = T_out;
Step S450, if k ≠ K, setting k = k + 1 and repeating steps S100 to S400 until k = K, at which point the muscle activation signals u_t (t ∈ [1, T]) form the muscle activation signal sequence required for control.
In some preferred embodiments, the output OUT_i of the basal ganglia model is:
where the formula includes a gain factor, and i and j are the indices of the perceptual evidence.
In some preferred embodiments, the motion precision W estimated by Fitts' law is:
where a and b are two constant parameters and D is the distance moved by the joint end.
In another aspect of the invention, a musculoskeletal system control system based on the speed-accuracy trade-off is provided, the system comprising a precision estimation module, an expected moment calculation module, an activation signal calculation module and a speed-accuracy trade-off module;
the training count is set to k = 1;
the precision estimation module is used for obtaining the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
the expected moment calculation module is used for calculating a supervision-term moment from the estimated motion precision W through a striatum-inspired speed modulation strategy;
the activation signal calculation module is used for calculating a muscle activation signal vector u_t from the supervision-term moment through a muscle activation signal network;
the speed-accuracy trade-off module is used for calculating an action reward R_t from the muscle activation signal vector u_t and the supervision-term moment, then calculating a preset loss function L(θ_Q), and adjusting the parameters of the muscle activation signal network based on L(θ_Q) so that the action reward R_t increases; k is set to k + 1 and the functions of the precision estimation module through the speed-accuracy trade-off module are repeated until k = K, where K is the preset maximum number of training iterations, to obtain the muscle activation signal sequence required for control.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above musculoskeletal system control method based on the speed-accuracy trade-off.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to implement the above musculoskeletal system control method based on the speed-accuracy trade-off.
The beneficial effects of the invention are as follows:
(1) The musculoskeletal system control method based on the speed-accuracy trade-off designs an antagonistic-muscle co-contraction strategy by combining Fitts' law with the speed modulation strategy of the FSI-SPN neuronal circuit in the striatum, realizes a general redundant-muscle control algorithm, and improves the neurally inspired motion adaptability of the musculoskeletal system.
(2) The method designs an action reward for the antagonistic-muscle co-contraction strategy and uses it, together with an evaluation network, to update the parameters of the policy network. This yields a general antagonistic-muscle co-contraction control strategy and facilitates motor learning and control of the redundant muscles of a human-like musculoskeletal robot.
(3) The method introduces a supervision term into the Markov decision process to construct a supervised Markov decision process algorithm, and divides the control process into a motion planning stage and a motion execution stage that together serve as the basis for motion variability, thereby realizing efficient training and control of the musculoskeletal robot system.
(4) The method combines the antagonistic-muscle co-contraction strategy with an active speed-accuracy trade-off model to calculate an appropriate motion execution time, which in turn determines the precision of muscle control and realizes neurally inspired motion adaptability on the musculoskeletal system.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow diagram of a musculoskeletal system control method based on the speed-accuracy trade-off in accordance with an embodiment of the present invention.
Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here merely illustrate the invention and do not limit it. For convenience of description, the drawings show only the portions related to the invention.
The embodiments of the present application, and the features of those embodiments, may be combined with each other where they do not conflict. The present application is described in detail below with reference to the embodiments and the attached drawings.
The invention provides a musculoskeletal system control method based on the speed-accuracy trade-off.
The musculoskeletal system control method based on the speed-accuracy trade-off comprises the following steps:
setting the training count k = 1; because a musculoskeletal robot system has redundant joints, a single motor task can be accomplished in several ways;
Step S100, obtaining the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
Step S200, calculating a supervision-term moment from the estimated motion precision W through a striatum-inspired speed modulation strategy;
Step S300, calculating a muscle activation signal vector u_t from the supervision-term moment through a muscle activation signal network;
Step S400, calculating an action reward R_t from the muscle activation signal vector u_t and the supervision-term moment, calculating a loss value through a preset loss function L(θ_Q), and adjusting the parameters of the muscle activation signal network according to the loss value so that the action reward R_t increases; if k < K, setting k = k + 1 and repeating steps S100 to S400 until k = K, where K is the preset maximum number of training iterations, to obtain the muscle activation signal sequence required for control.
To describe the musculoskeletal system control method based on the speed-accuracy trade-off more clearly, an embodiment of the present invention is described in detail below with reference to FIG. 1.
The musculoskeletal system control method based on the speed-accuracy trade-off comprises steps S100 to S400, which are described in detail as follows:
setting the training count k = 1;
Step S100, obtaining the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
In this embodiment, step S100 includes:
presetting an accumulation time T; the accumulation time T set in the first iteration must be the maximum time that can be set, so that the subsequent iterations can proceed; preferably, T may be chosen as 30 s.
Step S110, obtaining the perceptual evidence x_i(t_1) ~ N(μ_i, σ²) at time t_1 through a cortex model, and from it the accumulated perceptual evidence Y_i(T), as shown in equation (1):
Step S120, inputting the accumulated perceptual evidence Y_i(T) into the basal ganglia model to obtain the output OUT_i of the basal ganglia model; the accumulated perceptual evidence is passed into the striatum of the basal ganglia model and, after travelling through the direct and indirect pathways, converges in the substantia nigra and the globus pallidus to yield the output OUT_i of the basal ganglia model;
Step S130, comparing the output OUT_i of the basal ganglia model with the preset decision threshold -ln P_i(T) to obtain a first effective accumulation time and a second effective accumulation time; if T > 0, setting T = T - 1 and repeating steps S110 to S130;
wherein, the first time the output of the basal ganglia model satisfies OUT_i ≥ -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the first effective accumulation time; whenever the output satisfies OUT_i < -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the second effective accumulation time; each second effective accumulation time newly generated during the iteration overwrites the previously generated one; -ln P_i(T) is the decision threshold;
In this embodiment, the decision threshold -ln P_i(T) is used to determine whether the evidence is sufficient to make a decision, where P_i(T) indicates the accuracy of the decision; for example, P_i(T) = 0.8 means that the probability that the decision is correct is 80%.
In this embodiment, the output OUT_i of the basal ganglia model is as shown in equation (2):
Step S140, obtaining the final decision time T_out from the first effective accumulation time and the second effective accumulation time;
Step S150, estimating the motion precision W through Fitts' law based on the final decision time T_out.
In this embodiment, the motion precision W estimated by Fitts' law is as shown in equation (3):
where a and b are two constant parameters and D is the distance moved by the joint end.
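As an illustration of step S150, the sketch below inverts the classic form of Fitts' law, T = a + b log2(2D / W), to obtain W from a given decision time. Equation (3) is not reproduced in this text, so whether the patent uses exactly this form is an assumption; the function names are hypothetical.

```python
import math

def movement_time(a, b, distance, width):
    """Classic Fitts' law: T = a + b * log2(2D / W)."""
    return a + b * math.log2(2.0 * distance / width)

def estimated_precision(a, b, distance, t_out):
    """Invert the classic form for W given the final decision time T_out."""
    if b == 0:
        raise ValueError("b must be non-zero to invert Fitts' law")
    return 2.0 * distance / (2.0 ** ((t_out - a) / b))
```

Under this form a longer decision time T_out gives a smaller W, that is, a tighter target tolerance, which is the speed-accuracy trade-off the method relies on.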
Step S200, calculating a supervision-term moment from the estimated motion precision W through a striatum-inspired speed modulation strategy;
In this embodiment, the striatum-inspired speed modulation strategy is as shown in equation (4):
where the joint angle of the end position is calculated from the estimated motion precision, q_s is the joint angle of the initial position, t_S is the start time of the movement, V_M(λ, t) is a bell-shaped velocity modulation model, t is the modulation time, and T_out is the decision time;
the bell-shaped velocity modulation model V_M(λ, t) is as shown in equation (5):
where λ is a parameter of the modulation model and t is the time;
where, in the dynamics that yield the supervision-term moment, q_t is the joint angle at time t, the desired joint angular velocity and the desired joint angular acceleration describe the planned motion, M(q_t) is the inertia matrix of the musculoskeletal system, the centripetal and Coriolis force term accounts for velocity-dependent effects, and G(q_t) is the gravity matrix of the musculoskeletal system;
the inertia matrix M(q_t) of the musculoskeletal system is as shown in equation (7):
the gravity matrix G(q_t) of the musculoskeletal system is as shown in equation (9):
where m_1 is the mass of the first link of the robot arm, m_2 is the mass of the second link, d_1 is the length of the first link, d_2 is the length of the second link, q_{1,t} is the angle of the first joint, q_{2,t} is the angle of the second joint, and the remaining two quantities are the angular velocities of the first and second joints.
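To make the role of M(q_t), the centripetal and Coriolis term and G(q_t) concrete, the following sketch computes joint moments for a planar two-link arm from desired joint accelerations. The matrices in equations (7) and (9) are not reproduced in this text, so the sketch uses the standard textbook model with point masses at the distal ends of the links; that model, the angle convention and the function name are assumptions and may differ from the patent's.

```python
import math

def two_link_torque(q, qd, qdd, m1, m2, d1, d2, g=9.81):
    """Inverse dynamics tau = M(q) qdd + C(q, qd) + G(q) for a planar two-link arm.

    Assumes point masses m1, m2 at the distal ends of links of length d1, d2,
    with q[0] measured from the horizontal (textbook convention, not the patent's).
    """
    q1, q2 = q
    c1, c2, s2 = math.cos(q1), math.cos(q2), math.sin(q2)
    c12 = math.cos(q1 + q2)

    # Inertia matrix M(q): symmetric 2x2.
    m11 = m2 * d2**2 + 2.0 * m2 * d1 * d2 * c2 + (m1 + m2) * d1**2
    m12 = m2 * d2**2 + m2 * d1 * d2 * c2
    m22 = m2 * d2**2

    # Centripetal and Coriolis terms C(q, qd).
    h1 = -m2 * d1 * d2 * s2 * qd[1]**2 - 2.0 * m2 * d1 * d2 * s2 * qd[0] * qd[1]
    h2 = m2 * d1 * d2 * s2 * qd[0]**2

    # Gravity terms G(q).
    g1 = m2 * d2 * g * c12 + (m1 + m2) * d1 * g * c1
    g2 = m2 * d2 * g * c12

    tau1 = m11 * qdd[0] + m12 * qdd[1] + h1 + g1
    tau2 = m12 * qdd[0] + m22 * qdd[1] + h2 + g2
    return tau1, tau2
```

For example, with unit masses and lengths and the arm held still and horizontal, the result is just the gravity load on each joint.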
Step S300, calculating a muscle activation signal vector u_t from the supervision-term moment through a muscle activation signal network;
In this embodiment, the muscle activation signal vector u_t is calculated as shown in equation (10):
where u_{t-1} is the muscle activation signal at time t-1, τ_{t-1} is the joint moment at time t-1, and the policy network μ(·|θ_μ) is a neural network that solves for the muscle activation signal. The preferred policy network may be that of the classical DDPG method.
Step S400, calculating an action reward R_t from the muscle activation signal vector u_t and the supervision-term moment, calculating a loss value through a preset loss function L(θ_Q), and adjusting the parameters of the muscle activation signal network according to the loss value so that the action reward R_t increases; if k < K, setting k = k + 1 and repeating steps S100 to S400 until k = K, where K is the preset maximum number of training iterations, to obtain the muscle activation signal sequence required for control.
In this embodiment, step S400 includes:
Step S410, calculating the action reward R_t from the muscle activation signal vector u_t and the supervision-term moment, and making R_t as large as possible, as shown in equation (11):
where γ is a discount factor; for the flexors, the formula uses the moment, the force and the muscle activation signal generated at time t; for the extensors, it likewise uses the moment, the force and the muscle activation signal generated at time t; p is the number of flexors, q is the number of extensors, and ω_1 and ω_2 are proportional parameters. These steps achieve fine control of the redundant muscles.
In this embodiment, a higher action reward indicates a more accurate output; the aim of the invention is to bring the resultant moment closer to the supervision term while minimizing the change in the muscle activation signals of the flexors and extensors, so as to form a stable co-contraction between the flexors and extensors.
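The stated aim of the reward can be illustrated with a minimal sketch. Equation (11) is not reproduced in this text, so the quadratic form below is purely hypothetical: it penalizes the squared gap between the resultant moment and the supervision term and the squared changes in activation, weighted by two proportional parameters, and it omits the discount factor γ and the muscle force terms. Signed moments are assumed, with extensor moments opposing flexor moments.

```python
def cocontraction_reward(tau_flexors, tau_extensors, tau_supervised,
                         du_flexors, du_extensors, w1=1.0, w2=0.1):
    """Hypothetical reward: larger when the net moment matches the supervision
    term and when flexor/extensor activation changes are small."""
    net_torque = sum(tau_flexors) + sum(tau_extensors)
    tracking_error = (net_torque - tau_supervised) ** 2
    activation_change = (sum(d * d for d in du_flexors)
                         + sum(d * d for d in du_extensors))
    return -w1 * tracking_error - w2 * activation_change
```

The reward is at its maximum of zero when the resultant moment equals the supervision term and the activations do not change, and it decreases as either condition is violated.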
Step S420, calculating the loss function L(θ_Q) of the evaluation network Q^μ based on the action reward R_t, as shown in equation (12):
where θ_Q denotes the parameters of the evaluation network, Δu_{t+1} is the muscle activation signal at time t+1, and Δu_t is the muscle activation signal at time t; the muscle activation signal Δu_t here includes both the flexor and the extensor activation signals; the evaluation network Q^μ represents a state-action value function;
Step S430, updating the parameter θ_Q of the evaluation network Q^μ based on the loss function L(θ_Q), as shown in equation (13):
where η_1 is the update step size;
and updating the parameter θ_μ of the policy network μ(·|θ_μ) based on the evaluation network Q^μ, as shown in equation (14):
where η_2 is the update step size, and the gradient of the policy network μ(·|θ_μ) is as shown in equation (15):
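For orientation, the updates of the classical DDPG method, which the text names as the preferred policy network, take the following general form. Equations (12) to (15) are not reproduced in this text, so this is the standard formulation rather than the patent's exact one: the state symbol s_t is hypothetical notation, the action is taken to be the activation change Δu_t, and target networks are omitted.

```latex
\begin{aligned}
y_t &= R_t + \gamma\, Q^{\mu}\!\left(s_{t+1},\, \mu(s_{t+1}\mid\theta_\mu) \mid \theta_Q\right) \\
L(\theta_Q) &= \mathbb{E}\!\left[\left(y_t - Q^{\mu}(s_t, \Delta u_t \mid \theta_Q)\right)^2\right] \\
\theta_Q &\leftarrow \theta_Q - \eta_1\, \nabla_{\theta_Q} L(\theta_Q) \\
\nabla_{\theta_\mu} J &\approx \mathbb{E}\!\left[\nabla_{a} Q^{\mu}(s_t, a \mid \theta_Q)\big|_{a=\mu(s_t\mid\theta_\mu)}\; \nabla_{\theta_\mu}\mu(s_t \mid \theta_\mu)\right] \\
\theta_\mu &\leftarrow \theta_\mu + \eta_2\, \nabla_{\theta_\mu} J
\end{aligned}
```

The critic step descends the loss with step size η_1, while the actor step ascends the estimated value with step size η_2, which is what drives the action reward upward over training.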
Step S440, if t ≠ T_out, setting t = t + 1 and repeating steps S200 to S400 until t = T_out;
Step S450, if k ≠ K, setting k = k + 1 and repeating steps S100 to S400 until k = K, at which point the muscle activation signals u_t (t ∈ [1, T]) form the muscle activation signal sequence required for control. The calculation proceeds as follows: the initial activation signal is taken to be u_{t-1}; the model computes the change in the activation signal at each time step; at the next time step the signal input to the muscles becomes the previous signal updated by this change; this is repeated until t = T. To distinguish the sequence from its values, u_t (t ∈ [1, T]) denotes the sequence and u_t denotes a single value.
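The nesting of the loops in steps S100 to S450 can be sketched as below. The four callables are hypothetical stand-ins for the patent's models (precision estimation, moment planning, policy network, and reward and network update), the activation is treated as a scalar for brevity, and the initial activation of zero is an assumption.

```python
def train(K, estimate_precision, plan_torque, policy_step, update_networks):
    """Run K training iterations of the S100-S450 loop with caller-supplied steps.

    All four callables are hypothetical stand-ins for the patent's models.
    """
    sequence = []
    for k in range(1, K + 1):
        # S100: motion planning yields the precision W and decision time T_out.
        W, T_out = estimate_precision(k)
        u_prev = 0.0          # assumed initial activation signal
        sequence = []
        for t in range(1, T_out + 1):
            tau_sup = plan_torque(W, t, T_out)        # S200
            delta_u = policy_step(u_prev, tau_sup)    # S300: change in activation
            u_t = u_prev + delta_u
            update_networks(u_t, delta_u, tau_sup)    # S400: reward, loss, updates
            sequence.append(u_t)
            u_prev = u_t
    return sequence
```

The inner loop corresponds to motion execution over t up to T_out (step S440) and the outer loop to the training count k up to K (step S450); the sequence returned is the one produced in the final training iteration.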
For highly redundant and highly coupled musculoskeletal robot systems, the invention first draws on the neural mechanism of the speed-accuracy trade-off in living beings and, by simulating the cortex-basal ganglia neural circuit, provides a biologically plausible, computable decision model of the basal ganglia. It also provides an active speed-accuracy trade-off model that combines Fitts' law with the speed modulation strategy of the FSI-SPN neuronal circuit in the striatum, so that skilled motor performance can be adjusted flexibly according to environmental information and task-related parameters. Second, to train and control the musculoskeletal robot system efficiently, a supervision term is introduced into the Markov decision process (MDP) to construct a supervised MDP algorithm, and the control process is divided into a motion planning stage and a motion execution stage, which serve as the basis for motion variability. In the motion execution stage, an antagonistic-muscle co-contraction strategy is designed to explore the cooperative relationship among antagonistic muscles, realizing a general redundant-muscle control algorithm. Finally, this algorithm is combined with the active speed-accuracy trade-off model to realize neurally inspired motion adaptability on the musculoskeletal system.
The musculoskeletal system control system based on the speed-accuracy trade-off according to a second embodiment of the invention comprises a precision estimation module, an expected moment calculation module, an activation signal calculation module and a speed-accuracy trade-off module;
the training count is set to k = 1;
the precision estimation module is used for obtaining the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
the expected moment calculation module is used for calculating a supervision-term moment from the estimated motion precision W through a striatum-inspired speed modulation strategy;
the activation signal calculation module is used for calculating a muscle activation signal vector u_t from the supervision-term moment through a muscle activation signal network;
the speed-accuracy trade-off module is used for calculating an action reward R_t from the muscle activation signal vector u_t and the supervision-term moment, then calculating a preset loss function L(θ_Q), and adjusting the parameters of the muscle activation signal network based on L(θ_Q) so that the action reward R_t increases; k is set to k + 1 and the functions of the precision estimation module through the speed-accuracy trade-off module are repeated until k = K, where K is the preset maximum number of training iterations, to obtain the muscle activation signal sequence required for control.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that the musculoskeletal system control system based on speed precision tradeoff provided in the above embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the above embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the above described functions. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device according to a third embodiment of the invention has stored therein a plurality of programs adapted to be loaded and executed by a processor to implement the method of musculoskeletal system control based on speed accuracy trade-off described above.
A processing apparatus according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the aforementioned musculoskeletal system control method based on speed accuracy trade-offs.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative modules, method steps, and modules described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules, method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
Claims (10)
1. A musculoskeletal system control method based on speed accuracy trade-off, the control method comprising:
setting the training count k to 1;
step S100, obtaining the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
step S200, calculating a supervision-term torque from the estimated motion precision W through a striatum-inspired speed modulation strategy;
step S300, computing a muscle activation signal vector $u_t$ from the supervision-term torque through a muscle activation signal network;
step S400, calculating an action reward $R_t$ from the muscle activation signal vector $u_t$ and the supervision-term torque, calculating a loss value through a preset loss function $L(\theta^{Q})$, and adjusting the parameters of the muscle activation signal network according to the loss value so that the action reward $R_t$ increases; if k < K, setting k = k + 1 and repeating steps S100 to S400 until k = K, where K is the preset maximum number of training iterations, thereby obtaining the muscle activation signal sequence required for control.
2. The method for controlling a musculoskeletal system based on a speed accuracy tradeoff according to claim 1, wherein the step S100 comprises:
presetting an accumulation time T;
step S110, obtaining, through a cortex model, the perceptual evidence at time $t_1$, $x_i(t_1) \sim N(\mu_i, \sigma^2)$, and from it the accumulated perceptual evidence $Y_i(T)$:
step S120, inputting the accumulated perceptual evidence $Y_i(T)$ into the basal ganglia model to obtain the output $OUT_i$ of the basal ganglia model;
step S130, based on the output $OUT_i$ of the basal ganglia model and a preset decision threshold $-\ln P_i(T)$, obtaining a first effective accumulation time and a second effective accumulation time; if T > 0, setting T = T - 1 and repeating steps S110 to S130;
wherein, the first time the output of the basal ganglia model satisfies $OUT_i \ge -\ln P_i(T)$, the corresponding accumulation time T is set as the first effective accumulation time; whenever the output satisfies $OUT_i < -\ln P_i(T)$, the corresponding accumulation time T is set as the second effective accumulation time, and each newly generated second effective accumulation time overwrites the previous one during the iteration; $-\ln P_i(T)$ is the decision threshold;
step S140, obtaining the final decision time $T_{out}$ from the first effective accumulation time and the second effective accumulation time;
step S150, estimating the motion precision W from the final decision time $T_{out}$ through Fitts' law.
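Step S150 can be illustrated with the commonly cited form of Fitts' law, $MT = a + b \log_2(2D/W)$, solved for the target width W. The expression actually used in the claim is not reproduced in this text, so the sketch below is an assumption: the intercept `a`, slope `b`, and movement distance `D` are hypothetical parameters, and the decision time `T_out` is treated as the available movement time.

```python
import math


def fitts_movement_time(D: float, W: float, a: float, b: float) -> float:
    """Movement time predicted by Fitts' law, MT = a + b * log2(2D / W)."""
    return a + b * math.log2(2.0 * D / W)


def fitts_precision(T_out: float, D: float, a: float, b: float) -> float:
    """Invert Fitts' law: the target width W reachable within time T_out."""
    return 2.0 * D / 2.0 ** ((T_out - a) / b)


# Hypothetical values: 0.3 m reach, a = 0.1 s, b = 0.15 s per bit.
w_slow = fitts_precision(T_out=1.0, D=0.3, a=0.1, b=0.15)
w_fast = fitts_precision(T_out=0.5, D=0.3, a=0.1, b=0.15)
```

Under this form a shorter decision time yields a larger W, that is, a coarser estimated motion precision, which is the speed-accuracy trade-off the method exploits.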
3. The method of musculoskeletal system control based on speed accuracy trade-off of claim 2, wherein the striatal inspired speed modulation strategy is:
wherein the end-position joint angle is calculated from the estimated motion precision, $q_s$ denotes the joint angle of the initial position, $t_S$ is the starting time of the movement, $V_M(\lambda, t)$ is a bell-shaped velocity modulation model, t denotes the time being modulated, and $T_{out}$ denotes the decision time;
the bell-shaped velocity modulation model $V_M(\lambda, t)$ is:
wherein λ is a parameter of the modulation model and t denotes time t;
wherein $q_t$ is the joint angle at time t, and the remaining terms are the desired joint angular velocity, the desired joint angular acceleration, the inertia matrix $M(q_t)$ of the musculoskeletal system, the centripetal and Coriolis force term, and the gravity matrix $G(q_t)$ of the musculoskeletal system;
the inertia matrix $M(q_t)$ of the musculoskeletal system is:
wherein $m_1$ denotes the mass of the first link of the robot arm, $m_2$ the mass of the second link, $d_1$ the length of the first link, $d_2$ the length of the second link, $q_{1,t}$ the angle of the first joint, and $q_{2,t}$ the angle of the second joint; the angular velocities of the first joint and the second joint also enter the model.
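How a supervision-term torque follows from desired joint motion can be illustrated with a two-link arm. The claim's own matrix entries are not reproduced in this text, so the sketch below substitutes the textbook point-mass model of a planar two-link arm (masses concentrated at the link ends, joint angles measured from the horizontal) and the rigid-body form $\tau = M(q)\ddot{q} + C(q, \dot{q}) + G(q)$. The function names and the gravity constant `g` are assumptions.

```python
import math
from typing import List, Sequence


def inertia_matrix(q2: float, m1: float, m2: float,
                   d1: float, d2: float) -> List[List[float]]:
    """Inertia matrix M(q) of the point-mass two-link arm."""
    c2 = math.cos(q2)
    m11 = (m1 + m2) * d1 ** 2 + m2 * d2 ** 2 + 2.0 * m2 * d1 * d2 * c2
    m12 = m2 * d2 ** 2 + m2 * d1 * d2 * c2
    m22 = m2 * d2 ** 2
    return [[m11, m12], [m12, m22]]


def inverse_dynamics(q: Sequence[float], dq: Sequence[float],
                     ddq: Sequence[float], m1: float, m2: float,
                     d1: float, d2: float, g: float = 9.81) -> List[float]:
    """Joint torques needed to realize the desired motion (q, dq, ddq)."""
    q1, q2 = q
    M = inertia_matrix(q2, m1, m2, d1, d2)
    h = m2 * d1 * d2 * math.sin(q2)
    # Centripetal and Coriolis terms.
    coriolis = [-h * (2.0 * dq[0] * dq[1] + dq[1] ** 2), h * dq[0] ** 2]
    # Gravity terms (angles measured from the horizontal axis).
    gravity = [
        (m1 + m2) * g * d1 * math.cos(q1) + m2 * g * d2 * math.cos(q1 + q2),
        m2 * g * d2 * math.cos(q1 + q2),
    ]
    return [
        M[i][0] * ddq[0] + M[i][1] * ddq[1] + coriolis[i] + gravity[i]
        for i in range(2)
    ]


# Holding the arm still and horizontal needs only gravity compensation.
tau_hold = inverse_dynamics((0.0, 0.0), (0.0, 0.0), (0.0, 0.0),
                            m1=1.0, m2=1.0, d1=1.0, d2=1.0, g=10.0)
```

In the method, the desired velocity and acceleration passed to such a function would come from the bell-shaped velocity modulation model rather than being zero as in this static example.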
4. The method for controlling a musculoskeletal system based on a speed accuracy tradeoff according to claim 3, wherein the muscle activation signal vector $u_t$ is calculated as:
wherein $u_{t-1}$ is the muscle activation signal at time t - 1, $\tau_{t-1}$ is the joint torque at time t - 1, and the policy network $\mu(\cdot \mid \theta^{\mu})$ is the neural network used to solve for the muscle activation signal.
5. The method for controlling a musculoskeletal system based on a speed accuracy tradeoff according to claim 4, wherein the step S400 comprises:
step S410, calculating the action reward $R_t$ from the muscle activation signal vector $u_t$ and the supervision-term torque, with $R_t$ to be made as large as possible:
wherein γ is a discount factor; the flexor terms are the torque, the force, and the muscle activation signal generated by the flexors at time t; the extensor terms are the torque, the force, and the muscle activation signal generated by the extensors at time t; p is the number of flexors, q is the number of extensors, and $\omega_1$ and $\omega_2$ are proportional parameters;
step S420, calculating the loss function $L(\theta^{Q})$ of the evaluation network $Q^{\mu}$ from the action reward $R_t$:
wherein $\theta^{Q}$ denotes the parameters of the evaluation network, $\Delta u_{t+1}$ is the muscle activation signal at time t + 1, and $\Delta u_t$ is the muscle activation signal at time t;
step S430, updating the parameters $\theta^{Q}$ of the evaluation network $Q^{\mu}$ based on the loss function $L(\theta^{Q})$:
wherein $\eta_1$ denotes the update step size;
updating the parameters $\theta^{\mu}$ of the policy network $\mu(\cdot \mid \theta^{\mu})$ based on the evaluation network $Q^{\mu}$:
wherein $\eta_2$ denotes the update step size, and the remaining term is the gradient of the policy network $\mu(\cdot \mid \theta^{\mu})$:
step S440, if $t \ne T_{out}$, setting t = t + 1 and repeating steps S200 to S400 until $t = T_{out}$;
step S450, if k ≠ K, setting k = k + 1 and repeating steps S100 to S400 until k = K, at which point the muscle activation signals $u_t$ ($t \in [1, T]$) form the muscle activation signal sequence required for control.
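The parameter update of step S430 can be illustrated with a generic temporal-difference critic step. The claim's loss expression is not reproduced in this text, so the sketch assumes the standard squared TD error, $L = (R_t + \gamma Q(s_{t+1}, \Delta u_{t+1}) - Q(s_t, \Delta u_t))^2$, a linear critic over hypothetical feature vectors, and a semi-gradient step in which the target is held constant. None of these choices is confirmed by the source.

```python
from typing import List, Sequence


def q_value(theta: Sequence[float], features: Sequence[float]) -> float:
    """Linear critic: Q = theta . features (features encode state and action)."""
    return sum(w * f for w, f in zip(theta, features))


def td_loss(theta: Sequence[float], feat_t: Sequence[float],
            feat_next: Sequence[float], reward: float, gamma: float) -> float:
    """Squared temporal-difference error for one transition."""
    target = reward + gamma * q_value(theta, feat_next)
    return (target - q_value(theta, feat_t)) ** 2


def critic_step(theta: Sequence[float], feat_t: Sequence[float],
                feat_next: Sequence[float], reward: float,
                gamma: float, eta1: float) -> List[float]:
    """One update theta <- theta - eta1 * dL/dtheta with the target held fixed."""
    target = reward + gamma * q_value(theta, feat_next)
    error = target - q_value(theta, feat_t)
    # dL/dtheta = -2 * error * feat_t when the target is treated as a constant.
    return [w + eta1 * 2.0 * error * f for w, f in zip(theta, feat_t)]


# Hypothetical single transition.
theta_old = [0.0, 0.0]
theta_new = critic_step(theta_old, feat_t=[1.0, 0.5], feat_next=[0.5, 1.0],
                        reward=1.0, gamma=0.9, eta1=0.1)
```

The policy network update in the same step would follow the analogous pattern with step size $\eta_2$, moving $\theta^{\mu}$ along the gradient that the evaluation network assigns to the policy's output.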
8. A musculoskeletal system control system based on speed accuracy trade-off, the system comprising: a precision estimation module, an expected torque calculation module, an activation signal calculation module, and a speed-accuracy trade-off module;
the training count k is initialized to 1;
the precision estimation module is configured to obtain the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;
the expected torque calculation module is configured to calculate the supervision-term torque from the estimated motion precision W through a striatum-inspired speed modulation strategy;
the activation signal calculation module is configured to compute the muscle activation signal vector $u_t$ from the supervision-term torque through a muscle activation signal network;
the speed-accuracy trade-off module is configured to calculate an action reward $R_t$ from the muscle activation signal vector $u_t$ and the supervision-term torque, evaluate a preset loss function $L(\theta^{Q})$, and adjust the parameters of the muscle activation signal network according to $L(\theta^{Q})$ so that the action reward $R_t$ increases; k is then set to k + 1 and the functions of the precision estimation module through the speed-accuracy trade-off module are repeated until k = K, where K is the preset maximum number of training iterations, yielding the muscle activation signal sequence required for control.
9. A storage device having stored therein a plurality of programs, wherein said programs are adapted to be loaded and executed by a processor to implement the method of musculoskeletal system control based on speed accuracy trade-offs of any one of claims 1-7.
10. A processing apparatus comprising a processor adapted to execute programs; and a storage device adapted to store a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the method of musculoskeletal system control based on speed accuracy trade-off of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011610884.4A CN112757275B (en) | 2020-12-30 | 2020-12-30 | Method, system and device for controlling musculoskeletal system based on speed precision balance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112757275A CN112757275A (en) | 2021-05-07 |
CN112757275B true CN112757275B (en) | 2022-02-25 |
Family
ID=75695918
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011610884.4A Active CN112757275B (en) | 2020-12-30 | 2020-12-30 | Method, system and device for controlling musculoskeletal system based on speed precision balance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112757275B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113199460B (en) * | 2021-05-24 | 2022-09-02 | 中国科学院自动化研究所 | Nonlinear musculoskeletal robot control method, system and device |
CN114918914B (en) * | 2022-04-26 | 2024-03-22 | 中国科学院自动化研究所 | Simulation control system and simulation device for human musculature |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040229198A1 (en) * | 2003-05-15 | 2004-11-18 | Cns Vital Signs, Llc | Methods and systems for computer-based neurocognitive testing |
CN107199569B (en) * | 2017-06-22 | 2020-01-21 | 华中科技大学 | Joint robot trajectory planning method based on joint energy balanced distribution |
CN108115681B (en) * | 2017-11-14 | 2020-04-07 | 深圳先进技术研究院 | Simulation learning method and device for robot, robot and storage medium |
CN108724191A (en) * | 2018-06-27 | 2018-11-02 | 芜湖市越泽机器人科技有限公司 | A kind of robot motion's method for controlling trajectory |
JP7045962B2 (en) * | 2018-08-24 | 2022-04-01 | 株式会社日立産機システム | AC motor control device and its control method |
CN111515929A (en) * | 2020-04-15 | 2020-08-11 | 深圳航天科技创新研究院 | Human motion state estimation method, device, terminal and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112757275A (en) | 2021-05-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||