CN112757275A

CN112757275A - Method, system and device for controlling musculoskeletal system based on speed precision balance

Info

Publication number: CN112757275A
Application number: CN202011610884.4A
Authority: CN
Inventors: 周俊杰; 钟汕林; 乔红; 吴伟
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2021-05-07
Anticipated expiration: 2040-12-30
Also published as: CN112757275B

Abstract

The invention belongs to the technical field of control, and particularly relates to a musculoskeletal system control method, system and device based on speed-accuracy trade-off, aiming to solve the problem that existing control methods for human-like musculoskeletal robots cannot adequately control the cooperative contraction of antagonistic muscles. The invention comprises: obtaining an estimated motion precision of the musculoskeletal system through Fitts' law; calculating a supervision-term moment from the estimated motion precision through a striatum-inspired speed modulation strategy; calculating a muscle activation signal vector through a muscle activation signal network; calculating an action reward from the muscle activation signal vector and the supervision-term moment, and from it a loss function; adjusting the parameters of the muscle activation signal network based on the loss function so that the action reward increases; and iterating to obtain the muscle activation signal sequence required for control. The invention uses the structural information of the musculoskeletal system to construct a general control strategy for the cooperative contraction of antagonistic muscles and ensures smooth movement.

Description

Method, system and device for controlling musculoskeletal system based on speed precision balance

Technical Field

The invention belongs to the technical field of control, and particularly relates to a method, a system and a device for controlling a musculoskeletal system based on speed precision balance.

Background

The adaptability of living beings gives them the flexibility to adjust and execute behaviors, so that learned skilled movements can vary with the environment and the task requirements. One typical strategy for achieving such motion variability is the speed-accuracy trade-off, which reflects the balance between the rapidity and the accuracy of a movement. How to implement such a flexible behavior strategy in a human-like musculoskeletal robot, so that the robot adapts generally to environments and tasks, is an attractive challenge. In addition, in a human-like musculoskeletal robot system the number of muscles is generally far greater than the number of joints, and the redundant muscles make both motor learning and the generation of new movements difficult. Constructing a general control strategy for the cooperative contraction of antagonistic muscles from the structural information of the musculoskeletal system is a further challenge, especially when partial muscle damage is considered.

Disclosure of Invention

In order to solve the above problems in the prior art, that is, the problem that the existing human-like musculoskeletal robot control method cannot well perform antagonistic muscle cooperative contraction control, the present invention provides a musculoskeletal system control method based on speed precision balance, the method comprising:

setting the training iteration count k = 1;

Step S100, obtaining an estimated motion precision W of the musculoskeletal system at time t through Fitts' law;

Step S200, calculating a supervision-term moment from the estimated motion precision W through a striatum-inspired speed modulation strategy;

Step S300, calculating a muscle activation signal vector u_t through a muscle activation signal network based on the supervision-term moment;

Step S400, calculating an action reward R_t based on the muscle activation signal vector u_t and the supervision-term moment, then calculating a preset loss function L, and adjusting the parameters of the muscle activation signal network based on the preset loss function L so that the action reward R_t increases; setting k = k + 1 and repeating steps S100 to S400 until k = K, where K is the preset maximum number of training iterations, so as to obtain the muscle activation signal sequence required for control.
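The iteration over steps S100 to S400 can be sketched as a plain training loop. This is a minimal illustration under stated assumptions, not the patented implementation: the four callables (`estimate_precision`, `supervision_moment`, `policy`, `update`) are hypothetical placeholders for the models described in the text, activations and moments are reduced to scalars, and the inner loop assumes the movement is executed for T_out time steps.

```python
from typing import Callable, List, Tuple


def train(
    K: int,
    estimate_precision: Callable[[], Tuple[float, int]],
    supervision_moment: Callable[[float, int], float],
    policy: Callable[[float], float],
    update: Callable[[float, float], None],
) -> List[float]:
    """Run K training iterations and return the last activation sequence."""
    sequence: List[float] = []
    for k in range(1, K + 1):                    # k = 1, ..., K
        W, T_out = estimate_precision()          # S100: precision and decision time
        sequence = []
        for t in range(1, T_out + 1):
            tau_sup = supervision_moment(W, t)   # S200: supervision-term moment
            u_t = policy(tau_sup)                # S300: muscle activation signal
            update(u_t, tau_sup)                 # S400: reward, loss, parameter update
            sequence.append(u_t)
    return sequence
```

Passing the models in as callables keeps the loop structure separate from the cortex, basal ganglia and network models, whose details the loop does not need to know.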

In some preferred embodiments, step S100 includes:

presetting an accumulation time T;

Step S110, obtaining, through a cortex model, the perceptual evidence at time t₁, x_i(t₁) ~ N(μ_i, σ²), and from it the accumulated perceptual evidence Y_i(T);

Step S120, inputting the accumulated perceptual evidence Y_i(T) into a basal ganglia model to obtain the output OUT_i of the basal ganglia model;

Step S130, comparing the output OUT_i of the basal ganglia model with a preset decision threshold -ln P_i(T) to obtain a first effective accumulation time and a second effective accumulation time;

if T > 0, setting T = T - 1 and repeating steps S110 to S130;

wherein, the first time the output of the basal ganglia model satisfies OUT_i ≥ -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the first effective accumulation time;

whenever the output of the basal ganglia model satisfies OUT_i < -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the second effective accumulation time;

each second effective accumulation time newly generated during the iteration overwrites the previously generated one;

-ln P_i(T) is the decision threshold;

Step S140, obtaining a final decision time T_out from the first effective accumulation time and the second effective accumulation time;

Step S150, estimating the motion precision W through Fitts' law based on the final decision time T_out. Calculating an appropriate final decision time in this way adjusts the accuracy of the subsequent muscle control.
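The threshold bookkeeping in steps S110 to S130 can be sketched as follows. This is only an illustration of how the two effective accumulation times are recorded: `out_fn` and `p_fn` are hypothetical stand-ins for the basal ganglia output OUT_i and the decision accuracy P_i(T), and the rule that combines the two times into T_out (step S140) is not reproduced in the text, so the sketch stops at returning both times.

```python
import math
from typing import Callable, Optional, Tuple


def effective_accumulation_times(
    T_max: int,
    out_fn: Callable[[int], float],
    p_fn: Callable[[int], float],
) -> Tuple[Optional[int], Optional[int]]:
    """Return (first, second) effective accumulation times, or None if never set."""
    first_time: Optional[int] = None
    second_time: Optional[int] = None
    T = T_max
    while True:
        threshold = -math.log(p_fn(T))   # decision threshold -ln P_i(T)
        if out_fn(T) >= threshold:
            if first_time is None:       # only the first crossing is kept
                first_time = T
        else:
            second_time = T              # overwritten on every later occurrence
        if T > 0:
            T -= 1                       # "if T > 0, set T = T - 1 and repeat"
        else:
            break
    return first_time, second_time
```

For example, with P_i(T) = 0.8 the threshold is -ln 0.8, roughly 0.223, so an output of 0.2 falls below it and an output of 0.3 exceeds it.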

In some preferred embodiments, the striatum-inspired speed modulation strategy is defined in terms of the following quantities: the joint angle of the end position, calculated from the estimated motion precision; q_s, the joint angle of the initial position; t_S, the start time of the movement; V_M(λ, t), a bell-shaped velocity modulation model; t, the time of modulation; and T_out, the decision time;

in the bell-shaped velocity modulation model V_M(λ, t), λ is a parameter of the modulation model and t denotes time;

the strategy yields the desired angular velocity of the joint;

the supervision-term moment is calculated from: q_t, the joint angle at time t; the desired angular velocity of the joint; the angular acceleration corresponding to the desired angular velocity of the joint; M(q_t), the inertia matrix of the musculoskeletal system; the centripetal and Coriolis force; and G(q_t), the gravity matrix of the musculoskeletal system;

the inertia matrix M(q_t), the centripetal and Coriolis force, and the gravity matrix G(q_t) are expressed in terms of: m₁, the mass of the first link of the robot arm; m₂, the mass of the second link; d₁, the length of the first link; d₂, the length of the second link; q_{1,t}, the angle of the first joint; q_{2,t}, the angle of the second joint; and the angular velocities of the first and second joints of the robot arm.
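A moment of this kind has the familiar inverse-dynamics form: inertia matrix times desired acceleration, plus the centripetal and Coriolis term, plus gravity. The sketch below uses the textbook two-link planar arm with point masses at the link ends and angles measured from the horizontal. These modelling choices, the function name and the constant `G_ACC` are assumptions for illustration; the patent's own matrices are not reproduced in the text and may differ.

```python
import math
from typing import Sequence, Tuple

G_ACC = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def supervision_moment(
    q: Sequence[float],        # joint angles (q1, q2) at time t
    dq: Sequence[float],       # joint angular velocities
    ddq_des: Sequence[float],  # desired joint angular accelerations
    m1: float, m2: float,      # link masses
    d1: float, d2: float,      # link lengths
) -> Tuple[float, float]:
    """tau = M(q) * ddq_des + C(q, dq) + G(q) for a two-link planar arm."""
    q1, q2 = q
    dq1, dq2 = dq
    c2, s2 = math.cos(q2), math.sin(q2)
    # Inertia matrix (symmetric, so M21 == M12)
    M11 = (m1 + m2) * d1**2 + m2 * d2**2 + 2.0 * m2 * d1 * d2 * c2
    M12 = m2 * d2**2 + m2 * d1 * d2 * c2
    M22 = m2 * d2**2
    # Centripetal and Coriolis terms
    C1 = -m2 * d1 * d2 * s2 * (2.0 * dq1 * dq2 + dq2**2)
    C2 = m2 * d1 * d2 * s2 * dq1**2
    # Gravity terms
    G1 = (m1 + m2) * G_ACC * d1 * math.cos(q1) + m2 * G_ACC * d2 * math.cos(q1 + q2)
    G2 = m2 * G_ACC * d2 * math.cos(q1 + q2)
    tau1 = M11 * ddq_des[0] + M12 * ddq_des[1] + C1 + G1
    tau2 = M12 * ddq_des[0] + M22 * ddq_des[1] + C2 + G2
    return tau1, tau2
```

Whether `dq` holds the desired or the measured angular velocity is left to the caller, since the text lists both among the quantities involved.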

In some preferred embodiments, the muscle activation signal vector u_t is calculated by a policy network, wherein u_{t-1} is the muscle activation signal at time t-1, τ_{t-1} is the joint moment at time t-1, and the policy network μ(·|θ^μ) is a neural network for solving for the muscle activation signals.

In some preferred embodiments, step S400 includes:

Step S410, calculating the action reward R_t based on the muscle activation signal vector u_t and the supervision-term moment, with the aim of making R_t as large as possible;

wherein γ is a discount factor; for each flexor muscle, the reward uses the moment generated by the flexor at time t, the force produced by the flexor at time t, and the muscle activation signal generated for the flexor at time t; for each extensor muscle, it uses the moment generated by the extensor at time t, the force produced by the extensor at time t, and the muscle activation signal generated for the extensor at time t; p is the number of flexors, q is the number of extensors, and ω₁ and ω₂ are proportional parameters;

Step S420, calculating the loss function L of the evaluation network Q^μ based on the action reward R_t;

wherein θ^Q denotes the parameters of the evaluation network, Δu_{t+1} is the muscle activation signal at time t+1, and Δu_t is the muscle activation signal at time t;

Step S430, updating the parameters θ^Q of the evaluation network Q^μ based on the loss function L, wherein η₁ denotes the update step size;

updating the parameters θ^μ of the policy network μ(·|θ^μ) based on the evaluation network Q^μ, wherein η₂ denotes the update step size and the update follows the gradient of the policy network μ(·|θ^μ);

Step S440, if t ≠ T_out, setting t = t + 1 and repeating steps S200 to S400 until t = T_out;

Step S450, if k ≠ K, setting k = k + 1 and repeating steps S100 to S400 until k = K, at which point the muscle activation signals u_t (t ∈ [1, T]) form the muscle activation signal sequence required to accomplish control.
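The evaluation-network update in steps S420 and S430 follows the usual actor-critic pattern: form a temporal-difference target from the reward, take the squared error against the current value estimate as the loss, and step the parameters against the gradient. The sketch below shows that pattern with a linear critic, which is an assumption made for brevity (the text describes a neural network). The function names, the feature vector and the semi-gradient treatment of the target are illustrative and not taken from the patent.

```python
from typing import List, Sequence


def td_target(reward: float, gamma: float, q_next: float) -> float:
    """Target value R_t + gamma * Q(next state, next action)."""
    return reward + gamma * q_next


def critic_value(theta: Sequence[float], features: Sequence[float]) -> float:
    """Linear critic: Q = theta . features (stand-in for the evaluation network)."""
    return sum(th * f for th, f in zip(theta, features))


def critic_loss(theta: Sequence[float], features: Sequence[float], target: float) -> float:
    """Squared temporal-difference error."""
    return (target - critic_value(theta, features)) ** 2


def critic_step(
    theta: Sequence[float],
    features: Sequence[float],
    target: float,
    eta1: float,
) -> List[float]:
    """One gradient-descent step on the loss, treating the target as a constant."""
    error = target - critic_value(theta, features)
    grad = [-2.0 * error * f for f in features]
    return [th - eta1 * g for th, g in zip(theta, grad)]
```

In a full actor-critic setup, the policy parameters would then be moved in the direction that increases the critic's value, using their own step size η₂.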

In some preferred embodiments, the output OUT_i of the basal ganglia model is computed from the accumulated perceptual evidence using gain factors, where i and j are the indices of the perceptual evidence.

In some preferred embodiments, the motion precision W is estimated by Fitts' law, where a and b are two constant parameters and D is the distance moved by the joint end.
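As an illustration of how a decision time maps to a precision, the sketch below inverts the classic form of Fitts' law, T = a + b·log2(2D / W), to give W = 2D / 2^((T - a) / b). This classic form is an assumption: the patent's exact expression is not reproduced in the text, and the function name is hypothetical.

```python
def fitts_precision(T_out: float, D: float, a: float, b: float) -> float:
    """Target width W implied by movement time T_out under classic Fitts' law."""
    if b == 0:
        raise ValueError("b must be non-zero")
    return 2.0 * D / 2.0 ** ((T_out - a) / b)
```

Under this form, with b > 0, a longer decision time yields a smaller W, that is, a tighter precision requirement for the same movement distance D.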

In another aspect of the invention, a musculoskeletal system control system based on speed-accuracy trade-off is provided, the system comprising an accuracy estimation module, an expected torque calculation module, an activation signal calculation module and a speed-accuracy trade-off module;

the training iteration count is set to k = 1;

the accuracy estimation module is configured to obtain the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;

the expected torque calculation module is configured to calculate the supervision-term moment from the estimated motion precision W through the striatum-inspired speed modulation strategy;

the activation signal calculation module is configured to calculate the muscle activation signal vector u_t through the muscle activation signal network based on the supervision-term moment;

the speed-accuracy trade-off module is configured to calculate the action reward R_t based on the muscle activation signal vector u_t and the supervision-term moment, then calculate the preset loss function L, and adjust the parameters of the muscle activation signal network based on the preset loss function L so that the action reward R_t increases; k is set to k + 1 and the functions of the accuracy estimation module through the speed-accuracy trade-off module are repeated until k = K, where K is the preset maximum number of training iterations, so as to obtain the muscle activation signal sequence required for control.

In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-mentioned musculoskeletal system control method based on speed accuracy trade-off.

In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; the processor is suitable for executing various programs; the storage device is suitable for storing a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the aforementioned musculoskeletal system control method based on speed accuracy trade-offs.

The invention has the beneficial effects that:

(1) The musculoskeletal system control method based on speed-accuracy trade-off designs a cooperative contraction strategy for antagonistic muscles by combining Fitts' law with the speed modulation strategy of the FSI-SPN neuronal circuit in the striatum, realizes a general control algorithm for redundant muscles, and improves the neurally inspired motor adaptability of the musculoskeletal system.

(2) The musculoskeletal system control method based on speed-accuracy trade-off constructs a general control strategy for the cooperative contraction of antagonistic muscles by designing an action reward for the cooperative contraction strategy and using this reward, together with the evaluation network, to update the parameters of the policy network, which benefits the motor learning and control of the redundant muscles of a human-like musculoskeletal robot.

(3) The musculoskeletal system control method based on speed-accuracy trade-off constructs a supervised Markov decision process algorithm by introducing a supervision term into the Markov decision process, and divides the control process into two stages, motion planning and motion execution, which serve as the basis for motion variability, thereby achieving efficient training and control of the musculoskeletal robot system.

(4) The musculoskeletal system control method based on speed-accuracy trade-off calculates an appropriate movement execution time by combining the antagonistic muscle cooperative contraction strategy with an active speed-accuracy trade-off model, which in turn determines the precision of muscle control, and realizes neurally inspired motor adaptability on a musculoskeletal system.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is a flow diagram of a musculoskeletal system control method based on speed accuracy trade-off in accordance with an embodiment of the present invention;

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

The invention provides a musculoskeletal system control method based on speed-accuracy trade-off, which comprises the following steps:

setting the training iteration count k = 1; because a musculoskeletal robot system has redundant joints, a single motor task can be carried out in several ways;

Step S100, obtaining an estimated motion precision W of the musculoskeletal system at time t through Fitts' law;

Step S200, calculating a supervision-term moment from the estimated motion precision W through a striatum-inspired speed modulation strategy;

Step S300, calculating a muscle activation signal vector u_t through a muscle activation signal network based on the supervision-term moment;

Step S400, calculating an action reward R_t based on the muscle activation signal vector u_t and the supervision-term moment, calculating a loss value through a preset loss function L, and adjusting the parameters of the muscle activation signal network based on that loss value so that the action reward R_t increases; if k < K, setting k = k + 1 and repeating steps S100 to S400 until k = K, where K is the preset maximum number of training iterations, so as to obtain the muscle activation signal sequence required for control.

In order to more clearly describe the musculoskeletal system control method based on speed precision balance, the following describes an embodiment of the present invention in detail with reference to fig. 1.

The musculoskeletal system control method based on speed-accuracy trade-off of this embodiment comprises steps S100 to S400, which are described in detail as follows:

setting the training iteration count k = 1;

in the present embodiment, step S100 includes:

presetting an accumulation time T; the accumulation time T set in the first iteration must be the maximum value that can be set, so that the subsequent iterations can proceed; preferably, T = 30 s.

Step S110, obtaining, through a cortex model, the perceptual evidence at time t₁, x_i(t₁) ~ N(μ_i, σ²), and from it the accumulated perceptual evidence Y_i(T), as shown in equation (1);

Step S120, inputting the accumulated perceptual evidence Y_i(T) into the basal ganglia model to obtain the output OUT_i of the basal ganglia model; the accumulated perceptual evidence is passed into the striatum of the basal ganglia model and, after travelling through the direct and indirect pathways, converges in the substantia nigra and the globus pallidus to give the output OUT_i of the basal ganglia model;

Step S130, comparing the output OUT_i of the basal ganglia model with a preset decision threshold -ln P_i(T) to obtain a first effective accumulation time and a second effective accumulation time;

if T > 0, setting T = T - 1 and repeating steps S110 to S130;

wherein, the first time the output of the basal ganglia model satisfies OUT_i ≥ -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the first effective accumulation time;

whenever the output of the basal ganglia model satisfies OUT_i < -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the second effective accumulation time, which overwrites the previously generated second effective accumulation time;

-ln P_i(T) is the decision threshold;

in this embodiment, the decision threshold -ln P_i(T) determines whether the evidence is sufficient to make a decision, where P_i(T) denotes the accuracy of the decision; for example, P_i(T) = 0.8 means that the decision has an 80% probability of being correct.

In this embodiment, the output OUT_i of the basal ganglia model is given by equation (2), in which the coefficients are gain factors.

Step S140, obtaining a final decision time T_out from the first effective accumulation time and the second effective accumulation time;

Step S150, estimating the motion precision W through Fitts' law based on the final decision time T_out.

In this embodiment, the motion precision W estimated by Fitts' law is given by equation (3), where a and b are two constant parameters and D is the distance moved by the joint end.

In this embodiment, the striatum-inspired speed modulation strategy is given by equation (4), and the bell-shaped velocity modulation model V_M(λ, t) is given by equation (5), where λ is a parameter of the modulation model and t denotes time; the strategy yields the desired angular velocity of the joint.

The supervision-term moment is given by equation (6), where q_t is the joint angle at time t and the remaining terms are the desired angular velocity of the joint, the centripetal and Coriolis force, and the gravity matrix G(q_t) of the musculoskeletal system.

The inertia matrix M(q_t) of the musculoskeletal system is given by equation (7), the centripetal and Coriolis force by equation (8), and the gravity matrix G(q_t) of the musculoskeletal system by equation (9).

Step S300, calculating the muscle activation signal vector u_t through the muscle activation signal network based on the supervision-term moment.

In this embodiment, the muscle activation signal vector u_t is calculated as shown in equation (10), where u_{t-1} is the muscle activation signal at time t-1, τ_{t-1} is the joint moment at time t-1, and the policy network μ(·|θ^μ) is a neural network for solving for the muscle activation signals. Preferably, the policy network may be that of the classical DDPG method.

In this embodiment, step S400 includes:

Step S410, calculating the action reward R_t based on the muscle activation signal vector u_t and the supervision-term moment, with the aim of making R_t as large as possible, as shown in equation (11);

wherein γ is a discount factor; for each flexor muscle, the reward uses the moment generated by the flexor at time t, the force produced by the flexor at time t, and the muscle activation signal generated for the flexor at time t; for each extensor muscle, it uses the moment generated by the extensor at time t, the force produced by the extensor at time t, and the muscle activation signal generated for the extensor at time t; p is the number of flexors, q is the number of extensors, and ω₁ and ω₂ are proportional parameters; this step achieves fine control of the redundant muscles.

In this embodiment, a higher action reward corresponds to a more accurate output. The aim is to bring the resultant moment closer to the supervision term while minimizing the change in the muscle activation signals of the flexors and extensors, so that stable cooperative contraction forms between the flexors and extensors.
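The stated aim, tracking the supervision term while keeping activation changes small, can be illustrated with a simple per-step reward for a single joint. The quadratic penalties, the sign convention and all names below are assumptions chosen for illustration; the patent's equation (11) is not reproduced in the text and also involves the discount factor γ and the muscle forces, which this sketch omits.

```python
from typing import Sequence


def action_reward(
    flexor_moments: Sequence[float],    # moments generated by the p flexors
    extensor_moments: Sequence[float],  # moments generated by the q extensors
    tau_sup: float,                     # supervision-term moment
    delta_u_flexor: Sequence[float],    # activation changes of the flexors
    delta_u_extensor: Sequence[float],  # activation changes of the extensors
    w1: float,
    w2: float,
) -> float:
    """Higher (closer to zero) when the resultant moment tracks tau_sup smoothly."""
    resultant = sum(flexor_moments) + sum(extensor_moments)
    tracking_error = (resultant - tau_sup) ** 2
    activation_change = (
        sum(d * d for d in delta_u_flexor) + sum(d * d for d in delta_u_extensor)
    )
    return -w1 * tracking_error - w2 * activation_change
```

The weights `w1` and `w2` play the role of the proportional parameters ω₁ and ω₂: raising `w2` relative to `w1` favors steadier co-contraction over exact moment tracking.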

Step S420, calculating the loss function L of the evaluation network Q^μ based on the action reward R_t, as shown in equation (12);

wherein θ^Q denotes the parameters of the evaluation network, Δu_{t+1} is the muscle activation signal at time t+1, and Δu_t is the muscle activation signal at time t; the muscle activation signal Δu_t here includes both the flexor and the extensor muscle activation signals; the evaluation network Q^μ represents a state-action value function;

Step S430, updating the parameters θ^Q of the evaluation network Q^μ based on the loss function L, as shown in equation (13), wherein η₁ denotes the update step size;

updating the parameters θ^μ of the policy network μ(·|θ^μ) based on the evaluation network Q^μ, as shown in equation (14), wherein η₂ denotes the update step size and the gradient of the policy network μ(·|θ^μ) is shown in equation (15);

Step S440, if t ≠ T_out, setting t = t + 1 and repeating steps S200 to S400 until t = T_out;

Step S450, if k ≠ K, setting k = k + 1 and repeating steps S100 to S400 until k = K, at which point the muscle activation signals u_t (t ∈ [1, T]) form the muscle activation signal sequence required to accomplish control. The calculation proceeds as follows: taking the initial activation signal to be u_{t-1}, the model computes the change Δu_t at each time step; at the next time step, the signal input to the muscle is updated by this change; this is repeated until t = T. To distinguish the sequence from a single value, u_t (t ∈ [1, T]) denotes the sequence and u_t denotes a single value.

For a highly redundant, highly coupled musculoskeletal robot system, the invention on the one hand draws on the neural mechanism of the speed-accuracy trade-off in living beings and, by simulating the cortico-basal ganglia neural circuit, provides a biologically plausible computational decision model of the basal ganglia. By combining Fitts' law with the speed modulation strategy of the FSI-SPN neuronal circuit in the striatum, it also provides an active speed-accuracy trade-off model, which makes it possible to adjust skilled motor performance flexibly according to environmental information and task-related parameters. On the other hand, to achieve efficient training and control of the musculoskeletal robot system, a supervision term is introduced into the Markov decision process (MDP) to construct a supervised MDP algorithm, and the control process is divided into two stages, motion planning and motion execution, which serve as the basis for motion variability. In the motion execution stage, an antagonistic muscle cooperative contraction strategy is designed to explore the cooperative relationship among antagonistic muscles, yielding a general control algorithm for redundant muscles. Finally, this algorithm is combined with the active speed-accuracy trade-off model to realize neurally inspired motor adaptability on a musculoskeletal system.

The musculoskeletal system control system based on speed-accuracy trade-off comprises an accuracy estimation module, an expected torque calculation module, an activation signal calculation module and a speed-accuracy trade-off module;

the training iteration count is set to k = 1;

the accuracy estimation module is configured to obtain the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;

the expected torque calculation module is configured to calculate the supervision-term moment from the estimated motion precision W through the striatum-inspired speed modulation strategy;

the activation signal calculation module is configured to calculate the muscle activation signal vector u_t through the muscle activation signal network based on the supervision-term moment;

the speed-accuracy trade-off module is configured to calculate the action reward R_t based on the muscle activation signal vector u_t and the supervision-term moment, then calculate the preset loss function L, and adjust the parameters of the muscle activation signal network based on the preset loss function L so that the action reward R_t increases; k is set to k + 1 and the functions of the accuracy estimation module through the speed-accuracy trade-off module are repeated until k = K, where K is the preset maximum number of training iterations, so as to obtain the muscle activation signal sequence required for control.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.

It should be noted that the musculoskeletal system control system based on speed precision tradeoff provided in the above embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the above embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the above described functions. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.

A storage device according to a third embodiment of the invention has stored therein a plurality of programs adapted to be loaded and executed by a processor to implement the method of musculoskeletal system control based on speed accuracy trade-off described above.

A processing apparatus according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the aforementioned musculoskeletal system control method based on speed accuracy trade-offs.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

Those of skill in the art would appreciate that the various illustrative modules, method steps, and modules described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules, method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.

The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims

1. A musculoskeletal system control method based on speed accuracy trade-off, the control method comprising:

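setting the training iteration count k = 1;

Step S100, obtaining an estimated motion precision W of the musculoskeletal system at time t through Fitts' law;

Step S200, calculating a supervision-term moment from the estimated motion precision W through a striatum-inspired speed modulation strategy;

Step S300, calculating a muscle activation signal vector u_t through a muscle activation signal network based on the supervision-term moment;

Step S400, calculating an action reward R_t based on the muscle activation signal vector u_t and the supervision-term moment, then calculating a preset loss function L, and adjusting the parameters of the muscle activation signal network based on the preset loss function L so that the action reward R_t increases; setting k = k + 1 and repeating steps S100 to S400 until k = K, where K is the preset maximum number of training iterations, so as to obtain the muscle activation signal sequence required for control.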

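2. The musculoskeletal system control method based on speed-accuracy trade-off according to claim 1, wherein step S100 comprises:

presetting an accumulation time T;

Step S110, obtaining, through a cortex model, the perceptual evidence at time t₁, x_i(t₁) ~ N(μ_i, σ²), and from it the accumulated perceptual evidence Y_i(T);

Step S120, inputting the accumulated perceptual evidence Y_i(T) into a basal ganglia model to obtain the output OUT_i of the basal ganglia model;

Step S130, comparing the output OUT_i of the basal ganglia model with a preset decision threshold -ln P_i(T) to obtain a first effective accumulation time and a second effective accumulation time;

if T > 0, setting T = T - 1 and repeating steps S110 to S130;

wherein, the first time the output of the basal ganglia model satisfies OUT_i ≥ -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the first effective accumulation time;

whenever the output of the basal ganglia model satisfies OUT_i < -ln P_i(T), the accumulation time T corresponding to OUT_i is set as the second effective accumulation time, which overwrites the previously generated second effective accumulation time;

-ln P_i(T) is the decision threshold;

Step S140, obtaining a final decision time T_out from the first effective accumulation time and the second effective accumulation time;

Step S150, estimating the motion precision W through Fitts' law based on the final decision time T_out.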

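3. The musculoskeletal system control method based on speed-accuracy trade-off according to claim 2, wherein the striatum-inspired speed modulation strategy is defined in terms of the following quantities: the joint angle of the end position, calculated from the estimated motion precision; q_s, the joint angle of the initial position; t_S, the start time of the movement; V_M(λ, t), a bell-shaped velocity modulation model; t, the time of modulation; and T_out, the decision time;

in the bell-shaped velocity modulation model V_M(λ, t), λ is a parameter of the modulation model and t denotes time;

the strategy yields the desired angular velocity of the joint;

the supervision-term moment is calculated from: q_t, the joint angle at time t; the desired angular velocity of the joint; the angular acceleration corresponding to the desired angular velocity of the joint; M(q_t), the inertia matrix of the musculoskeletal system; the centripetal and Coriolis force; and G(q_t), the gravity matrix of the musculoskeletal system;

the inertia matrix M(q_t), the centripetal and Coriolis force, and the gravity matrix G(q_t) are expressed in terms of: m₁, the mass of the first link of the robot arm; m₂, the mass of the second link; d₁, the length of the first link; d₂, the length of the second link; the angles of the first and second joints; and the angular velocities of the first and second joints of the robot arm.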

4. The musculoskeletal system control method based on speed-accuracy trade-off according to claim 3, wherein the muscle activation signal vector u_t is calculated by a policy network μ(·|θ^μ).

5. The musculoskeletal system control method based on speed-accuracy trade-off according to claim 4, wherein step S400 comprises:

Step S410, calculating the action reward R_t based on the muscle activation signal vector u_t and the supervision-term moment, with the aim of making R_t as large as possible;

wherein γ is a discount factor; for each flexor muscle, the reward uses the moment generated by the flexor at time t, the force produced by the flexor at time t, and the muscle activation signal generated for the flexor at time t; for each extensor muscle, it uses the moment generated by the extensor at time t, the force produced by the extensor at time t, and the muscle activation signal generated for the extensor at time t; p is the number of flexors, q is the number of extensors, and ω₁ and ω₂ are proportional parameters;

Step S420, calculating the loss function L of the evaluation network Q^μ based on the action reward R_t;

wherein θ^Q denotes the parameters of the evaluation network, Δu_{t+1} is the muscle activation signal at time t+1, and Δu_t is the muscle activation signal at time t;

Step S430, updating the parameters θ^Q of the evaluation network Q^μ based on the loss function L, wherein η₁ denotes the update step size;

updating the parameters θ^μ of the policy network μ(·|θ^μ) based on the evaluation network Q^μ, wherein η₂ denotes the update step size and the update follows the gradient of the policy network μ(·|θ^μ);

Step S440, if t ≠ T_out, setting t = t + 1 and repeating steps S200 to S400 until t = T_out;

Step S450, if k ≠ K, setting k = k + 1 and repeating steps S100 to S400 until k = K, at which point the muscle activation signals u_t (t ∈ [1, T]) form the muscle activation signal sequence required to accomplish control.

6. The musculoskeletal system control method based on speed-accuracy trade-off according to claim 2, wherein the output OUT_i of the basal ganglia model is computed from the accumulated perceptual evidence using gain factors, where i and j are the indices of the perceptual evidence.

7. The musculoskeletal system control method based on speed-accuracy trade-off according to claim 2, wherein the motion precision W is estimated by Fitts' law using two constant parameters a and b and the distance D moved by the joint end.

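8. A musculoskeletal system control system based on speed-accuracy trade-off, the system comprising an accuracy estimation module, an expected torque calculation module, an activation signal calculation module and a speed-accuracy trade-off module;

the training iteration count is set to k = 1;

the accuracy estimation module is configured to obtain the estimated motion precision W of the musculoskeletal system at time t through Fitts' law;

the expected torque calculation module is configured to calculate the supervision-term moment from the estimated motion precision W through the striatum-inspired speed modulation strategy;

the activation signal calculation module is configured to calculate the muscle activation signal vector u_t through the muscle activation signal network based on the supervision-term moment;

the speed-accuracy trade-off module is configured to calculate the action reward R_t based on the muscle activation signal vector u_t and the supervision-term moment, then calculate the preset loss function L, and adjust the parameters of the muscle activation signal network based on the preset loss function L so that the action reward R_t increases; k is set to k + 1 and the functions of the accuracy estimation module through the speed-accuracy trade-off module are repeated until k = K, where K is the preset maximum number of training iterations, so as to obtain the muscle activation signal sequence required for control.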

9. A storage device having stored therein a plurality of programs, wherein said programs are adapted to be loaded and executed by a processor to implement the method of musculoskeletal system control based on speed accuracy trade-offs of any one of claims 1-7.

10. A processing apparatus comprising a processor adapted to execute programs; and a storage device adapted to store a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the method of musculoskeletal system control based on speed accuracy trade-off of any of claims 1-7.