CN114800488B - Redundant mechanical arm operability optimization method and device based on deep reinforcement learning - Google Patents


Info

Publication number
CN114800488B
CN114800488B (application CN202210272600.8A)
Authority
CN
China
Prior art keywords
operability
mechanical arm
reinforcement learning
redundant
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210272600.8A
Other languages
Chinese (zh)
Other versions
CN114800488A (en)
Inventor
梁斌
王学谦
杨皓强
孟得山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University filed Critical Shenzhen International Graduate School of Tsinghua University
Priority to CN202210272600.8A
Publication of CN114800488A
Application granted
Publication of CN114800488B
Legal status: Active
Anticipated expiration

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1628 Programme controls characterised by the control loop
    • B25J9/1643 Programme controls characterised by the control loop redundant control
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1628 Programme controls characterised by the control loop
    • B25J9/1651 Programme controls characterised by the control loop acceleration, rate control
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a redundant mechanical arm operability optimization method based on deep reinforcement learning. The method comprises: completing approach training of the redundant mechanical arm toward random targets under a fixed reset mechanism using a reinforcement learning method; continuing the approach training toward random targets under a random reset mechanism using a reinforcement learning method, where "random reset" means placing the mechanical arm in a random state; adding an "operability" term to the reward function, increasing the coefficient of the operability term, and completing the optimization of the operability of the redundant mechanical arm using the reinforcement learning method again; and controlling the redundant mechanical arm with the optimized algorithm. According to the invention, the mechanical arm is trained for the first time with a reinforcement learning method that includes an operability reward, so that the arm acquires the ability to automatically optimize its operability while retaining end-trajectory tracking capability; the method has good generality and can be used to train robots with various complex structures.

Description

Redundant mechanical arm operability optimization method and device based on deep reinforcement learning
Technical Field
The invention relates to the technical field of redundant mechanical arm control, in particular to a redundant mechanical arm operability optimization method and device based on deep reinforcement learning.
Background
Redundant mechanical arms have extra spatial degrees of freedom, which gives them great advantages in obstacle avoidance and motion planning and has made them a hot topic in robotics research. However, an important difficulty in redundant-arm control is the singularity problem in motion planning. Although a redundant arm is highly flexible, singular arm configurations can still be encountered in practice: when the arm approaches a singular state, a small displacement of the end effector can cause severe joint jitter, leading to joint damage and sensor failure. To address this problem, many researchers optimize an operational performance index of the robot (such as operability) during motion planning to ensure the dexterity of the robot's motion, so that the robot stays as far from singular states as possible during movement.
In dexterous robot control, the usual practice is based on conventional control methods: when planning a path, the gradient of the operability w with respect to the joint angle q, ∂w/∂q, is added in the joint null space so that the arm configuration moves toward higher operability during planning. However, this processing brings complex matrix differentiation and matrix inversion operations and is inconvenient for real-time computation. Reinforcement learning is a branch of machine learning that studies how an agent can learn an execution strategy so as to obtain the largest reward from its environment. For example, Chinese patent CN201710042360.1 proposes a motion planning method for optimizing the operability of a redundant manipulator, which comprises: setting an optimized motion performance index that maximizes the operability derivative of the redundant manipulator, together with the constraint relation corresponding to that performance index; converting the performance index and its constraints into a quadratic programming problem; solving the quadratic program with a quadratic programming solver; and controlling the manipulator to move according to the solution. However, that patent suffers from several drawbacks: a) its operability optimization is based on conventional Jacobian-matrix optimization and requires many iterative computations, which adds great time complexity to the trajectory planning process and makes the computation slow; b) the operability optimization requires mathematical transformations tailored to the structure of each robot, and the formulas are complex, so the method is hard to extend to robots with more complex structures.
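For reference, the null-space gradient scheme mentioned at the start of this section is commonly written in resolved-rate form (the gain k and the pseudoinverse notation are supplied here for illustration and are not taken from the patent):

    q̇ = J⁺(q)·ẋ_d + k·(I − J⁺(q)J(q))·∂w/∂q,

where J⁺ is the Moore-Penrose pseudoinverse of the velocity Jacobian and ẋ_d is the desired end velocity. Evaluating J⁺ and ∂w/∂q at every control step is precisely the matrix differentiation and inversion burden that motivates the learning-based approach of the invention.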
Disclosure of Invention
The invention aims to solve the technical problems of the prior art, namely poor real-time performance, slow computation, and complex structure-specific mathematical transformations when optimizing operability during trajectory planning, and provides a redundant mechanical arm operability optimization method and device based on deep reinforcement learning.
The invention provides a redundant mechanical arm operability optimization method based on deep reinforcement learning, comprising the following steps:
S1, completing the approach training of the redundant mechanical arm toward random targets using a reinforcement learning method under a fixed reset mechanism;
S2, continuing the approach training of the redundant mechanical arm toward random targets using a reinforcement learning method under a random reset mechanism, where "random reset" means placing the mechanical arm in a random state;
S3, adding an "operability" term to the reward function, increasing the coefficient of the operability term, and completing the optimization of the operability of the redundant mechanical arm using the reinforcement learning method again;
and S4, controlling the redundant mechanical arm with the optimized algorithm.
In some embodiments, the fixed reset in step S1 is that the mechanical arm is in a horizontally straightened state.
In some embodiments, in step S3, the algorithm is allowed to converge normally by adjusting the coefficients of the "operability" term.
In some embodiments, the random target approach task under the fixed reset mechanism of the redundant robotic arm is accomplished using a TD3 algorithm in reinforcement learning.
In some embodiments, in step S1, the arm is in a horizontal straightening state at the beginning of each round, and then the end of the arm reaches a randomly set target point, and is fixedly reset to the horizontal straightening state after each round is completed.
In some embodiments, the value ranges of the input state and the output action are symmetrically processed, so that the symmetrical distribution characteristics of the input state and the output action are guaranteed.
In some embodiments, the reward is set as the negative of the Euclidean distance between the arm end position and the target point.
In some embodiments, the discount factor γ is taken to be 0 to eliminate the interference of the next action value Q (s, a).
In some embodiments, the value of k_{w1} is chosen so that k_{w1}/w_{t+1} and d_{t+1} are of a similar order of magnitude, so that both the end-approach task and the operability-increase task are taken into account during training, where k_{w1} is an adjustable hyperparameter, d_{t+1} is the Euclidean distance between the arm end position and the target point, and the subscripts t and t+1 denote the state variables at times t and t+1.
The invention also provides a redundant mechanical arm control device, comprising at least one memory and at least one processor;
the memory includes at least one executable program stored therein;
the executable program, when executed by the processor, implements the method.
According to the redundant mechanical arm operability optimization method based on deep reinforcement learning of the invention, the mechanical arm is, for the first time, trained with a reinforcement learning method that includes an operability reward: an operability index is added to the reward function of the reinforcement learning method, so that the trained arm automatically increases its operability while its end follows the desired trajectory. No complex kinematic solving or iterative computation is needed and the real-time performance is higher, which solves the poor real-time performance of conventional methods; the arm acquires the ability to automatically optimize its operability while retaining end-trajectory tracking capability, and the method has good generality and can be used to train robots with various complex structures.
In addition, according to the redundant mechanical arm operability optimization method based on deep reinforcement learning of the invention, through step-by-step optimization from easy to difficult, the "operability" term is gradually added to the reward function and its coefficient is gradually increased, which allows the training to converge.
Drawings
FIG. 1 is a schematic flow chart of a redundant manipulator operability optimization method based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a diagram of a 6-joint 12-degree-of-freedom super-redundant manipulator in a mujoco simulation engine according to an embodiment of the present invention;
FIG. 3 is a graph of success rate versus training round during evaluation for different γ under the fixed reset mechanism in an embodiment of the present invention;
FIG. 4 is a graph of return versus training round during evaluation for different γ under the fixed reset mechanism in an embodiment of the present invention;
FIG. 5 is a graph of success rate versus training round during evaluation for different γ under the random reset mechanism in an embodiment of the present invention;
FIG. 6 is a graph of return versus training round during evaluation for different γ under the random reset mechanism in an embodiment of the present invention;
FIG. 7 is a graph of success rate versus training round during evaluation for different k_{w1} in an embodiment of the present invention;
FIG. 8 is a graph of success rate versus training round during evaluation for different k_{w1} in an embodiment of the present invention;
FIG. 9 is a graph of success rate versus training round during evaluation for different k_{w1} in an embodiment of the present invention;
FIG. 10 is a graph of operability versus the number of mechanical arm motion steps during circular trajectory tracking in evaluation, for different k_{w1}, in an embodiment of the present invention;
FIG. 11 is a graph of operability versus the number of mechanical arm motion steps during straight-line trajectory tracking in evaluation, for different k_{w1};
FIG. 12 is a graph of operability versus the number of mechanical arm motion steps during evaluation on the mixed line-segment-and-circle trajectory, including the k_{w1} = 0 curve, in an embodiment of the present invention;
FIG. 13 is a graph of operability versus the number of mechanical arm motion steps during evaluation on the mixed line-segment-and-circle trajectory, with the k_{w1} = 0 curve removed, in an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Fig. 1 is a schematic flow chart of the redundant manipulator operability optimization method based on deep reinforcement learning provided by an embodiment of the invention, which includes the following steps:
S1, completing the approach training of the redundant mechanical arm toward random targets using a reinforcement learning method under a fixed reset mechanism;
S2, continuing the approach training of the redundant mechanical arm toward random targets using a reinforcement learning method under a random reset mechanism, where "random reset" means placing the mechanical arm in a random state;
S3, adding an "operability" term to the reward function, increasing the coefficient of the operability term, and completing the optimization of the operability of the redundant mechanical arm using the reinforcement learning method again;
and S4, controlling the redundant mechanical arm with the optimized algorithm.
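A minimal sketch of this staged training schedule (steps S1 to S3) is given below, assuming a Gym-style environment wrapper and a TD3 agent; the `env` and `agent` objects and their methods are hypothetical placeholders, not an API defined by the patent.

```python
def train_stage(env, agent, episodes):
    """Generic TD3-style training loop for one curriculum stage (sketch).

    `env` is assumed to follow a Gym-like reset/step interface and `agent`
    to expose act/remember/update methods; both are hypothetical here.
    """
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            act = agent.act(obs, explore=True)        # actor output plus exploration noise
            next_obs, rew, done, info = env.step(act)
            agent.remember(obs, act, rew, next_obs, done)
            agent.update()                            # TD3 critic/actor updates
            obs = next_obs

# Stage S1: fixed reset (horizontally straightened arm each round), reward = -distance.
# Stage S2: switch the environment to random initial joint angles, keep the same reward.
# Stage S3: switch the reward to -d - k_w1 / w (equation (3-10) below) and keep training,
#           so the policy also learns to raise the operability w while tracking the end target.
```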
In one embodiment of the invention, the TD3 algorithm in reinforcement learning is used to complete the random-target approach task of a 12-degree-of-freedom mechanical arm under the fixed reset mechanism (experimental hardware: a computer running a Linux system). Specifically, the arm is in a horizontally straightened state at the beginning of each round, the arm end then moves to reach a randomly placed target point, and the arm is reset to the horizontally straightened state after the round ends. This task is the basis for the subsequent random-reset task and the end-trajectory tracking task (i.e., no reset at all).
To reflect the super-redundancy of the arm, the 12-degree-of-freedom arm considers only the position of the arm end and not its orientation, so the 12 control inputs are super-redundant with respect to the 3-degree-of-freedom end-position information. It is worth mentioning that the concept of the invention can readily be extended to include end-orientation information.
The mechanical arm of the invention, shown in Fig. 2, has 6 joints, each with pitch and yaw degrees of freedom, for 12 degrees of freedom in total. Each arm link is 0.09 m long, and each joint and the end effector are represented by small spheres of diameter 0.01 m, so the whole arm is 0.7 m long. Judging from the actual robot conditions, the environment is fully observable and the state transitions satisfy the Markov property, so the motion process of the arm can be regarded as a Markov decision process. The Markov decision process is represented by the six-tuple (S, A, R, P, ρ_0, γ), where S is the state space, A is the action space, R is the reward space, P is the state-transition probability space, ρ_0 is the initial state distribution, and γ is the discount factor.
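As a data-structure sketch, the six-tuple can be written out directly; the field names below are illustrative and not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MDPSpec:
    """Six-tuple (S, A, R, P, rho_0, gamma) of the arm's Markov decision process (sketch)."""
    state_space: object           # S: joint angles/velocities, end position/velocity, target
    action_space: object          # A: actuator commands in the mujoco simulation
    reward_fn: Callable           # R: e.g. negative end-to-target distance, equation (3-5)
    transition_fn: Callable       # P: deterministic, given by the forward kinematics f(s, a)
    initial_state_dist: Callable  # rho_0: fixed or random reset plus a random target point
    gamma: float                  # discount factor (set to 0 in the experiments below)
```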
For convenience in the following description, the state space of the mechanical arm is denoted S^m; it comprises the arm joint angles, the arm joint angular velocities, the arm end position, and the arm end linear velocity. The state space of the target-point position is denoted S^g. Since the task of this section is the approach of random target points, it follows from the universal value function approximator method that the target-point information must be introduced as part of the state to help the reinforcement learning algorithm converge; that is, the state space S is the concatenation of the two parts. The input state s of the "actor" network and the "critic" network, shown in Table 1, consists of five parts: arm joint angles, arm joint angular velocities, arm end position, arm end linear velocity, and target coordinates. The action a is the value applied to the actuators in the mujoco simulation engine. Simple tests showed little difference between setting the joint drive mode in mujoco to velocity mode or position mode; in neither mode does the joint velocity or joint angle directly equal the commanded value, since both are regulated by PID control.
To help the neural networks converge, the value ranges of the input state and the output action are processed symmetrically, so that both take values with a symmetric distribution over [-X, X]. Because TD3 is a model-free algorithm, this training procedure can be extended to arms with more degrees of freedom.
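A sketch of assembling the five-part network input and scaling it to a symmetric range, under the assumption that each component is a numpy array with a known bound; the bounds and the function name are illustrative.

```python
import numpy as np

def build_observation(q, dq, ee_pos, ee_vel, target,
                      q_max, dq_max, pos_max, vel_max):
    """Concatenate the five state components and scale each to the symmetric range [-1, 1]."""
    obs = np.concatenate([
        q / q_max,          # joint angles
        dq / dq_max,        # joint angular velocities
        ee_pos / pos_max,   # end-effector position
        ee_vel / vel_max,   # end-effector linear velocity
        target / pos_max,   # target-point coordinates
    ])
    return np.clip(obs, -1.0, 1.0)
```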
Because the kinematics of the mechanical arm are fully determined, the state-transition probability P is also fully determined; its value p satisfies equation (3-4), where f(·) denotes the forward kinematics of the arm, Pr[·] denotes probability, S_t denotes the state variable at time t with value s, and S_{t+1} denotes the state variable at time t+1 with value s':

p(s' | s, a) = Pr[S_{t+1} = s' | S_t = s, A_t = a] = 1 if s' = f(s, a), and 0 otherwise.    (3-4)
The most important element of a reinforcement learning algorithm is the design of the reward R: a correct reward guides the agent to converge to the intended strategy. In general R_{t+1} is related to S_t, A_t, and S_{t+1}, but according to equation (3-4), S_{t+1} is uniquely determined by S_t and A_t. For simplicity, therefore, the reward R_{t+1} is set to the negative of the Euclidean distance d_{t+1} between the arm end position e_{t+1} in S_{t+1} and the target point g, satisfying equation (3-5); the notation R(S_t, A_t) indicates that the variable R_{t+1} is related only to S_t and A_t and not additionally to S_{t+1}:

R_{t+1} = R(S_t, A_t) = -d_{t+1}.    (3-5)

In this way the reward not only directly expresses the purpose of the task and correctly guides the agent's learning, but also has a sufficiently simple form.
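Equation (3-5) translates directly into code; a minimal sketch assuming numpy arrays for the end position and the target point:

```python
import numpy as np

def reward_distance(ee_pos_next, target):
    """Reward (3-5): the negative Euclidean distance between the arm end and the target."""
    d = np.linalg.norm(np.asarray(ee_pos_next) - np.asarray(target))
    return -d
```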
Because the state space S consists of the two parts S^m and S^g, the initial state distribution ρ_0 is also described in two parts. In this section the arm is reset in a fixed manner: at the beginning of each round the joint angular velocities and joint angles of the arm are all 0 (the end position and linear velocity are then determined by the joint angles and angular velocities), and this arm state is denoted s_0^m. The target-point position g is selected at random within the workspace. The initial state distribution ρ_0 therefore satisfies equation (3-6), where |S^g| is the number of all points in the target-point position space S^g and Pr[S_0 = s] denotes the probability that the initial state variable S_0 takes the value s:

ρ_0(s) = Pr[S_0 = s] = 1/|S^g| if s = (s_0^m, g) with g ∈ S^g, and 0 otherwise.    (3-6)
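A sketch of sampling from ρ_0 under the fixed reset mechanism: the arm always starts from the all-zero joint configuration and only the target point is drawn at random. The workspace bounds used here are illustrative placeholders, not values from the patent.

```python
import numpy as np

def reset_fixed(n_joints=12, rng=None):
    """Fixed reset: zero joint angles and velocities; target drawn uniformly from the workspace."""
    rng = np.random.default_rng() if rng is None else rng
    q0 = np.zeros(n_joints)           # horizontally straightened arm
    dq0 = np.zeros(n_joints)
    # Illustrative box around the reachable region in front of the arm.
    target = rng.uniform(low=[0.3, -0.3, -0.3], high=[0.7, 0.3, 0.3])
    return q0, dq0, target
```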
The discount factor γ ∈ [0, 1]. In the TD3 algorithm this parameter enters through the update of the "critic" network and expresses how much weight is given to the next action value Q(s', a'): the larger γ is, the more the next step is emphasized, as reflected in the critic's target equation.
A round ends when d_{t+1} ≤ d_threshold = 0.02 or when the number of arm motion steps reaches 100.
Table 1. Input states of the "actor" network and the "critic" network.
a. The actuators in mujoco each take a single control input; if velocity mode is set, the actuator does not reach the commanded velocity directly but is regulated by PID control, so a certain dwell time is required.
b. This value is the output of the "actor" network.
c. Adding this quantity to the state variable was tried, but it was found to have no significant effect on the improvement.
Table 2. Hyperparameters of the "actor" network and the "critic" network.
The discount factor γ is an important hyperparameter that affects reinforcement learning training. This section studies the effect of the value of γ on training under different random seeds. As shown in Figs. 3 and 4, each point has the following meaning: an evaluation is performed every 40 rounds during 12000 rounds of training, and the value of each point is the mean success rate and the mean return over the last 10 evaluations; the solid line is the mean of the results under 3 different random seeds, and the shaded area is the resulting 95% confidence interval. As can be seen from Figs. 3 and 4, the larger the γ value, the worse the effect, so γ is best set to 0. The reason is that with the reward set in the form of equation (3-5), the action value Q(s, a) already describes well the value of the current action A_t for the current state S_t; considering in addition the influence of the next-step action value Q(s', a') on top of this reward only adds interference and hinders convergence of the "critic" network.
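The role of γ in the critic update can be seen from the standard TD3 target below; this is a generic sketch, not code from the patent, showing that with γ = 0 the target collapses to the immediate reward, consistent with the analysis above.

```python
import numpy as np

def td3_target(reward, done, next_q1, next_q2, gamma):
    """Clipped double-Q target: y = r + gamma * (1 - done) * min(Q1', Q2')."""
    return reward + gamma * (1.0 - done) * np.minimum(next_q1, next_q2)

# With gamma = 0 the bootstrapped term vanishes and the target equals the reward itself.
y = td3_target(reward=-0.15, done=0.0, next_q1=-1.2, next_q2=-1.1, gamma=0.0)
print(y)  # -0.15
```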
Random target approach task under random reset mechanism
The previous section showed that the TD3 algorithm converges the arm well to a target strategy and completes the random-target approach task under the fixed reset mechanism. This section further randomizes the initial state: on the basis of equation (3-6), the initial joint angles of the arm are also randomized, while the joint angular velocities remain 0. The initial state distribution under the random reset mechanism satisfies equation (3-7), where |S^m| is the size of the state space of the mechanical arm:

ρ_0(s) = Pr[S_0 = s] = 1/(|S^m|·|S^g|) if s = (s^m, g) with s^m ∈ S^m and g ∈ S^g, and 0 otherwise.    (3-7)
The hyperparameter settings are exactly the same as in Table 2. Under the random reset mechanism the arm is trained for 20000 rounds, again with an evaluation every 40 rounds; the mean success-rate and return curves are shown in Figs. 5 and 6. Comparing Figs. 3-4 with Figs. 5-6 shows that random reset is harder to converge than fixed reset, and, likewise, the smaller γ is, the better the convergence.
Operability optimized end trajectory tracking
Operability is the most commonly used index in robotics for describing robot performance; it generally represents the dexterity of the robot, and the greater the operability, the more dexterous the robot. Specifically, the operability w is defined from the velocity Jacobian matrix J(θ) of the robot and is calculated by equation (3-8), where σ_i are the singular values of J(θ):

w = σ_1 σ_2 ⋯ σ_m.    (3-8)
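Equation (3-8) can be evaluated numerically from any velocity Jacobian; the sketch below (assuming numpy and an illustrative random Jacobian) uses the fact that the product of singular values equals sqrt(det(J Jᵀ)) when J has full row rank.

```python
import numpy as np

def operability(J):
    """Operability w: product of the singular values of the velocity Jacobian J."""
    sigma = np.linalg.svd(J, compute_uv=False)
    return float(np.prod(sigma))

# Example: an illustrative random 3x12 velocity Jacobian for the 12-DOF arm.
J = np.random.default_rng(0).normal(size=(3, 12))
print(operability(J), np.sqrt(np.linalg.det(J @ J.T)))  # the two values agree
```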
Because the smaller the operability, the closer the arm is to a singular state, many researchers at home and abroad optimize the arm's operability during motion planning in order to avoid singular states and guarantee dexterity throughout the motion. The problems generally encountered with conventional control methods and neural-network solving methods are poor real-time performance, complicated solving, inability to transfer to other types of arms, and poor generality; it is therefore worthwhile to train with a reinforcement learning method so that the arm can automatically optimize its operability during motion.
The velocity Jacobian matrix of the arm can be derived from the arm's DH parameters, and from it the operability expression; the derivation is not detailed in this section. The operability of the arm varies with time and is denoted w_t. This section focuses on how to add the operability to the reward function for reinforcement learning training.
For the 12-degree-of-freedom arm of the invention, the distance d_t is on the order of 10^-2 to 10^-1, and the operability w_t is generally on the order of 10^-2. Putting the operability into the reward function must satisfy two requirements at once: first, the arm should learn a strategy that makes the operability as large as possible; second, the main end-approach task must not be drowned out. Equation (3-9) satisfies the first requirement (the greater the operability, the greater the reward) but not the second, because the sign in front of the operability term is positive: this leads the arm to keep adjusting its configuration near the target point to harvest positive reward instead of completing the target-approach task:

R_{t+1} = -d_{t+1} + w_{t+1}.    (3-9)
Combining the two requirements, many suitable rewards can be designed. Equation (3-10) is one possible reward, where k_{w1} is an adjustable hyperparameter:

R_{t+1} = -d_{t+1} - k_{w1}/w_{t+1}.    (3-10)

A value of k_{w1} on the order of 10^-4 to 10^-3 works better: on the one hand k_{w1}/w_{t+1} does not exceed d_{t+1}, which guarantees that the learned strategy can still complete the end-approach task; on the other hand k_{w1}/w_{t+1} is not so small as to be ignored during training. Figs. 7-9 show the effect of different k_{w1} on the success rate of the approach task, with the remaining hyperparameters the same as in Table 2 and γ = 0. It can be seen that the end-approach task works well when k_{w1} is in the range 10^-4 to 10^-1, but when k_{w1} exceeds 10^-1 the end-approach task is no longer easily accomplished, because the order of magnitude of the operability term then far exceeds that of the Euclidean distance, and the reinforcement learning algorithm treats the operability-adjustment task as more important than the end-approach task.
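Equation (3-10) as a reward function; k_w1 is the adjustable hyperparameter discussed above, and the small epsilon guard is an implementation detail added here, not part of the patent formula.

```python
import numpy as np

def reward_with_operability(ee_pos_next, target, w_next, k_w1=1e-3, eps=1e-8):
    """Reward (3-10): R = -d - k_w1 / w, so larger operability w means a smaller penalty."""
    d = np.linalg.norm(np.asarray(ee_pos_next) - np.asarray(target))
    return -d - k_w1 / (w_next + eps)
```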
Next, after 20000 rounds of training on the same random seed (which ensures that the random target points generated during training are identical), the end-trajectory tracking performance of arms trained with different k_{w1} is compared. Because the algorithm is difficult to converge correctly when k_{w1} exceeds 10^-1, training and testing with the TD3 algorithm are limited to k_{w1} ∈ [0, 10^-1], from which 5 values are selected: k_{w1} = 0, 10^-4, 10^-3, 10^-2, 10^-1. In the tests, the initial state of the arm is the horizontally straightened state, and the task is to track the following three different paths:
1. Circle. The arm end is required to track a circular trajectory centered at (0.6, 0, 0) with radius 0.1.
2. Line segment. The arm end is required to track a line-segment trajectory from start point (0.55, -0.1, 0) to end point (0.65, 0.2, 0).
3. Line segment + circle. The arm end is required first to track a line-segment trajectory from start point (0.8, 0, 0) to end point (0.7, 0, 0), and then to track a circular trajectory centered at (0.6, 0, 0) with radius 0.1.
FIGS. 10-11 and 12-13 show the trajectory-tracking test results after training the arm with the TD3 algorithm; k_{w1} = 0 means that no operability term is added to the reward, and k_{w1} ≠ 0 means that the operability term is added. Plotting the operability curve during motion shows that the operability of the arm trained with the operability term is clearly higher during motion than that of the arm trained without it, so the arm indeed becomes more dexterous and stays away from singular states.
Comparing the performance of arms trained with the five different rewards on the three path-tracking tasks yields the following three observations:
1. The arm trained with the k_{w1} = 0 reward may need more time steps to complete a given task, and sometimes cannot complete it at all; for example, it cannot complete task three.
2. The arm trained with k_{w1} = 0 generally has lower operability values during motion than the other arms trained with the operability reward; in particular, its final operability is half that of the other arms.
3. Among all arms trained with the operability reward, k_{w1} = 10^-3 performs best: it not only maximizes the operability but also minimizes the number of time steps required for the motion.
The conclusion from these three observations is that adding the operability reward enables the TD3 algorithm to train the arm better for the end-trajectory tracking task: it improves the operability (representing dexterity) during arm motion and shortens the control step count of the motion. k_{w1} = 10^-3 performs best because k_{w1}/w_{t+1} and d_{t+1} are then of similar order of magnitude, so both the end-approach task and the operability-increase task can be taken into account during training.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (10)

1. A redundant mechanical arm operability optimization method based on deep reinforcement learning, characterized by comprising the following steps:
S1, completing the approach training of the redundant mechanical arm toward random targets using a reinforcement learning method under a fixed reset mechanism;
S2, continuing the approach training of the redundant mechanical arm toward random targets using a reinforcement learning method under a random reset mechanism, wherein "random reset" means placing the mechanical arm in a random state;
S3, adding an "operability" term to the reward function, increasing the coefficient of the operability term, and completing the optimization of the operability of the redundant mechanical arm using the reinforcement learning method again;
wherein the operability is added to the reward function for reinforcement-learning training, and equation (3-10) is one possible reward:
R_{t+1} = -d_{t+1} - k_{w1}/w_{t+1}    (3-10)
wherein R_{t+1} is the reward, d_{t+1} is the Euclidean distance between the arm end position and the target point, k_{w1} is an adjustable hyperparameter, and w_{t+1} is the operability;
and S4, controlling the redundant mechanical arm with the optimized algorithm.
2. The method for optimizing the operability of a redundant manipulator based on deep reinforcement learning of claim 1, wherein said fixed reset in step S1 is a horizontal straightening state of the manipulator.
3. The redundant manipulator operability optimization method based on deep reinforcement learning according to claim 1, wherein in step S3, the algorithm is allowed to converge normally by adjusting the coefficient of the "operability" term; wherein training and testing using the TD3 algorithm are limited to k_{w1} ∈ [0, 10^-1].
4. The method for optimizing the operability of the redundant manipulator based on deep reinforcement learning according to claim 1, wherein a TD3 algorithm in reinforcement learning is used to complete the random-target approach task under the fixed reset mechanism of the redundant manipulator; the motion process of the manipulator can be regarded as a Markov decision process; and the Markov decision process can be represented by the six-tuple (S, A, R, P, ρ_0, γ), where S is the state space, A is the action space, R is the reward space, P is the state-transition probability space, ρ_0 is the initial state distribution, and γ is the discount factor.
5. The method for optimizing the operability of a redundant manipulator based on deep reinforcement learning according to claim 1, wherein in the step S1, the manipulator is in a horizontal straightening state at the beginning of each round, and then the manipulator end reaches a randomly set target point, and is fixedly reset to the horizontal straightening state after the end of each round.
6. The method for optimizing the operability of a redundant manipulator based on deep reinforcement learning of claim 4, wherein the value ranges of the input state and the output action are symmetrically processed, and the symmetric distribution characteristics of the input state and the output action are guaranteed.
7. The method for optimizing the operability of a redundant manipulator based on deep reinforcement learning of claim 4, wherein the reward is set to the negative of the Euclidean distance between the manipulator end position and the target point.
8. The redundant manipulator operability optimization method based on deep reinforcement learning according to claim 4, wherein the discount factor γ is set to 0 to eliminate the interference of the next action value Q(s, a), where the next action value is the value of taking the action A_{t+1} at the next moment in the next state S_{t+1}.
9. The redundant manipulator operability optimization method based on deep reinforcement learning of claim 1, wherein the value of k_{w1} is chosen so that k_{w1}/w_{t+1} and d_{t+1} are of a similar order of magnitude, so that both the end-approach task and the operability-increase task are taken into account during training, where k_{w1} is an adjustable hyperparameter, d_{t+1} is the Euclidean distance between the manipulator end position and the target point, and the subscripts t and t+1 denote the state variables at times t and t+1.
10. A redundant robot arm control apparatus, characterized by comprising at least one memory and at least one processor;
the memory includes at least one executable program stored therein;
the executable program, when executed by the processor, implements the method of any one of claims 1 to 9.
CN202210272600.8A 2022-03-18 2022-03-18 Redundant mechanical arm operability optimization method and device based on deep reinforcement learning Active CN114800488B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210272600.8A CN114800488B (en) 2022-03-18 2022-03-18 Redundant mechanical arm operability optimization method and device based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210272600.8A CN114800488B (en) 2022-03-18 2022-03-18 Redundant mechanical arm operability optimization method and device based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN114800488A CN114800488A (en) 2022-07-29
CN114800488B (en) 2023-06-20

Family

ID=82530104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210272600.8A Active CN114800488B (en) 2022-03-18 2022-03-18 Redundant mechanical arm operability optimization method and device based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114800488B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272541B (en) * 2022-09-26 2023-01-03 成都市谛视无限科技有限公司 Gesture generation method for driving intelligent agent to reach multiple target points

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956297B (en) * 2016-05-09 2022-09-13 金陵科技学院 Comprehensive evaluation and optimization method for redundant robot motion flexibility performance
CN108326844B (en) * 2017-01-20 2020-10-16 香港理工大学深圳研究院 Motion planning method and device for optimizing operability of redundant manipulator
CN106842907B (en) * 2017-02-16 2020-03-27 香港理工大学深圳研究院 Cooperative control method and device for multi-redundancy mechanical arm system
CN110333739B (en) * 2019-08-21 2020-07-31 哈尔滨工程大学 AUV (autonomous Underwater vehicle) behavior planning and action control method based on reinforcement learning
CN111923039B (en) * 2020-07-14 2022-07-05 西北工业大学 Redundant mechanical arm path planning method based on reinforcement learning
CN112528552A (en) * 2020-10-23 2021-03-19 洛阳银杏科技有限公司 Mechanical arm control model construction method based on deep reinforcement learning

Also Published As

Publication number Publication date
CN114800488A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN109960880B (en) Industrial robot obstacle avoidance path planning method based on machine learning
US20180036882A1 (en) Layout setting method and layout setting apparatus
CN109901397B (en) Mechanical arm inverse kinematics method using particle swarm optimization algorithm
Thakar et al. Accounting for part pose estimation uncertainties during trajectory generation for part pick-up using mobile manipulators
CN114800488B (en) Redundant mechanical arm operability optimization method and device based on deep reinforcement learning
CN106965171A (en) Possesses the robot device of learning functionality
CN112847235B (en) Robot step force guiding assembly method and system based on deep reinforcement learning
CN116533249A (en) Mechanical arm control method based on deep reinforcement learning
Laezza et al. Reform: A robot learning sandbox for deformable linear object manipulation
CN115091469B (en) Depth reinforcement learning mechanical arm motion planning method based on maximum entropy frame
CN113664829A (en) Space manipulator obstacle avoidance path planning system and method, computer equipment and storage medium
Hebecker et al. Towards real-world force-sensitive robotic assembly through deep reinforcement learning in simulations
Ranjbar et al. Residual feedback learning for contact-rich manipulation tasks with uncertainty
CN116803635A (en) Near-end strategy optimization training acceleration method based on Gaussian kernel loss function
Lämmle et al. Simulation-based learning of the peg-in-hole process using robot-skills
CN113967909B (en) Direction rewarding-based intelligent control method for mechanical arm
CN110114195B (en) Action transfer device, action transfer method, and non-transitory computer-readable medium storing action transfer program
CN115042185A (en) Mechanical arm obstacle avoidance grabbing method based on continuous reinforcement learning
Yovchev Finding the optimal parameters for robotic manipulator applications of the bounded error algorithm for iterative learning control
CN117140527B (en) Mechanical arm control method and system based on deep reinforcement learning algorithm
CN113290557A (en) Snake-shaped robot control method based on data driving
US11921492B2 (en) Transfer between tasks in different domains
Liu et al. Optimizing Non-diagonal Stiffness Matrix of Compliance Control for Robotic Assembly Using Deep Reinforcement Learning
Flageat et al. Incorporating Human Priors into Deep Reinforcement Learning for Robotic Control.
US20230195843A1 (en) Machine learning device, machine learning method, and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant