CN115042191A

CN115042191A - Pre-training model fine-tuning training method and device, electronic equipment and storage medium

Info

Publication number: CN115042191A
Application number: CN202210965690.9A
Authority: CN
Inventors: 杨远达; 林才纺; 赵旭东; 张梦瑶
Original assignee: Ji Hua Laboratory
Current assignee: Ji Hua Laboratory
Priority date: 2022-08-12
Filing date: 2022-08-12
Publication date: 2022-09-13
Anticipated expiration: 2042-08-12
Also published as: CN115042191B

Abstract

The application belongs to the technical field of mechanical arm control and discloses a pre-training model fine-tuning training method and device, electronic equipment and a storage medium. Motion instruction data are collected while the mechanical arm executes a work task with the friction compensation model not participating in the control process, so as to detect whether the friction compensation model is safe. If the friction compensation model is safe, a fine-tuning data set is acquired multiple times to perform adjustment training on the friction compensation model, yielding an optimized model and a corresponding loss function value each time. The candidate optimal model is updated according to the optimized model and its loss function value, an actual operation test is carried out on the latest candidate optimal model, and the optimal model is updated according to the test result. In this way, fine-tuning training of the pre-trained friction compensation model can be completed efficiently, the friction compensation model is better matched to the mechanical arm that actually uses it, and control precision is improved.

Description

Pre-training model fine-tuning training method and device, electronic equipment and storage medium

Technical Field

The application relates to the technical field of mechanical arm control, and in particular to a pre-training model fine-tuning training method and device, electronic equipment and a storage medium.

Background

Friction exists at every joint of a multi-axis mechanical arm and degrades its control precision, so a friction compensation model can be added to the motion control of the mechanical arm to improve that precision. The friction compensation model must be trained before it is put into use. To reduce the training workload for users, some mechanical arm manufacturers currently pre-train a single, uniform friction compensation model for all mechanical arms of the same model.

However, even mechanical arms of the same model differ in assembly error, observation error and wear of the joint gears, and their actual working environments and work tasks also differ, so the motion of each joint varies from arm to arm. A friction compensation model obtained through uniform training therefore does not necessarily meet the actual requirements of every mechanical arm. A pre-training model fine-tuning training method is thus needed to perform online fine-tuning training on the pre-trained friction compensation model, so as to improve how well the friction compensation model matches the mechanical arm that actually uses it and thereby improve control precision.

Disclosure of Invention

The application aims to provide a pre-training model fine-tuning training method and device, electronic equipment and a storage medium that can efficiently complete fine-tuning training of a pre-trained friction compensation model, so as to improve how well the friction compensation model matches the mechanical arm that actually uses it and thereby improve control precision.

In a first aspect, the present application provides a pre-training model fine-tuning training method for performing online fine-tuning training on a pre-trained friction compensation model of a mechanical arm, the method comprising the steps of:

A1. collecting motion instruction data of the mechanical arm while the mechanical arm executes a work task with the friction compensation model not participating in the control process, so as to detect the safety of the friction compensation model;

A2. if the friction compensation model is safe, acquiring a fine-tuning data set multiple times to perform adjustment training on the friction compensation model, so as to obtain an optimized model and a corresponding loss function value after each round of adjustment training;

A3. after each round of adjustment training, updating the candidate optimal model according to the optimized model and the corresponding loss function value;

A4. after each update of the candidate optimal model, performing an actual operation test on the latest candidate optimal model, and updating the optimal model according to the test result.

With this pre-training model fine-tuning training method, online fine-tuning training is performed on the pre-trained friction compensation model, and an optimized model that closely matches the mechanical arm actually using the friction compensation model can be obtained with only a small amount of training. Fine-tuning training of the pre-trained friction compensation model is therefore completed efficiently, the match between the friction compensation model and the mechanical arm is improved, and control precision is improved. In addition, safety detection is performed during training and the optimal model is selected by a screening mechanism, making the training process safer, more efficient and more automated.

Preferably, step A1 includes:

collecting motion instruction data of the mechanical arm while it repeatedly executes at least one work task with the friction compensation model not participating in the control process, to obtain at least one group of motion instruction data;

inputting each group of motion instruction data into the friction compensation model to obtain the compensation torque values it outputs;

if at least one compensation torque value fails to meet a preset safety index, judging that the friction compensation model is unsafe; otherwise, judging that it is safe.

Generally, when a mechanical arm executes a work task without the friction compensation model participating in the control process, the controller ensures that the motion parameters of the mechanical arm (such as end position, torque of each axis and rotation angle) stay within a safe range. If the friction compensation model is unsuitable, adding it to the control loop may push these motion parameters outside the safe range and damage the mechanical arm. Therefore, the mechanical arm first executes the work task without the friction compensation model participating in the control process; motion instruction data satisfying the input requirements of the friction compensation model are collected during the motion and fed into the friction compensation model to obtain the compensation torque values it outputs, and these are used to judge whether the friction compensation model is safe. Subsequent training is carried out only if the friction compensation model is safe, which avoids damage to the mechanical arm and improves safety.

Preferably, step A2 includes:

repeatedly acquiring a fine-tuning data set to perform adjustment training on the friction compensation model until the number of rounds of adjustment training reaches a preset count threshold, or until an actual operation test shows that the friction compensation effect of the latest optimal model meets a preset compensation effect index.

This prevents training from running too long and keeps it efficient.

Preferably, the step of acquiring a fine-tuning data set to perform adjustment training on the friction compensation model comprises:

collecting training input data of the mechanical arm while it executes the work task multiple times with the friction compensation model not participating in the control process, to form the fine-tuning data set, the training input data comprising motion instruction data and motion feedback data;

dividing the fine-tuning data set into a training set and a test set according to a preset ratio;

performing adjustment training on the friction compensation model with the training set to obtain an optimized model and a corresponding loss function value, and testing the optimized model with the test set.

Preferably, step A3 includes:

if only one round of adjustment training has been completed so far, taking the optimized model of that round as the candidate optimal model;

if more than one round of adjustment training has been completed, comparing the loss function value of the current round with the loss function value of the current optimal model, and, when the comparison result satisfies a first preset update condition, updating the candidate optimal model with the optimized model of the current round.

Preferably, step A4 includes:

A401. having the mechanical arm execute the work task with the latest candidate optimal model participating in the control process, to obtain a friction compensation effect evaluation index value;

A402. if only one round of adjustment training has been completed so far, taking the latest candidate optimal model as the optimal model;

A403. if more than one round of adjustment training has been completed, comparing the friction compensation effect evaluation index value of the latest candidate optimal model with that of the current optimal model, and, when the comparison result satisfies a second preset update condition, updating the optimal model with the latest candidate optimal model.

Preferably, before step A401, the method further comprises the step of:

collecting motion instruction data of the mechanical arm while it executes the work task with the latest candidate optimal model not participating in the control process, so as to detect the safety of the latest candidate optimal model;

step A401 then includes:

if the latest candidate optimal model is safe, having the mechanical arm execute the work task with the latest candidate optimal model participating in the control process, to obtain the friction compensation effect evaluation index value.
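A minimal Python sketch of steps A401 to A403, including the preceding safety check, is given below. It assumes that a lower friction compensation effect evaluation index value is better (for example, a tracking error) and that the second preset update condition is "improves on the current optimal model by more than `min_gain`". The function name, `is_safe`, `run_test` and `min_gain` are hypothetical placeholders, not part of the original method.

```python
from typing import Any, Callable, Optional, Tuple

def evaluate_and_update_optimal(
    round_idx: int,
    candidate: Any,
    optimal: Optional[Any],
    optimal_effect: float,
    is_safe: Callable[[Any], bool],
    run_test: Callable[[Any], float],
    min_gain: float = 0.0,
) -> Tuple[Optional[Any], float]:
    """Return the (possibly updated) optimal model and its effect index value.
    Assumption: a LOWER effect index value means better friction compensation."""
    if not is_safe(candidate):              # pre-check with the candidate NOT in the control loop
        return optimal, optimal_effect      # unsafe: never run it live, keep the current optimum
    effect = run_test(candidate)            # A401: execute the task WITH the candidate in the loop
    if round_idx == 1:                      # A402: first round, adopt the candidate directly
        return candidate, effect
    if effect < optimal_effect - min_gain:  # A403: assumed second preset update condition
        return candidate, effect
    return optimal, optimal_effect
```

Running the safety check first means an unsafe candidate is never placed in the live control loop; the current optimal model simply stays in place.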

In a second aspect, the present application provides a pre-training model fine-tuning training device for performing online fine-tuning training on a pre-trained friction compensation model of a mechanical arm, the device comprising:

a safety detection module, configured to collect motion instruction data of the mechanical arm while it executes a work task with the friction compensation model not participating in the control process, so as to detect the safety of the friction compensation model;

a training module, configured to, when the friction compensation model is safe, acquire a fine-tuning data set multiple times to perform adjustment training on the friction compensation model, so as to obtain an optimized model and a corresponding loss function value after each round of adjustment training;

a first update module, configured to update the candidate optimal model according to the optimized model and the corresponding loss function value after each round of adjustment training;

a second update module, configured to perform an actual operation test on the latest candidate optimal model after each update of the candidate optimal model, and to update the optimal model according to the test result.

With this pre-training model fine-tuning training device, online fine-tuning training is performed on the pre-trained friction compensation model, and an optimized model that closely matches the mechanical arm actually using the friction compensation model can be obtained with only a small amount of training. Fine-tuning training of the pre-trained friction compensation model is therefore completed efficiently, the match between the friction compensation model and the mechanical arm is improved, and control precision is improved. In addition, safety detection is performed during training and the optimal model is selected by a screening mechanism, making the training process safer, more efficient and more automated.

In a third aspect, the present application provides an electronic device, comprising a processor and a memory, where the memory stores a computer program executable by the processor, and the processor executes the computer program to perform the steps of the pre-training model fine tuning training method as described above.

In a fourth aspect, the present application provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps in the method for pre-training model fine tuning training as described above.

Beneficial effects:

With the pre-training model fine-tuning training method and device, electronic equipment and storage medium provided by the present application, motion instruction data are collected while the mechanical arm executes a work task with the friction compensation model not participating in the control process, so as to detect the safety of the friction compensation model; if the friction compensation model is safe, a fine-tuning data set is acquired multiple times to perform adjustment training on it, yielding an optimized model and a corresponding loss function value after each round; after each round, the candidate optimal model is updated according to the optimized model and its loss function value; after each update of the candidate optimal model, an actual operation test is performed on the latest candidate optimal model and the optimal model is updated according to the test result. Fine-tuning training of the pre-trained friction compensation model is thus completed efficiently, the match between the friction compensation model and the mechanical arm actually using it is improved, and control precision is improved.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application.

Drawings

Fig. 1 is a flowchart of a pre-training model fine-tuning training method according to an embodiment of the present disclosure.

Fig. 2 is a schematic structural diagram of a pre-training model fine-tuning training device according to an embodiment of the present application.

Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.

Referring to Fig. 1, in some embodiments of the present application, a pre-training model fine-tuning training method performs online fine-tuning training on a pre-trained friction compensation model of a mechanical arm and comprises the following steps:

A1. collecting motion instruction data of the mechanical arm while the mechanical arm executes a work task with the friction compensation model not participating in the control process, so as to detect the safety of the friction compensation model;

A2. if the friction compensation model is safe, acquiring a fine-tuning data set multiple times to perform adjustment training on the friction compensation model, so as to obtain an optimized model and a corresponding loss function value after each round of adjustment training;

A3. after each round of adjustment training, updating the candidate optimal model according to the optimized model and the corresponding loss function value;

A4. after each update of the candidate optimal model, performing an actual operation test on the latest candidate optimal model, and updating the optimal model according to the test result.

The method can run on the industrial personal computer of the mechanical arm, so the hardware required to implement it is the arm's existing operating equipment and no additional expenditure is incurred.

The pre-trained friction compensation model generalizes: it is trained on a training data set collected while mechanical arms execute a variety of different tasks, and it is broadly suitable for friction compensation of the mechanical arm. However, to improve the friction compensation effect for a specific mechanical arm executing a specific work task (the differences lie not only in the work task but also in the individual arm), the model must be fine-tuned online in a targeted manner before it matches that mechanical arm. The pre-trained friction compensation model may be obtained by any training method, which is not limited here; likewise, the structure and type of the friction compensation model are not the focus of the present application and are not limited herein.

The mechanical arm is a multi-axis mechanical arm, and a work task is a task the mechanical arm performs in actual operation. For a given work task, the motion trajectory of the mechanical arm is fixed or nearly so. For example, when a user employs the mechanical arm for unloading and stacking workpieces, its work task is to pick up workpieces and stack them: if the position and posture of the workpieces to be picked are fixed, the trajectory is usually fixed; if they vary randomly within a certain deviation range, the mechanical arm works with a vision system to locate and grasp the workpieces, so the trajectory changes but remains close from one operation to the next. In practice, the work tasks of most mechanical arms stay the same. Some mechanical arms are repurposed partway through their service, which changes the work task (after each such change, the pre-training model fine-tuning training method can be executed once to fine-tune the friction compensation model online for the new work task), but such changes are infrequent. The work task of a mechanical arm is therefore stable and remains unchanged over a certain period, so the optimal model obtained by the pre-training model fine-tuning training method stays closely matched to the mechanical arm over that period.

It should be noted that "the friction compensation model does not participate in the control process" covers two cases: the friction compensation model does not run and therefore produces no output, or the friction compensation model runs but its output is not added to the control process. Both cases count as the friction compensation model not participating in the control process.
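The two "not participating" cases, plus the normal case in which the model does participate, can be expressed as a per-joint control step. This is a hypothetical sketch assuming a single joint and a compensation torque that is simply added to the controller's torque; `CompMode`, `control_torque` and the mode names are illustrative only.

```python
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple

class CompMode(Enum):
    OFF = "off"        # model not run at all: no output
    SHADOW = "shadow"  # model runs, output logged but NOT added to control
    ACTIVE = "active"  # model output added to the control torque (assumed additive)

def control_torque(
    base_torque: float,
    model: Callable[[Sequence[float]], float],
    features: Sequence[float],
    mode: CompMode,
) -> Tuple[float, Optional[float]]:
    """Return (torque actually commanded, model output or None) for one joint."""
    if mode is CompMode.OFF:
        return base_torque, None
    compensation = model(features)
    if mode is CompMode.SHADOW:
        return base_torque, compensation       # does not participate in the control process
    return base_torque + compensation, compensation
```

OFF and SHADOW both leave the commanded torque unchanged; SHADOW additionally yields model outputs, which is one possible way to obtain the compensation torque values used in the safety check.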

In some embodiments, step A1 includes:

collecting motion instruction data of the mechanical arm while it repeatedly executes at least one work task with the friction compensation model not participating in the control process, to obtain at least one group of motion instruction data;

inputting each group of motion instruction data into the friction compensation model to obtain the compensation torque values it outputs;

if at least one compensation torque value fails to meet the preset safety index, judging that the friction compensation model is unsafe; otherwise, judging that it is safe.

For example, if the input data required by the friction compensation model include at least one of the torque, speed and angle of each joint motor of the mechanical arm, the motion instruction data correspondingly include at least one of the torque instruction data, speed instruction data and angle instruction data of each joint motor (and, in what follows, the motion feedback data correspondingly include the torque feedback data, speed feedback data and angle feedback data of each joint motor). The specific data items of the training input data are not limited herein.

The output data of the friction compensation model is typically a compensation torque corresponding to the input data.
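As an illustration of how these per-joint instruction and feedback items might be organised, the sketch below stores one controller cycle per record and extracts the instruction items the model needs into one feature row per cycle. The record layout, the field names and the default choice of items are assumptions; the document leaves the specific data items open.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class CycleRecord:
    """One controller cycle for an n-joint arm (hypothetical layout, one value per joint)."""
    t: float
    torque_cmd: List[float]
    speed_cmd: List[float]
    angle_cmd: List[float]
    torque_fb: List[float]
    speed_fb: List[float]
    angle_fb: List[float]

def instruction_group(
    records: Sequence[CycleRecord],
    items: Sequence[str] = ("speed_cmd", "angle_cmd"),
) -> List[List[float]]:
    """Build one group of motion instruction data: one feature row per cycle,
    concatenating the selected instruction items for all joints."""
    group = []
    for rec in records:
        row: List[float] = []
        for name in items:
            row.extend(getattr(rec, name))
        group.append(row)
    return group
```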

Generally, when the mechanical arm executes a work task without the friction compensation model participating in the control process, the controller keeps the motion parameters of the mechanical arm (such as end position, torque of each axis and rotation angle) within the safe range. If the friction compensation model is unsuitable, adding it to the control loop may push these motion parameters outside the safe range and damage the mechanical arm. The mechanical arm therefore executes the work task without the friction compensation model participating in the control process, and motion instruction data satisfying the input requirements of the friction compensation model are collected during the motion and input into the friction compensation model to obtain the compensation torque values it outputs, from which it is judged whether the friction compensation model is safe. Subsequent training proceeds only if the friction compensation model is safe, which avoids damage to the mechanical arm and improves safety.

In practical applications, while the mechanical arm is moving, its controller periodically sends motion instruction data to it to control its motion at each moment (the motion instruction data are generally computed from a dynamics model based on the intrinsic parameters of the mechanical arm, combined with a control algorithm such as PID). Therefore, while a work task is being executed, the motion instruction data sent by the controller and the corresponding motion feedback data at each moment can be recorded (for example, if the motion instruction data include the target angle of a joint motor, the corresponding motion feedback data are the actual angle the joint motor reaches when executing that instruction), and the required motion instruction data items are extracted to obtain one group of motion instruction data. After a group of motion instruction data is input into the friction compensation model, the model outputs a group of compensation torque values (for example, if the group contains motion instruction data for N moments, N compensation torque values are output). The friction compensation model is judged unsafe as soon as any one compensation torque value in the group fails to meet the preset safety index.

The number of times that the mechanical arm repeatedly executes the work task under the condition that the friction compensation model does not participate in the control process can be set according to actual needs, and the number is not limited here.

The preset safety index may be, for example, that the compensation torque value lies within a preset torque range (which can be set according to actual requirements).
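Assuming the model returns one compensation torque per joint for each controller cycle, the safety index above reduces to a range check over every output of every group. This is a minimal sketch: the model is passed in as a plain callable, and `torque_min`/`torque_max` stand for the preset torque range.

```python
from typing import Callable, Sequence

def model_is_safe(
    model: Callable[[Sequence[float]], Sequence[float]],
    groups: Sequence[Sequence[Sequence[float]]],
    torque_min: float,
    torque_max: float,
) -> bool:
    """Feed every group through the model (model not in the control loop) and
    reject it if ANY compensation torque leaves [torque_min, torque_max]."""
    for group in groups:                # one group per repetition of the work task
        for features in group:          # one controller cycle -> one feature row
            for tau in model(features): # one compensation torque per joint (assumed)
                if not (torque_min <= tau <= torque_max):
                    return False        # a single violation is enough
    return True
```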

In other embodiments, step A1 includes:

collecting motion instruction data of the mechanical arm while it repeatedly executes at least one work task with the friction compensation model not participating in the control process, to obtain at least one group of motion instruction data (as described above);

inputting each group of motion instruction data into the friction compensation model to obtain the compensation torque values it outputs (as described above);

simulating the motion of the mechanical arm according to the compensation torque values, and extracting the simulated motion parameters of the mechanical arm (such as end position, torque of each axis and rotation angle) during the simulated motion;

if at least one simulated motion parameter exceeds its corresponding safety range (determined by the performance of the mechanical arm), judging that the friction compensation model is unsafe; otherwise, judging that it is safe.

The motion of the mechanical arm can be simulated with mechanical arm simulation software to obtain the simulated motion parameters during the simulated motion. In this way, the motion parameters the mechanical arm would exhibit with the friction compensation model added to the control can be checked intuitively and conveniently, so whether the friction compensation model is safe can be judged more accurately.
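The simulation itself depends on the simulation software used, which the document does not name. Assuming the simulator has already exported each simulated motion parameter as a series of values, the final judgment is a per-parameter range check like the hypothetical sketch below; the parameter names and ranges are placeholders.

```python
from typing import Dict, Sequence, Tuple

def simulated_motion_is_safe(
    sim_params: Dict[str, Sequence[float]],
    safety_ranges: Dict[str, Tuple[float, float]],
) -> bool:
    """sim_params maps a parameter name (e.g. 'joint1_torque') to its values over
    the simulated trajectory; unsafe if any value leaves that parameter's range."""
    for name, values in sim_params.items():
        lo, hi = safety_ranges[name]    # range derived from the arm's performance limits
        if any(v < lo or v > hi for v in values):
            return False
    return True
```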

In this embodiment, step A2 includes:

repeatedly acquiring a fine-tuning data set to perform adjustment training on the friction compensation model until the number of rounds of adjustment training reaches a preset count threshold, or until an actual operation test shows that the friction compensation effect of the latest optimal model meets a preset compensation effect index (that is, as long as the number of rounds has not reached the preset count threshold and the friction compensation effect of the latest optimal model does not meet the preset compensation effect index, the step of acquiring a fine-tuning data set to perform adjustment training on the friction compensation model is executed again).

Each time a round of adjustment training is completed, it is checked whether the number of rounds has reached the preset count threshold. Because the candidate optimal model is updated and an actual operation test is performed after each round, a friction compensation effect evaluation index value representing the friction compensation effect is also obtained, and it is judged whether the effect meets the preset compensation effect index (which can be set according to actual needs). Training stops as soon as at least one of these two conditions holds: the number of rounds reaches the preset count threshold, or the friction compensation effect meets the preset compensation effect index. This prevents training from running too long and keeps it efficient.
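The control flow of steps A1 to A4 together with the two stop conditions can be sketched as follows. The callables stand in for arm-specific operations (safety detection, one round of adjustment training on a freshly collected data set, and the actual operation test). Their names, the assumption that each round starts from the pre-trained model, and the use of "strictly lower is better" for both the loss and the effect index as the first and second preset update conditions are illustrative assumptions.

```python
from typing import Any, Callable, Optional, Tuple

def fine_tune(
    pretrained: Any,
    is_safe: Callable[[Any], bool],                 # safety detection with the model NOT in the loop
    tune_once: Callable[[Any], Tuple[Any, float]],  # one round: collect data, tune -> (optimized model, loss)
    run_test: Callable[[Any], float],               # actual operation test -> effect index (lower = better, assumed)
    max_rounds: int,                                # preset count threshold
    effect_target: float,                           # preset compensation effect index
) -> Optional[Any]:
    """Sketch of steps A1-A4; returns the optimal model, or None if none was found."""
    if not is_safe(pretrained):                     # A1: never train an unsafe model
        return None
    candidate: Optional[Any] = None
    optimal: Optional[Any] = None
    opt_loss = opt_effect = float("inf")
    for _ in range(max_rounds):                     # stop condition 1: round threshold
        optimized, loss = tune_once(pretrained)     # A2: one round of adjustment training
        if loss < opt_loss:                         # A3: assumed first update condition
            candidate = optimized                   # (true in round 1 because opt_loss starts at inf)
            if is_safe(candidate):                  # safety check before the live test
                effect = run_test(candidate)        # A401: run the task with the candidate in the loop
                if effect < opt_effect:             # A402/A403: assumed second update condition
                    optimal, opt_loss, opt_effect = candidate, loss, effect
        if opt_effect <= effect_target:             # stop condition 2: effect index met
            break
    return optimal
```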

Specifically, the step of acquiring a fine-tuning data set to perform adjustment training on the friction compensation model comprises:

collecting training input data of the mechanical arm while it executes the work task multiple times with the friction compensation model not participating in the control process, to form the fine-tuning data set, the training input data comprising motion instruction data and motion feedback data;

dividing the fine-tuning data set into a training set and a test set according to a preset ratio;

performing adjustment training on the friction compensation model with the training set to obtain an optimized model and a corresponding loss function value, and testing the optimized model with the test set.

Each execution of the work task yields one group of training input data, which serves as one sample. The fine-tuning data set contains a plurality of samples, and the specific number of samples can be set according to actual needs.

When the work task is executed once, the motion instruction data and the corresponding motion feedback data at each moment can be recorded (for example, if the motion instruction data include the target angle of a joint motor, the corresponding motion feedback data are the actual angle the joint motor reaches when executing that instruction), and the required motion instruction data items and corresponding motion feedback data items are extracted from the recorded data to obtain one group of training input data.

The preset ratio can be set according to actual needs; for example, in this embodiment it is 8:2 (i.e., the fine-tuning data set is divided into the training set and the test set in a ratio of 8:2), but it is not limited thereto.
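A minimal sketch of the 8:2 split, assuming samples are shuffled with a fixed seed before splitting. Because each sample is one execution of the work task, splitting whole samples rather than individual time steps keeps each time series intact. The function name and the shuffling step are assumptions; the document only specifies the ratio.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_dataset(
    samples: Sequence[T], train_ratio: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle whole samples (one sample = one task execution) and split them."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)        # reproducible shuffle (assumed, not in the source)
    n_train = int(round(train_ratio * len(samples)))
    train = [samples[i] for i in idx[:n_train]]
    test = [samples[i] for i in idx[n_train:]]
    return train, test
```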

During adjustment training, some parameters of the friction compensation model can be frozen (that is, the frozen parameters are kept unchanged during training and only the remaining parameters are adjusted) to further improve training efficiency. Each round of fine-tuning training runs for several epochs (depending on the actual settings), and each epoch includes a loss function calculation (the motion feedback data in the samples are typically used in this calculation) so that the model is optimized by gradient descent. The specific loss function can be set according to actual needs, such as mean square error (MSE), cross-entropy loss or focal loss, but is not limited thereto. In each round of adjustment training, gradient descent yields the model with the minimum loss function value; this model is the optimized model, and that minimum value is the loss function value corresponding to the optimized model.
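To illustrate parameter freezing, the MSE loss and keeping the minimum-loss model within one round, the sketch below fine-tunes a linear stand-in model with plain gradient descent in NumPy. The linear model is an assumption chosen only to keep the example short (the document does not fix the model structure), and `frozen_mask`, `lr` and `epochs` are hypothetical parameters.

```python
import numpy as np

def fine_tune_round(w0, X, y, frozen_mask, lr=0.1, epochs=200):
    """One round of adjustment training on a linear stand-in model y_hat = X @ w.

    Entries of w where frozen_mask is True keep their pre-trained values.
    Returns (weights with the lowest MSE seen, that MSE), i.e. the optimized
    model and its loss function value for this round.
    """
    w = np.array(w0, dtype=float)             # copy so the pre-trained weights are untouched
    trainable = ~np.asarray(frozen_mask, dtype=bool)
    best_w, best_loss = w.copy(), np.inf
    for epoch in range(epochs + 1):
        err = X @ w - y
        loss = float(np.mean(err ** 2))       # mean squared error
        if loss < best_loss:                  # keep the minimum-loss model
            best_w, best_loss = w.copy(), loss
        if epoch == epochs:
            break
        grad = 2.0 * X.T @ err / len(y)       # gradient of the MSE with respect to w
        w[trainable] -= lr * grad[trainable]  # frozen parameters are never updated
    return best_w, best_loss
```

In a neural-network framework the same freezing effect is usually obtained by excluding the frozen parameters from the optimizer or from gradient computation.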

The friction compensation model can be built on a memory-type neural network (such as a gated recurrent unit (GRU) network or a long short-term memory (LSTM) network). Memory-type neural networks are suited to time-series data, retain information from earlier time steps, and mitigate the vanishing-gradient problem of plain recurrent neural networks.
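For illustration, the recurrence that gives a GRU its memory can be written out for a single hidden unit in pure Python using the standard reset-gate/update-gate equations. The scalar weights, their names, and the linear read-out `w_out` that turns the hidden state into a compensation torque are hypothetical; a practical model would use a deep-learning framework with vector-valued states.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h, p):
    """One time step of a single-unit GRU with scalar weights in dict p."""
    r = sigmoid(p["w_ir"] * x + p["w_hr"] * h + p["b_r"])          # reset gate
    z = sigmoid(p["w_iz"] * x + p["w_hz"] * h + p["b_z"])          # update gate
    n = math.tanh(p["w_in"] * x + r * (p["w_hn"] * h) + p["b_n"])  # candidate state
    return (1.0 - z) * n + z * h                                   # gated blend


def compensate(sequence, p, w_out=1.0):
    """Run the cell over an instruction sequence; one torque per time step."""
    h, torques = 0.0, []
    for x in sequence:
        h = gru_step(x, h, p)                # hidden state carries the history
        torques.append(w_out * h)
    return torques
```

Because the new state is a convex blend of the previous state and a `tanh` candidate, the hidden state stays inside (-1, 1).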

In some embodiments, step A3 includes:

if only one adjustment training has been completed so far, taking the optimization model of that adjustment training as the candidate optimal model;

if more than one adjustment training has been completed, comparing the loss function value of the current adjustment training with the loss function value of the current optimal model, and replacing the candidate optimal model with the optimization model of the current adjustment training when the comparison result meets a first preset update condition.

Specifically, if the adjustment training just completed is the first one, no candidate optimal model exists yet, so the optimization model of this adjustment training is used directly as the candidate optimal model. If it is not the first one, a candidate optimal model (generally recorded in a local database) and a current optimal model already exist. The loss function value of the optimization model obtained in the current adjustment training is then compared with that of the current optimal model, and if the comparison result meets the first preset update condition, the existing candidate optimal model is replaced by the optimization model of the current adjustment training, thereby updating the candidate optimal model.

If the comparison result does not meet the first preset update condition, the candidate optimal model is kept unchanged.

The first preset update condition can be set according to actual needs. For example, in this embodiment, the first preset update condition is that the loss function value of the current adjustment training is lower than the loss function value of the current optimal model by at least a preset percentage threshold (which can be set according to actual needs, for example 20%). That is, if the amount by which the loss function value of the current adjustment training falls below that of the current optimal model, divided by the loss function value of the current optimal model, is greater than or equal to the preset percentage threshold (for example 20%), the comparison result meets the first preset update condition.
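The candidate update rule of step A3 with the 20% example threshold can be sketched as follows. Models are treated as opaque values, and the reference loss of the current optimal model is assumed to be positive; both are assumptions of this illustration.

```python
def meets_first_condition(new_loss, optimal_loss, threshold=0.20):
    """True if new_loss is lower than optimal_loss by at least `threshold` (relative)."""
    return (optimal_loss - new_loss) / optimal_loss >= threshold


def update_candidate(round_idx, new_model, new_loss, candidate, optimal_loss,
                     threshold=0.20):
    """Step A3: return the (possibly replaced) candidate optimal model."""
    if round_idx == 1:
        return new_model                     # first training: no candidate exists yet
    if meets_first_condition(new_loss, optimal_loss, threshold):
        return new_model                     # condition met: replace the candidate
    return candidate                         # otherwise keep the candidate unchanged
```

For instance, a new loss of 0.07 against an optimal-model loss of 0.10 is a 30% reduction and replaces the candidate, whereas 0.09 (a 10% reduction) does not.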

Further, step A4 includes:

A401. causing the mechanical arm to execute a work task with the latest candidate optimal model participating in the control process, so as to obtain a friction compensation effect evaluation index value;

A402. if only one adjustment training has been completed so far, taking the latest candidate optimal model as the optimal model;

A403. if more than one adjustment training has been completed, comparing the friction compensation effect evaluation index value of the latest candidate optimal model with that of the current optimal model, and replacing the optimal model with the latest candidate optimal model when the comparison result meets a second preset update condition.

The friction compensation effect evaluation index may be set according to actual needs; for example, it may be the deviation between the feedback torque and the command data, or the repeat positioning accuracy of the mechanical arm, but it is not limited thereto. The friction compensation effect evaluation index value is the value taken by this index.
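One possible index, sketched below under stated assumptions, is the root-mean-square (RMS) deviation between commanded and fed-back values. The RMS form is a choice made for this illustration. Since a smaller deviation is better, the sketch negates it so that a larger index value means better compensation, which matches a "larger is better" comparison.

```python
import math


def rms_deviation(commanded, feedback):
    """Root-mean-square deviation between commanded and fed-back values."""
    if not commanded or len(commanded) != len(feedback):
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((c - f) ** 2 for c, f in zip(commanded, feedback)) / len(commanded))


def compensation_index(commanded, feedback):
    """Negated deviation: a larger value indicates better friction compensation."""
    return -rms_deviation(commanded, feedback)
```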

If the adjustment training just completed is the first one, the optimization model obtained from it (which is also the latest candidate optimal model) is used directly as the optimal model. If it is not the first one, an optimal model already exists, so the friction compensation effect evaluation index value of the updated latest candidate optimal model is compared with that of the current optimal model; if the latest candidate optimal model performs better, it replaces the current optimal model, thereby updating the optimal model.

The second preset update condition can be set according to actual needs. For example, in this embodiment, the second preset update condition is:

P1 > P2;

where P1 is the friction compensation effect evaluation index value of the latest candidate optimal model, and P2 is the friction compensation effect evaluation index value of the current optimal model.

If the comparison result does not meet the second preset update condition, the optimal model is kept unchanged.
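Steps A402 and A403 with the example condition P1 > P2 reduce to the small function below. As the condition implies, a larger index value is assumed to be better; for a deviation-type index, one would presumably compare a negated or otherwise inverted value. Models are treated as opaque values.

```python
def update_optimal(round_idx, candidate, p1, optimal, p2):
    """Return the (possibly new) optimal model and its index value.

    p1: index value of the latest candidate optimal model (from the A401 test);
    p2: index value of the current optimal model.
    """
    if round_idx == 1:          # A402: first adjustment training
        return candidate, p1
    if p1 > p2:                 # A403: second preset update condition
        return candidate, p1
    return optimal, p2          # condition not met: keep the optimal model
```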

In some preferred embodiments, before step A401, the method further comprises the following step:

collecting motion instruction data of the mechanical arm while it executes a work task with the latest candidate optimal model not participating in the control process, so as to detect the safety of the latest candidate optimal model (see the safety detection process for the friction compensation model described above);

accordingly, step A401 includes:

if the latest candidate optimal model is safe, causing the mechanical arm to execute a work task with the latest candidate optimal model participating in the control process, so as to obtain the friction compensation effect evaluation index value.

Using the latest candidate optimal model obtained from adjustment training directly could drive the motion parameters of the mechanical arm beyond the safe range. Safety detection is therefore performed before the latest candidate optimal model is added to the control process, and the model is added only after it passes the detection, which further ensures the safety of the mechanical arm.

It should be noted that, from the second adjustment training onward, each training is performed on the current optimal model, which ensures that the friction compensation model is progressively trained in a better direction.
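The overall flow of steps A1 to A4 can be sketched as a single loop. All callables (`is_safe`, `collect_dataset`, `fine_tune`, `evaluate`) are hypothetical hooks standing in for the real arm, data acquisition and training code. Skipping the operation test when the new candidate fails safety detection is an assumption, since the application does not say what happens in that case.

```python
def fine_tuning_loop(pretrained, n_rounds, is_safe, collect_dataset,
                     fine_tune, evaluate, threshold=0.20):
    """Sketch of steps A1-A4; returns the optimal model (None if A1 fails).

    fine_tune(model, data) -> (optimization_model, loss);
    evaluate(model) -> index value, larger assumed better (P1 > P2).
    """
    if not is_safe(pretrained):                            # A1: safety detection
        return None
    optimal = optimal_loss = optimal_index = None
    for _ in range(n_rounds):
        # From the second training onward, start from the current optimal model.
        base = pretrained if optimal is None else optimal
        model, loss = fine_tune(base, collect_dataset())   # A2: adjustment training
        first = optimal is None
        if not first and (optimal_loss - loss) / optimal_loss < threshold:
            continue                                       # A3: candidate unchanged, no test
        candidate, candidate_loss = model, loss            # A3: candidate updated
        if not is_safe(candidate):                         # safety check before A401
            continue
        index = evaluate(candidate)                        # A401: actual operation test
        if first or index > optimal_index:                 # A402 / A403
            optimal, optimal_loss, optimal_index = candidate, candidate_loss, index
    return optimal
```

The loop only runs the comparatively expensive on-arm test when the candidate actually changes, mirroring the order of steps A3 and A4.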

In the pre-training model fine-tuning training method described above, the safety of the friction compensation model is detected by collecting motion instruction data of the mechanical arm while it executes a work task with the friction compensation model not participating in the control process. If the friction compensation model is safe, fine-tuning data sets are collected multiple times to perform adjustment training on the friction compensation model, yielding an optimization model and its corresponding loss function value after each adjustment training. After each adjustment training, the candidate optimal model is updated according to the optimization model and its loss function value; after each update of the candidate optimal model, the latest candidate optimal model is tested in actual operation and the optimal model is updated according to the test result. In this way, fine-tuning of the pre-trained friction compensation model can be completed efficiently, the match between the friction compensation model and the mechanical arm to which it is actually applied is improved, and the control precision is improved.

Referring to fig. 2, the present application provides a pre-training model fine-tuning training device for performing online fine-tuning training on a pre-trained friction compensation model, the friction compensation model being a friction compensation model of a mechanical arm. The device includes:

a safety detection module 1, configured to collect motion instruction data of the mechanical arm while it executes a work task with the friction compensation model not participating in the control process, so as to detect the safety of the friction compensation model;

a training module 2, configured to, when the friction compensation model is safe, collect fine-tuning data sets multiple times to perform adjustment training on the friction compensation model, so as to obtain an optimization model and its corresponding loss function value after each adjustment training;

a first updating module 3, configured to update the candidate optimal model according to the optimization model and its corresponding loss function value after each adjustment training;

and a second updating module 4, configured to perform an actual operation test on the latest candidate optimal model after each update of the candidate optimal model, and to update the optimal model according to the test result.

The method can be applied to an industrial personal computer of the mechanical arm.

The pre-trained friction compensation model generalizes: it is trained on a training data set collected while mechanical arms execute a variety of different tasks, and is therefore broadly suitable for friction compensation of mechanical arms. However, to improve the friction compensation effect for a specific mechanical arm executing a specific work task (accounting not only for differences between work tasks but also for individual differences between arms), the model must be fine-tuned online in a targeted manner before it matches that arm. The pre-trained friction compensation model may have been pre-trained by any training method, and the specific pre-training method is not limited; likewise, since the structure and type of the friction compensation model are not the focus of the present application, they are not limited either.

It should be noted that "the friction compensation model does not participate in the control process" may mean either that the friction compensation model is not running and therefore produces no output, or that the model is running but its output is not fed into the control process. Both cases count as the friction compensation model not participating in the control process.

In some embodiments, when collecting motion instruction data of the mechanical arm while it executes a work task with the friction compensation model not participating in the control process, the safety detection module 1 performs:

In practical application, while the mechanical arm is moving, its controller periodically sends motion instruction data to the arm to control its motion at each moment (the motion instruction data is generally computed from the dynamics of the arm based on its intrinsic parameters, combined with a control algorithm such as PID). Therefore, during execution of a work task, the motion instruction data sent by the controller and the corresponding motion feedback data at each moment can be recorded (for example, if the motion instruction data includes a target angle of a joint motor, the corresponding motion feedback data is the actual angle reached by the joint motor after acting on that instruction), and the required motion instruction data items are extracted to obtain a group of motion instruction data. When a group of motion instruction data is input into the friction compensation model, the model outputs a group of compensation torque values (for example, if the group contains motion instruction data for N moments, N compensation torque values are output). The friction compensation model is judged unsafe as soon as any one compensation torque value in the group fails to meet the preset safety index.

The preset safety index may be, for example, that the compensation torque value lies within a preset torque range (which can be set according to actual requirements).
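The torque-range safety check can be sketched as below. Here `model` is assumed to be any callable that maps one group of motion instruction data to one compensation torque per moment; this interface, and the limits in the usage example, are hypothetical.

```python
def is_model_safe(model, command_groups, torque_min, torque_max):
    """Unsafe as soon as any compensation torque leaves [torque_min, torque_max]."""
    for commands in command_groups:
        for tau in model(commands):          # one torque per moment in the group
            if not (torque_min <= tau <= torque_max):
                return False                 # a single violation is enough
    return True
```

For example, a model returning `0.1 * c` for each instruction value `c` passes with limits of plus or minus 1.0 for inputs up to 10, but fails as soon as an input of 20 produces a torque of 2.0.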

In other embodiments, when collecting motion instruction data of the mechanical arm while it executes a work task with the friction compensation model not participating in the control process, the safety detection module 1 performs:

The motion of the mechanical arm can be simulated with mechanical arm simulation software to obtain the simulated motion parameters of the arm during the motion simulation. In this way, the motion parameters that the arm would exhibit when the friction compensation model is added to the control can be checked intuitively and conveniently, so that whether the friction compensation model is safe can be judged more accurately.
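As a toy stand-in for simulation software, the sketch below integrates a single joint with explicit Euler steps, adds the model's compensation torque to each commanded torque, and checks the simulated velocity against a limit. The one-joint dynamics (inertia plus viscous damping), the parameter values, and the velocity-only check are all illustrative assumptions; real simulation software would model the full multi-joint dynamics.

```python
def simulate_joint(cmd_torques, comp_model, inertia=0.5, damping=0.1, dt=0.01):
    """Explicit-Euler simulation of one joint; returns the simulated velocities."""
    dq, velocities = 0.0, []
    for tau in cmd_torques:
        tau_total = tau + comp_model(dq)     # compensation torque added to the command
        ddq = (tau_total - damping * dq) / inertia
        dq += ddq * dt
        velocities.append(dq)
    return velocities


def simulated_safe(cmd_torques, comp_model, v_max, **sim_params):
    """Safe only if the simulated joint velocity never exceeds v_max in magnitude."""
    return all(abs(v) <= v_max for v in simulate_joint(cmd_torques, comp_model, **sim_params))
```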

In this embodiment, when the friction compensation model is safe, the training module 2 collects fine-tuning data sets multiple times to perform adjustment training on the friction compensation model, so as to obtain the optimization model and its corresponding loss function value after each adjustment training, which specifically includes:

Specifically, when collecting a fine-tuning data set to perform adjustment training on the friction compensation model, the training module 2 performs:

During the adjustment training, some of the parameters of the friction compensation model can be frozen (i.e., the frozen parameters are kept unchanged during training and only the remaining parameters are adjusted) to further improve training efficiency. The fine-tuning training runs for a number of epochs (depending on the actual setup), and each epoch involves a loss function calculation (typically using the motion feedback data in the samples), so that the model is optimized by gradient descent. The specific loss function may be set according to actual needs, for example Mean Square Error (MSE), cross-entropy loss or Focal loss, but is not limited thereto. Each adjustment training uses gradient descent to obtain the model with the minimum loss function value; this model is the optimization model, and the minimum loss function value is the loss function value corresponding to the optimization model.

The friction compensation model can be built on a memory-type neural network (such as a gated recurrent unit (GRU) network or a long short-term memory (LSTM) network). Memory-type neural networks are suited to time-series data, retain information from earlier time steps, and mitigate the vanishing-gradient problem of plain recurrent neural networks.

In some embodiments, when updating the candidate optimal model according to the optimization model and its corresponding loss function value after each adjustment training, the first updating module 3 specifically performs:

if only one adjustment training has been completed so far, taking the optimization model of that adjustment training as the candidate optimal model;

Further, when performing an actual operation test on the latest candidate optimal model after each update of the candidate optimal model and updating the optimal model according to the test result, the second updating module 4 specifically performs:

causing the mechanical arm to execute a work task with the latest candidate optimal model participating in the control process, so as to obtain a friction compensation effect evaluation index value;

if only one adjustment training has been completed so far, taking the latest candidate optimal model as the optimal model;

if more than one adjustment training has been completed, comparing the friction compensation effect evaluation index value of the latest candidate optimal model with that of the current optimal model, and replacing the optimal model with the latest candidate optimal model when the comparison result meets a second preset update condition.

For example, the second preset update condition may be P1 > P2, where P1 and P2 are the friction compensation effect evaluation index values of the latest candidate optimal model and the current optimal model, respectively.

In some preferred embodiments, before causing the mechanical arm to execute the work task with the latest candidate optimal model participating in the control process to obtain the friction compensation effect evaluation index value, the second updating module 4 further performs:

accordingly, when causing the mechanical arm to execute the work task with the latest candidate optimal model participating in the control process to obtain the friction compensation effect evaluation index value, the second updating module 4 performs:

Thus, the pre-training model fine-tuning training device detects the safety of the friction compensation model by collecting motion instruction data of the mechanical arm while it executes a work task with the friction compensation model not participating in the control process. If the friction compensation model is safe, it collects fine-tuning data sets multiple times to perform adjustment training on the friction compensation model, obtaining an optimization model and its corresponding loss function value after each adjustment training. After each adjustment training, it updates the candidate optimal model according to the optimization model and its loss function value; after each update of the candidate optimal model, it tests the latest candidate optimal model in actual operation and updates the optimal model according to the test result. In this way, fine-tuning of the pre-trained friction compensation model can be completed efficiently, the match between the friction compensation model and the mechanical arm to which it is actually applied is improved, and the control precision is improved.

Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The present application provides an electronic device including a processor 301 and a memory 302, which are interconnected and communicate with each other via a communication bus 303 and/or another form of connection mechanism (not shown). The memory 302 stores a computer program executable by the processor 301; when the electronic device runs, the processor 301 executes the computer program to perform the pre-training model fine-tuning training method in any optional implementation of the above embodiments, so as to implement the following functions: collecting motion instruction data of the mechanical arm while it executes a work task with the friction compensation model not participating in the control process, so as to detect the safety of the friction compensation model; if the friction compensation model is safe, collecting fine-tuning data sets multiple times to perform adjustment training on the friction compensation model, so as to obtain an optimization model and its corresponding loss function value after each adjustment training; after each adjustment training, updating the candidate optimal model according to the optimization model and its corresponding loss function value; and after each update of the candidate optimal model, performing an actual operation test on the latest candidate optimal model and updating the optimal model according to the test result.

An embodiment of the present application provides a storage medium on which a computer program is stored. When executed by a processor, the computer program performs the pre-training model fine-tuning training method in any optional implementation of the foregoing embodiments, so as to implement the following functions: collecting motion instruction data of the mechanical arm while it executes a work task with the friction compensation model not participating in the control process, so as to detect the safety of the friction compensation model; if the friction compensation model is safe, collecting fine-tuning data sets multiple times to perform adjustment training on the friction compensation model, so as to obtain an optimization model and its corresponding loss function value after each adjustment training; after each adjustment training, updating the candidate optimal model according to the optimization model and its corresponding loss function value; and after each update of the candidate optimal model, performing an actual operation test on the latest candidate optimal model and updating the optimal model according to the test result. The storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk.

In the embodiments provided in the present application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be electrical, mechanical or in another form.

In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.

The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A pre-training model fine-tuning training method for performing online fine-tuning training on a pre-trained friction compensation model, the friction compensation model being a friction compensation model of a mechanical arm, characterized by comprising the following steps:

A1. collecting motion instruction data of the mechanical arm while it executes a work task with the friction compensation model not participating in the control process, so as to detect the safety of the friction compensation model;

A2. if the friction compensation model is safe, collecting a fine-tuning data set multiple times to perform adjustment training on the friction compensation model, so as to obtain an optimization model and its corresponding loss function value after each adjustment training;

A4. after each update of the candidate optimal model, performing an actual operation test on the latest candidate optimal model, and updating the optimal model according to the test result.

2. The pre-training model fine-tuning training method of claim 1, wherein step A1 comprises:

3. The pre-training model fine-tuning training method of claim 1, wherein step A2 comprises:

4. The pre-training model fine-tuning training method of claim 1, wherein the step of collecting a fine-tuning data set to perform adjustment training on the friction compensation model comprises:

collecting training input data of the mechanical arm while it executes the work task multiple times with the friction compensation model not participating in the control process, to form the fine-tuning data set; the training input data comprises motion instruction data and motion feedback data;

5. The pre-training model fine-tuning training method of claim 1, wherein step A3 comprises:

6. The pre-trained model fine-tuning training method of claim 5, wherein the step A4 comprises:

A402. if only one adjustment training has been completed so far, taking the latest candidate optimal model as the optimal model;

7. The pre-training model fine-tuning training method of claim 6, wherein before step A401 the method further comprises:

collecting training input data of the mechanical arm while it executes the work task with the friction compensation model not participating in the control process, so as to detect the safety of the latest candidate optimal model;

step A401 comprises:

if the latest candidate optimal model is safe, causing the mechanical arm to execute the work task with the latest candidate optimal model participating in the control process, so as to obtain the friction compensation effect evaluation index value.

8. A pre-training model fine-tuning training device for performing online fine-tuning training on a pre-trained friction compensation model, the friction compensation model being a friction compensation model of a mechanical arm, characterized by comprising:

a safety detection module, configured to collect motion instruction data of the mechanical arm while it executes a work task with the friction compensation model not participating in the control process, so as to detect the safety of the friction compensation model;

9. An electronic device comprising a processor and a memory, the memory storing a computer program executable by the processor, the processor executing the computer program to perform the steps of the pre-training model fine tuning training method according to any one of claims 1-7.

10. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the pre-training model fine-tuning training method according to any one of claims 1-7.