WO2021132281A1

WO2021132281A1 - Training data selection device, robot system, and training data selection method

Info

Publication number: WO2021132281A1
Application number: PCT/JP2020/048038
Authority: WO
Inventors: 武司山本; 仁志蓮沼; 一輝倉島
Original assignee: 川崎重工業株式会社
Priority date: 2019-12-27
Filing date: 2020-12-22
Publication date: 2021-07-01
Also published as: US20230045162A1; JP7460366B2; CN115087521A; JP2021107970A; EP4082726A4; EP4082726A1

Abstract

In the present invention, a training data selection device comprises a data evaluation model, a data evaluation unit, a storage unit, and a training data selection unit. The data evaluation model is built by performing machine learning on at least a part of collected data, or by performing machine learning on data different from the collected data. The data evaluation unit evaluates the inputted collected data using the data evaluation model. The storage unit stores evaluated data, which is collected data evaluated by the data evaluation unit. The training data selection unit selects, from the evaluated data stored by the storage unit, training data for building a learning model, the training data being selected according to an instruction from a worker to whom the result of the evaluation performed by the data evaluation unit has been presented, or being selected automatically on the basis of the result of the evaluation.

Description

Training data sorting device, robot system and training data sorting method

The present invention relates to selection of training data for training a learning model.

Conventionally, systems have been known that control robot movements and the like using machine learning, which iteratively learns from collected data, automatically finds laws and rules, and realizes functions similar to the learning ability that humans naturally possess. Patent Document 1 discloses this type of system.

The motion prediction system of Patent Document 1 classifies learning motion case data (collected data) into a plurality of groups and constructs a motion prediction model by machine learning for each group.

Patent Document 1: Japanese Unexamined Patent Publication No. 2018-20286

In a system that controls the movement of a robot by using machine learning as in Patent Document 1, the data collected for learning generally includes both movement data intended by the operator and movement data not intended by the operator. The learning model learns the motion data in the collected data in the same way regardless of whether or not it is in line with the operator's intention.

In the collected data, the motion data intended by the operator is generally far more plentiful than the unintended motion data. Therefore, the robot is expected to come to perform the movement intended by the operator as learning is repeated.

However, since the learning model also performs machine learning on inappropriate motion data that the operator did not intend, learning takes time to converge, and a great deal of time is needed before the result intended by the operator can be output. In addition, whether or not the robot has acquired the intended motion through learning has to be determined by visually checking the motion of the robot after learning. If, even after learning for a long time, the robot does not perform the intended movement and the movement is unlikely to improve in the future, it may be necessary to start over from the data collection stage, which wastes a great deal of time and man-hours.

The present invention has been made in view of the above circumstances, and its object is to provide a training data sorting device capable of providing training data that can reduce the time and man-hours required for trial and error in machine learning and shorten the learning time until a result in line with the intention can be output.

The problem to be solved by the present invention is as described above, and next, the means for solving this problem and its effect will be described.

According to the first aspect of the present invention, a training data sorting device having the following configuration is provided. That is, this training data sorting device selects training data for constructing a learning model by machine learning from the collected data collected by a data collecting device. The training data sorting device includes a data evaluation model, a data evaluation unit, a storage unit, and a training data selection unit. The data evaluation model is constructed by machine learning on at least a part of the collected data or by machine learning on data different from the collected data. The data evaluation unit evaluates the input collected data using the data evaluation model. The storage unit stores evaluated data, which is collected data evaluated by the data evaluation unit. From the evaluated data stored in the storage unit, the training data selection unit selects training data for constructing the learning model, either according to an instruction from an operator to whom the evaluation result of the data evaluation unit has been presented, or automatically on the basis of the evaluation result.

According to the second aspect of the present invention, a training data selection method having the following configuration is provided. That is, this training data selection method selects training data for constructing a learning model by machine learning from the collected data collected by a data collecting device. The training data selection method performs a process including a data evaluation step, a storage step, and a training data selection step. In the data evaluation step, the input collected data is evaluated using a data evaluation model constructed by machine learning on at least a part of the collected data or by machine learning on data different from the collected data. In the storage step, the evaluated data, which is the collected data evaluated in the data evaluation step, is stored. In the training data selection step, training data for constructing the learning model is selected from the evaluated data stored in the storage step, either according to an instruction from an operator to whom the evaluation result of the data evaluation step has been presented, or automatically on the basis of the evaluation result.

Thereby, by selecting the training data from the collected data using the evaluation result by the data evaluation model, it is possible to easily prepare the training data consisting of preferable data for machine learning. As a result, the time required to build the learning model can be shortened.

According to the present invention, it is possible to reduce the time and man-hours required for trial and error of machine learning, and to provide training data that can shorten the learning time until a good result can be output.

FIG. 1 is a block diagram showing the relationship between a training data sorting device according to one embodiment of the present invention, a robot system, and a learning device.

FIG. 2 is a diagram showing the flow of an example of work performed by a robot, and each work state.

FIG. 3 is an explanatory diagram showing evaluation of operation information.

FIG. 4 is a diagram showing an example of the data displayed on the display device.

FIG. 5 is a diagram showing an example in which the presented data is selected by an operator.

FIG. 6 is a diagram showing an example in which the robustness of the data selected from the collected data of a working state can be improved.

FIG. 7 is an explanatory diagram showing one of the effects of the training data sorting device.

Next, an embodiment of the present invention will be described with reference to the drawings. First, referring to FIG. 1, a robot system 1 that uses a learning model constructed by machine learning the data selected by the training data sorting device 2 of the present embodiment, and a learning device 3 that constructs this learning model, will be briefly explained. FIG. 1 is a block diagram showing the relationship between the training data sorting device 2 according to the present embodiment, the robot system 1, and the learning device 3.

The robot system 1 is a system for causing the robot 11 to perform work. As this work, various things such as welding, assembly, processing, handling, painting, cleaning, and polishing can be considered.

As shown in FIG. 1, the robot system (controlled machine) 1 includes a robot control device 10, a robot 11, an operation device 12, and a data collection device 13. Each device is connected to each other via a wired or wireless network, and can exchange signals (data).

The robot control device 10 is composed of a known computer, and includes an arithmetic processing unit such as a microcontroller, CPU, MPU, PLC, DSP, ASIC, or FPGA, a robot storage unit such as ROM, RAM, or HDD, and a communication unit capable of communicating with an external device. The robot storage unit stores a control application or the like for controlling the arm unit or the like.

The robot control device 10 can switch the operation mode of the robot 11 between a manual operation mode, an automatic operation mode, and an autonomous operation mode.

In the manual operation mode, the operator manually operates the operation device 12 described later to operate the robot 11.

In the automatic operation mode, the robot 11 operates following a preset operation trajectory. This automatic operation mode is used when the same operation is repeated, such as the movement of an end effector described later attached to the tip of the arm portion of the robot 11. As the movement of the end effector, for example, a movement from a preset initial position to a position where autonomous operation is started in the autonomous operation mode can be considered.

In the autonomous operation mode, the robot 11 operates automatically based on the result of learning, in advance, the operation of the robot 11 under manual operation. In the robot system 1 of the present embodiment, the operation of the robot 11 in the autonomous operation mode is controlled by using a learning model constructed by machine learning the training data selected by the training data sorting device 2 described later.

The robot 11 is configured as, for example, a vertical articulated robot having 6 degrees of freedom of movement. The robot 11 includes an arm portion attached to a pedestal. The arm portion has a plurality of joints. Each joint is provided with an actuator (for example, an electric motor) shown in the drawing for driving the arm portion around the joint. An end effector according to the work content is attached to the tip of the arm portion.

The arm portion and end effector of the robot 11 operate based on the operation command for operating the robot 11. This operation command includes, for example, a linear velocity command, an angular velocity command, and the like.

The robot 11 is equipped with a sensor for detecting the operation of the robot 11 and the surrounding environment. In the present embodiment, the motion sensor 11a, the force sensor 11b, and the camera 11c are attached to the robot 11.

The motion sensor 11a is composed of, for example, an encoder and is provided for each joint of the arm portion of the robot 11 to detect the rotation angle or the angular velocity of each joint.

The force sensor 11b detects the force applied to each joint of the arm portion of the robot 11 or the end effector attached to the tip of the arm portion when the robot 11 operates. The force sensor 11b may be configured to detect moments in place of or in addition to force.

The camera 11c captures an image of the work being worked on (that is, the progress of the operation on the work). In place of or in addition to the camera 11c, a sound sensor for detecting sound and/or a vibration sensor for detecting vibration may be provided in order to detect the progress of the operation on the work. Further, the robot 11 or the like may be provided with a sensor for collecting distance information, such as a laser scan sensor or an infrared scan sensor.

The data detected by the motion sensor 11a is motion data indicating the motion of the robot 11, and the data detected by the force sensor 11b and the camera 11c is ambient environment data indicating the state of the environment around the robot 11. The ambient environment data is a state value indicating the state of progress of the work of the robot 11 at the time when the sensor detects the data. The data detected by the motion sensor 11a, the force sensor 11b, and the camera 11c is collected as state information by the data collecting device 13 described later.

The operating device 12 is a member operated by an operator to operate the robot 11. The operating device 12 is different depending on the work content, but is, for example, a lever operated by the operator or a pedal operated by the foot. The operation device 12 is configured as, for example, a remote control device arranged at a location physically separated from the robot 11.

The operating device 12 is provided with an operating force detection sensor 12a. The operating force detection sensor 12a detects an operating force, which is a force applied by the operator to the operating device 12. When the operating device 12 is configured to be movable in various directions, the operating force may be a value including the direction and magnitude of the force, for example, a vector. Further, the operating force may be a value such as acceleration linked to the force as well as the force applied by the operator.

In the present embodiment, the operating force detected by the operating force detection sensor 12a includes, for example, as shown in FIG. 3, the force and velocity components on the x-axis (force x and velocity x) and the force and velocity components on the y-axis (force y and velocity y) in the coordinate system of the robot 11. The data related to the operating force detected by the operating force detection sensor 12a is collected by the data collecting device 13 as operation information.

The data collection device 13 is composed of, for example, a known computer, and includes an arithmetic processing unit such as a microcontroller, CPU, MPU, PLC, DSP, ASIC, or FPGA, a storage unit such as ROM, RAM, or HDD, and a communication unit capable of communicating with an external device. A data collection application or the like that collects various types of data is stored in the storage unit. The data collection device 13 may be provided separately from the robot control device 10, or may be integrally configured with the robot control device 10. When the data collection device 13 and the robot control device 10 are integrally configured, the robot control device 10 functions as the data collection device 13 through the cooperation of the hardware and software included in the robot control device 10.

As described above, the collected data collected by the data collecting device 13 includes state information indicating the surrounding environment data of the robot 11 and operation information reflecting the operating force of the operator corresponding to that surrounding environment data. In other words, this collected data is time-series data: a series of state information and operation information obtained when the operator continuously operates the operation device 12 to cause the robot 11 to perform the work (or a part of the work). That is, the data collecting device 13 collects each item of state information and each item of operation information in association with time. The state information and the operation information include measurement values based on the detection values obtained by the camera 11c, the operating force detection sensor 12a, and the like.

The learning device 3 is composed of at least one known computer. The computer constituting the learning device 3 includes, for example, a GPU, a ROM, a RAM, an HDD, and the like. An application for machine learning is stored in the HDD or the like.

The learning device 3 constructs a learning model used in the robot system 1 by machine learning (for example, supervised learning). The learning device 3 constructs the learning model by machine learning the training data that the training data sorting device 2 has selected from the collected data collected by the data collecting device 13.

This training data includes, for example, at least the surrounding environment data (that is, state information) that reflects the working state of the robot 11 and the operating force (that is, operating information) associated with the surrounding environment data.

This learning model is, for example, a neural network having a general configuration having an input layer, a hidden layer, and an output layer. In each layer, a plurality of units simulating brain cells are arranged. The hidden layer is provided between the input layer and the output layer, and is composed of an appropriate number of intermediate units. The sensor information (training data) input to the learning device 3 flows in the order of the input layer, the hidden layer, and the output layer. The number of hidden layers is determined as appropriate. The format of the learning model is arbitrary, not limited to this.

In this model, the data input to the input layer is the sensor information that reflects the above-mentioned ambient environment data. The data output by the output layer is the estimation result of the detection value of the operating force detection sensor 12a. In essence, this is an estimate of the operating force applied by the operator. Therefore, the data output by the output layer indicates the operator's operation as estimated by the model.

Each input unit and each intermediate unit are connected by a path through which information flows, and each intermediate unit and each output unit are connected by a path through which information flows. In each route, the influence (weight) of the information of the upstream unit on the information of the downstream unit is set.

In the learning phase of the model, the learning device 3 inputs sensor information to the model and compares the operating force output from the model with the operating force applied by the operator. The learning device 3 updates the model by updating the weights, for example by the error backpropagation method, which is a known algorithm, so that the error obtained by this comparison becomes small. Since the learning model is not limited to a neural network, the model update is not limited to the error backpropagation method either. For example, the model can be updated using a self-organizing map (SOM), which is also a known algorithm. Learning is realized by continuously performing such processing.

The learning model constructed by machine learning the training data in the learning device 3 is mounted on the robot control device 10 of the robot system 1, and is used, for example, for autonomous operation of the robot 11. The learning model implemented in the robot control device 10 operates in the inference phase: for the input ambient environment data, it estimates and outputs the corresponding operating force of the operator.
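As an illustration of the learning phase described above, the following is a minimal sketch of a network with one hidden layer whose weights are updated by error backpropagation so that the output operating force approaches the operating force applied by the operator. The function name `train_force_model`, the hidden layer size, the learning rate, the tanh activation, and the squared-error measure are all assumptions made for this example and are not specified in the present description.

```python
import math
import random


def train_force_model(samples, hidden=4, lr=0.05, epochs=500, seed=0):
    """Fit a one-hidden-layer network mapping sensor vectors to operating forces.

    samples: list of (sensor_vector, operator_force_vector) pairs.
    Returns (predict, history), where history holds the mean squared error per epoch.
    """
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    # Weights on the paths input -> hidden and hidden -> output (hypothetical sizes).
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        y = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(w2, b2)]
        return h, y

    history = []
    for _ in range(epochs):
        total = 0.0
        for x, target in samples:
            h, y = forward(x)
            # Error between the model's operating force and the operator's.
            err = [yi - ti for yi, ti in zip(y, target)]
            total += sum(e * e for e in err)
            # Propagate the error back to the hidden layer before changing w2.
            # The constant factor 2 of the squared error is absorbed into lr.
            grad_h = [sum(err[k] * w2[k][j] for k in range(n_out)) * (1 - h[j] ** 2)
                      for j in range(hidden)]
            for k in range(n_out):
                for j in range(hidden):
                    w2[k][j] -= lr * err[k] * h[j]
                b2[k] -= lr * err[k]
            for j in range(hidden):
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h[j] * x[i]
                b1[j] -= lr * grad_h[j]
        history.append(total / len(samples))
    return (lambda x: forward(x)[1]), history
```

The returned `history` makes it possible to check whether the error is decreasing as the processing is repeated, which corresponds to the convergence of learning discussed earlier.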

Subsequently, the training data sorting device 2 of the present embodiment and the sorting of training data from the collected data by the training data sorting device 2 will be described in detail with reference to FIGS. 2 to 7 and the like.

As shown in FIG. 1, the training data sorting device 2 includes a data evaluation model 20, a data evaluation unit 21, a storage unit 22, a presentation device (evaluation presentation unit) 23, an input device (instruction reception unit) 24, and a training data selection unit 25.

The training data sorting device 2 has a known computer comprising, for example, an arithmetic processing unit such as a microcontroller, CPU, MPU, PLC, DSP, ASIC, or FPGA, a memory such as ROM, RAM, or HDD, and a communication unit capable of communicating with an external device.

The HDD or the like of the computer constitutes the storage unit 22 of the training data sorting device 2. The storage unit 22 stores a program executed by the arithmetic processing unit, evaluated data described later, and the like. Through the cooperation of the hardware and software, the computer can function as the data evaluation unit 21 and the training data selection unit 25. The storage unit 22 performs the processing included in the storage step.

The data evaluation model 20 has the same configuration as the above-mentioned learning model, and is constructed by machine learning on at least a part of the collected data collected by the data collection device 13. However, the data evaluation model 20 is not limited to this, and may be constructed by, for example, machine learning the operation history data of another robot system 1. When the data evaluation model 20 is constructed by machine learning the operation history data of another robot system 1, the robot 11 included in the other robot system 1 corresponds to the controlled target machine.

The collected data machine-learned by the data evaluation model 20 is classified into a plurality of groups by using, for example, a known clustering method such as an NN method, a K-means method, or a self-organizing map. Clustering is a method of learning the law of distribution from a large number of data and automatically acquiring a plurality of clusters, which are a group of data having similar characteristics to each other. The number of clusters to which the collected data is classified can be determined as appropriate. The collected data may be classified by using an automatic classification method other than clustering.
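As one way to picture the clustering mentioned above, the following is a minimal sketch of the K-means method in plain Python. The function name `kmeans`, the fixed iteration count, the random initialization, and the use of squared Euclidean distance are assumptions for illustration; the present description does not specify how the clustering is implemented.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Group feature vectors into k clusters of mutually similar data.

    points: list of equal-length numeric sequences.
    Returns (assignment, centroids), where assignment[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random (an assumed initialization).
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        for idx, p in enumerate(points):
            assignment[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Move every centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids
```

Here each point would stand for a feature vector derived from one portion of the collected data, and the number of clusters `k` corresponds to the number of groups, which, as stated above, can be determined as appropriate.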

In the present embodiment, for example, the collected data for a series of operations collected by the data collection device 13 is classified according to the operator's operation (reference operation) corresponding to each work state. Specifically, as shown in FIG. 2, when the robot 11 is to perform a series of operations for inserting the work 100 into the recess 110, the data can be classified into four work states, for example, in the air, contact, insertion, and completion.

Working state A (in the air) is a state in which the robot 11 holds the work 100 and positions it above the recess 110. The working state B (contact) is a state in which the work 100 held by the robot 11 is in contact with the surface on which the recess 110 is formed. The working state C (insertion) is a state in which the work 100 held by the robot 11 is inserted into the recess 110. The working state D (completed) is a state in which the work 100 held by the robot 11 is completely inserted into the recess 110.

In this way, the four work states are obtained by classifying a series of work by the robot 11 for each process. When the work of the robot 11 proceeds correctly, the work state changes in the order of work state A (in the air), work state B (contact), work state C (insertion), and work state D (completed).

The data evaluation model 20 is constructed, for example, by machine learning the combination of the working state and the operating force in each predetermined time range. The above working states A, B, C, and D are representative, and in reality there may be many different working states. Suppose that the robot 11 is made to perform the same work several times under the operator's operation, and that, for example, a working state A1 corresponding to one set of state information and operating force, a working state A2 corresponding to another set, and a working state A3 corresponding to yet another set are collected. These working states A1, A2, and A3 differ from one another in detail because of variations in the operator's operation, variations in the situation, and the like. However, since the working states A1, A2, and A3 share common characteristics, they are classified into the same cluster (the cluster of working state A).

However, the data evaluation model 20 is not limited to this and may be constructed, for example, by machine learning a certain work state, the next work state associated with that work state (that is, the work state to be transitioned to next), at least one set of state information, and the operating force associated with that state information. As a result, the order relationship between work states (and by extension, between the corresponding operating forces) can be learned.

As described above, the data evaluation model 20 of the present embodiment performs machine learning so as to reflect the time order of the output of the operating force. Simply put, the data evaluation model 20 learns at least one combination of state information and operating force corresponding to each of the work states A, B, C, and D, and also learns the work order, for example that the work state B appears after the work state A. As a result, the data evaluation model 20 can be used to perform classification that reflects the time-series information of the operating force. That is, each of the operating forces associated with each work state can be reflected in the work order.

As described above, this state information is sensor information detected by the motion sensor 11a, the force sensor 11b, and the camera 11c (for example, the working state of position, speed, force, moment, image, etc.). This state information may include information calculated based on the sensor information (for example, a value indicating a change over time of the sensor information from the past to the present).

Given input state information associated with time-series information, the data evaluation model 20 constructed as described above can estimate and output the reference operation corresponding to that state information.

When collected data including state information and operation information associated with time-series information is input, the data evaluation model 20 of the present embodiment estimates and outputs the reference operation corresponding to the input state information, obtains a distance value between the input operation information and the estimated reference operation, and outputs this distance value (similarity) as an evaluation value. Instead of the estimated reference operation, information on the cluster to which the reference operation belongs may be output, for example. Further, the comparison between the estimated reference operation and the input operation information may be performed by the data evaluation unit 21 instead of the data evaluation model 20.

The data evaluation unit 21 evaluates the collected data collected by the data collection device 13 using the data evaluation model 20 constructed in advance as described above. As shown in FIG. 3, the data evaluation unit 21 evaluates the operation information in each predetermined time range. Specifically, when the evaluation value output by the data evaluation model 20 for the collected data is equal to or greater than a predetermined threshold value, the data evaluation unit 21 attaches to the collected data a label (correspondence information) indicating the cluster to which the reference operation output by the data evaluation model 20 belongs. On the other hand, when the evaluation value output by the data evaluation model 20 is less than the predetermined threshold value, the data evaluation unit 21 does not attach a label to the collected data. However, instead of attaching no label, the data evaluation unit 21 may attach a label indicating that the data does not belong to any cluster. In the following, whether or not a label is attached and/or the type of the attached label may be referred to as "label information". The data evaluation unit 21 thus performs the processing included in the data evaluation step.

For example, when evaluating the collected data for the series of operations shown in FIG. 2, the data evaluation model 20 obtains, as shown in FIG. 3, the similarity between each component (force x, force y, velocity x, and velocity y) included in the operation information for each predetermined time range and the corresponding component of each reference operation. From these, it obtains the overall similarity between the operation information for each predetermined time range and each reference operation, and outputs it as an evaluation value.

For operation information (and by extension, collected data) in a predetermined time range whose evaluation value output by the data evaluation model 20 is equal to or higher than the predetermined threshold value, the data evaluation unit 21 attaches a label indicating the reference operation to which the operation information is similar.

This will be explained in detail below. As shown in FIG. 3, when the operation information in a predetermined time range is similar to the reference operation corresponding to the work state A, the data evaluation unit 21 assigns a label of the numerical value (1) to the operation information. When the operation information in the predetermined time range is similar to the reference operation corresponding to the work state B, the data evaluation unit 21 assigns the label of the numerical value (2) to the operation information. When the operation information in the predetermined time range is similar to the reference operation corresponding to the work state C, the data evaluation unit 21 assigns the label of the numerical value (3) to the operation information. When the operation information in the predetermined time range is similar to the reference operation corresponding to the work state D, the data evaluation unit 21 assigns the label of the numerical value (4) to the operation information. In this way, the continuous change in the operation information (and by extension, in the detection value of the operating force detection sensor 12a) can be captured as a change in the label information.
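The per-component comparison described above can be sketched as follows. This is only an illustrative assumption: the description does not give the formula, so the mapping from a root-mean-square distance to a similarity in the range (0, 1], the simple average over the four components, and the names `component_similarity` and `evaluate_window` are all hypothetical.

```python
import math

# Components of the operation information named in the description.
COMPONENTS = ("force_x", "force_y", "velocity_x", "velocity_y")


def component_similarity(window, reference):
    """Similarity in (0, 1] between two equal-length series; 1.0 means identical.

    Assumed measure: 1 / (1 + root-mean-square difference).
    """
    rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, reference)) / len(window))
    return 1.0 / (1.0 + rms)


def evaluate_window(window, reference_operations):
    """Return {work state: overall similarity} for one predetermined time range.

    window and every reference operation map a component name to a list of samples.
    The overall similarity is assumed to be the mean of the component similarities.
    """
    scores = {}
    for state, reference in reference_operations.items():
        per_component = [component_similarity(window[c], reference[c])
                         for c in COMPONENTS]
        scores[state] = sum(per_component) / len(per_component)
    return scores
```

With this convention a larger evaluation value means a closer match, which is consistent with labels being attached when the evaluation value is equal to or greater than the threshold value.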
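The labeling rule above can be sketched as a small function. The threshold value of 0.8, the function name `label_window`, and the use of `None` to represent "no label" are assumptions for illustration; only the mapping of work states A to D onto the numerical labels 1 to 4 comes from the description.

```python
# Numerical labels for the reference operations of work states A to D.
LABELS = {"A": 1, "B": 2, "C": 3, "D": 4}


def label_window(scores, threshold=0.8):
    """Return the numerical label for one time range, or None if no label applies.

    scores: {work state: evaluation value}, where larger means more similar.
    threshold: hypothetical value; the description only says "predetermined".
    """
    best_state = max(scores, key=scores.get)
    if scores[best_state] >= threshold:
        return LABELS[best_state]
    return None  # corresponds to data to which no numerical label is assigned
```

For example, `label_window({"A": 0.9, "B": 0.3, "C": 0.1, "D": 0.2})` yields the label for work state A, while a time range whose best evaluation value is below the threshold is left unlabeled.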

In the following description, the data to which the numerical label is assigned may be referred to as "provisional selection target data", and the data to which the numerical label is not assigned may be referred to as "provisional selection exclusion data". The collected data evaluated by the data evaluation unit 21 is referred to as "evaluated data". This evaluated data includes one or both of the provisional selection target data and the provisional selection exclusion data.

In this way, labels are assigned to each predetermined time range in the information contained in the collected data. By grouping each part of the collected data according to the assigned label, the collected data can be treated as a block corresponding to each reference operation as shown in FIG. This makes it easy to extract only the data (block) indicating the effective part of the operation from the collected data for a series of operations.

That is, in the evaluated data, a portion in which the time-series information is continuous and the label information is the same is treated as one block. As a result, as in the series of collected data shown in FIGS. 4 and 5, for example, the data string can generally be represented as blocks (ranges) to which a numerical label is assigned and blocks (ranges) to which no numerical label is assigned, arranged in an order according to the time-series information.

The collected data is evaluated while its time-series information is maintained. Therefore, collected data for a series of operations that includes operation information similar to a plurality of reference operations can be easily distinguished according to whether or not it follows the predetermined work order of the series of operations (for example, 1 (A) → 2 (B) → 3 (C) → 4 (D) shown in FIG. 3). That is, even if two sets of collected data are similar to the same set of reference operations, they can be treated as different clusters if the work order corresponding to the operation information differs.
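Grouping consecutive time ranges that carry the same label information into blocks can be sketched with a run-length pass over the label sequence. The function name `group_into_blocks` and the tuple layout `(label, first_index, last_index)` are assumptions for this example; `None` stands for a time range without a numerical label.

```python
from itertools import groupby


def group_into_blocks(labels):
    """Collapse a per-time-range label sequence into consecutive blocks.

    labels: list of numerical labels or None, in time-series order.
    Returns a list of (label, first_index, last_index) tuples in the same order.
    """
    blocks = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        blocks.append((label, start, start + length - 1))
        start += length
    return blocks
```

Because the blocks keep their original order, a block whose label is `None` marks exactly the range that would be presented as unlabeled between the labeled blocks.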
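The distinction by work order can be sketched as a comparison of label sequences. This is a hypothetical illustration: the function names, the choice to ignore unlabeled blocks, and the choice to merge repeated neighboring labels are assumptions that the description does not state.

```python
def label_sequence(block_labels):
    """Return the order of distinct labeled blocks, skipping unlabeled ones.

    block_labels: labels of the blocks in time-series order (None = unlabeled).
    Neighboring repeats left after skipping unlabeled blocks are merged.
    """
    sequence = []
    for label in block_labels:
        if label is not None and (not sequence or sequence[-1] != label):
            sequence.append(label)
    return tuple(sequence)


def follows_work_order(block_labels, expected=(1, 2, 3, 4)):
    """True if the labeled blocks appear in the predetermined work order."""
    return label_sequence(block_labels) == tuple(expected)
```

Two sets of collected data that contain the same labels, such as `[1, 2, 3, 4]` and `[1, 3, 2, 4]`, produce different sequences here and can therefore be told apart.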

As described above, the data evaluation unit 21 transmits the operation information of the collected data to which the label information is assigned (including the data portion to which the numerical label is not assigned) to the presentation device 23. This operation information corresponds to the evaluated operation information.

The presentation device 23 shown in FIG. 1 is a dot matrix type display such as a liquid crystal or an organic EL. The presenting device 23 presents the evaluation result of the data evaluation unit 21 to the operator by displaying the evaluated data evaluated by the data evaluation unit 21, the label information of the collected data, and the like. The presenting device 23 is arranged, for example, in the vicinity of the operating device 12. The presentation device 23 can also display a video signal, information about the work performed by the robot system 1, and the like.

Specifically, for example, as shown in FIG. 4, the presentation device 23 visually displays the operation information (for example, the operation force) included in the collected data in the form of a graph, and displays each data part in which the time series information is continuous and to which a label of the same numerical value is assigned as one block. As a result, the operator can check the operation information more intuitively.

Further, the presentation device 23 may highlight a data portion to which no numerical label is assigned by adding a "?" mark to it. Although not shown in FIG. 4, the presentation device 23 may display labels having different numerical values, and/or the data portions to which those labels are assigned, in different colors. The presentation device 23 may also display the operation information (for example, the operation force) included in the collected data in the form of a graph or the like so that it can be compared with the operation information (for example, the operation force) of the reference operation.
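The block-wise display with a "?" mark for unlabeled portions can be imitated in plain text as below. This is only a sketch of the idea, not the display of FIG. 4; the function `render_blocks` and the sample labels are hypothetical, and single-digit labels are assumed so that each sample occupies one character.

```python
from itertools import groupby
from typing import Optional, Sequence


def render_blocks(sample_labels: Sequence[Optional[int]]) -> str:
    """Render one character per sample, '?' for unlabeled samples,
    with '|' separating adjacent blocks."""
    cells = []
    for label, run in groupby(sample_labels):
        mark = "?" if label is None else str(label)  # assumes single-digit labels
        cells.append(mark * sum(1 for _ in run))
    return "|".join(cells)


# Hypothetical per-sample labels containing one unlabeled portion.
line = render_blocks([1, 1, 2, 2, 2, None, None, 3, 4, 4])
print(line)
```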

The input device 24 receives an instruction from the operator regarding whether or not to adopt the evaluated operation information presented by the presentation device 23 as training data. The input device 24 is composed of a key, a mouse, a touch panel, etc., which are illustrated and can be operated by an operator. The training data sorting device 2 adds information regarding acceptance / rejection as training data to the evaluated operation information in the form of, for example, a flag, in response to the input of the operator to the input device 24. The provisional selection target data or the provisional selection exclusion data to which the acceptance / rejection information from the worker is given is stored in the storage unit 22 as the selected data.

In the training data sorting device 2 of the present embodiment, immediately after the operator operates the operating device 12 to cause the robot 11 to perform a series of operations, the evaluation result for the collected data of that series of operations can be displayed on the presentation device 23.

Therefore, in the present embodiment, the worker can use the input device 24 to indicate, at the operation site and while the feel of the operation is still fresh, whether or not to use the operation just performed as training data for machine learning.

When the operator operates the operating device 12, he or she may not be satisfied with the operation, for example feeling that it should have been stronger or weaker, or that it should have started earlier or later. In this case, the worker can decline to adopt the collected data as training data and repeat the operation until satisfied. As described above, in the present embodiment, the cycle of collecting data and deciding whether or not to use it as training data can be repeated flexibly and efficiently. Therefore, it is possible to obtain abundant training data that is easy for the operator to understand in a short period of time.

Data evaluation by the data evaluation unit 21 can be completed automatically in a short time after data collection by using the data evaluation model 20 whose construction has been completed by machine learning. Therefore, even when acceptance or rejection is decided in almost real time as described above, the worker can be assisted by the presentation of the evaluation result.

From the above, the collected data used in the training data can be limited to the one intended by the worker. In other words, inappropriate collected data can be ruled out before the collected data is provided for the training phase of the learning model. By selecting the collected data at an early stage, it is possible to reduce the cases where the undesired collected data is machine-learned. As a result, it is possible to shorten the learning time until a learning model that can obtain the intended output can be constructed.

It should be noted that the presentation of the evaluation result and the instruction of acceptance / rejection are not limited to being given in real time and on the spot. The training data sorting device 2 may, for example, present the evaluation results of a plurality of collected data corresponding to the operations of the worker performed within a predetermined period to the worker in a summarized form at another place.

In the training data sorting device 2 of the present embodiment, the operator can use the input device 24 to instruct, in units of collected data, whether the collected data (evaluated data) collected for a series of operations is to be adopted as training data. However, the worker can also select only a part of the collected data and instruct its acceptance or rejection as training data.

For example, five pieces of operation information (a) to (e) are shown on the upper side of FIG. 5, and each of them corresponds to evaluated data. As shown in the lower left of FIG. 5, the operator can instruct that the operation information (a), (b), and (d) be selected from the five and adopted as training data.

The operation information (b) and (d) shown in FIG. 5 include a block of provisional selection exclusion data. However, according to the instruction of the operator, the data can be selected as valid data for a series of operations.

Alternatively, as shown in the lower right of FIG. 5, the operator can select data blocks (for example, the part corresponding to a certain reference operation) included in each piece of operation information (a) to (e) as units, and instruct that they be adopted as training data.

For example, as shown in FIG. 6, when the work 100 held by the robot 11 is brought into contact with the surface on which the recess 110 is formed in the working state B, there are cases where the lower left portion of the work 100 comes into contact with the surface first and cases where the lower right portion comes into contact first. Either case may be regarded as the working state B, but the two yield different data when viewed in terms of the detection values detected by the sensor. Therefore, the data evaluation model 20 may treat them as different operations.

For example, suppose that the operation information (a) shown in FIG. 5 is the case where the lower left portion of the work 100 contacts the surface first, and the operation information (c) is the case where the lower right portion contacts the surface first. In this case, in the evaluation by the data evaluation unit 21, it is conceivable that the corresponding data block is labeled with the numerical value (2) in the operation information (a), while the corresponding data block is not labeled in the operation information (c).

In this regard, in the training data sorting device 2 of the present embodiment, the operator can operate the input device 24 to select the unlabeled data block included in the operation information (c) shown in FIG. 5 and instruct, for example, that the data block is an operation corresponding to the label of the numerical value (2). Thereby, the training data sorting device 2 can recognize, without omission, the variations of collected data that represent valid operations in the same working state. For example, for the working state B shown in FIG. 6, the operator can instruct the training data sorting device 2 to select as training data the operation information for both cases, in which the work 100 contacts the surface on which the recess 110 is formed from different directions. Therefore, the robustness of the training data selected by the training data sorting device 2 can be improved.

As described above, the training data sorting device 2 of the present embodiment can present a large amount of collected data to the operator for selection with mechanical evaluation information (labels) added. As a result, the operator can efficiently select appropriate data and use it as training data for machine learning.

Next, the assignment of a label representing a new operation will be described.

Due to changes in the environment or the like, it may become necessary for the robot 11 to newly learn an operation that was not originally anticipated. In this case, the operator operates the operating device 12 to cause the robot 11 to perform a series of operations including that operation. The state information and operation information at this time are acquired as collected data by the data collecting device 13. In the following, a case where the new operation part of the collected data corresponds to the data block between the two blocks labeled (1) in the operation information (c) shown in FIG. 5 will be considered. Since this is a new operation, the data block is not labeled by the data evaluation unit 21.

When this operation information (c) is presented on the presentation device 23, the operator operates the input device 24 to select the unlabeled data block and instruct that it be learned as a new reference operation. In response, the training data sorting device 2 assigns to the corresponding provisional selection exclusion data a label with a numerical value (for example, 5) that is not used in the provisional selection target data. The label of the numerical value (5) is thereby added to the data of that block. Further, the operator can use the input device 24 to instruct that the data block to which the new label is attached be adopted as training data.
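Assigning a not-yet-used numerical label to an unlabeled block might look like the sketch below. The function names, the block-level label list, and the rule "largest label in use plus one" are assumptions made for illustration; the embodiment only requires that the new value not already be used.

```python
from typing import Iterable, Optional, Sequence


def next_unused_label(labels_in_use: Iterable[int]) -> int:
    """Pick a numerical label that is not yet in use (here: max + 1)."""
    return max(labels_in_use, default=0) + 1


def label_new_operation(block_labels: Sequence[Optional[int]],
                        block_index: int,
                        labels_in_use: Iterable[int]) -> tuple[list[Optional[int]], int]:
    """Give the unlabeled block at block_index a new, unused label."""
    if block_labels[block_index] is not None:
        raise ValueError("the selected block already has a numerical label")
    new_label = next_unused_label(labels_in_use)
    updated = list(block_labels)
    updated[block_index] = new_label
    return updated, new_label


# Hypothetical block labels with one unlabeled block; labels 1 to 4 are in use.
updated, new_label = label_new_operation([1, None, 3, 4], 1, {1, 2, 3, 4})
```

With labels 1 to 4 already in use, the unlabeled block receives the value 5, matching the example value given in the text.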

In this case, when the learning model learns the training data, the data labeled with the numerical value (2) and the data labeled with the numerical value (5) can both be treated as operation information for the same working state B.

Specifically, consider one set of collected data (label order 1 → 2 → 3 → 4) and another set of collected data (label order 1 → 5 → 3 → 4) for a series of operations shown in FIG. The two sets of collected data have similar state information and a similar work order. Therefore, in the learning model, both the data labeled with the numerical value (2) and the data labeled with the numerical value (5) can be easily classified into the cluster corresponding to the working state B.

The training data sorting unit 25 selects, from the selected data stored in the storage unit 22, training data for constructing a learning model used in the robot system 1. Training data is selected in various ways according to the purpose. For example, when the learning model is to be trained on the series of operations shown in FIG. 2, the training data sorting unit 25 selects, from the selected data instructed to be adopted as training data, the data to which labels (1) to (4) are assigned, and outputs it as training data. Further, for example, when the learning model is to be additionally trained on the reference operation for the working state C, the training data sorting unit 25 extracts, from the selected data instructed to be adopted as training data, the blocks of the data portion to which the label of the numerical value (3) is assigned, and outputs them as training data.
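The two selection purposes described above can be sketched as a filter over flagged blocks. This is a hypothetical illustration: the `SelectedBlock` record, its fields, and the sample data are assumptions, with the `adopted` field standing in for the acceptance/rejection flag set from the operator's instruction.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SelectedBlock:
    series: str           # which operation information the block came from
    label: Optional[int]  # numerical label, or None if unlabeled
    adopted: bool         # acceptance flag from the operator's instruction


def select_training_data(selected: Iterable[SelectedBlock],
                         wanted_labels: Iterable[int]) -> list[SelectedBlock]:
    """Keep only adopted blocks whose label is one of the wanted labels."""
    wanted = set(wanted_labels)
    return [b for b in selected if b.adopted and b.label in wanted]


# Hypothetical selected data stored after the operator's instructions.
selected_data = [
    SelectedBlock("a", 1, True),
    SelectedBlock("a", 2, True),
    SelectedBlock("a", 3, True),
    SelectedBlock("a", 4, True),
    SelectedBlock("c", 1, False),
    SelectedBlock("c", 3, True),
]

full_series = select_training_data(selected_data, {1, 2, 3, 4})  # whole series
state_c_only = select_training_data(selected_data, {3})          # working state C
```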

That is, as shown in FIG. 7, the training data sorting device 2 of the present embodiment can sort only the training data to be trained by the learning model from a plurality of collected data for a plurality of types of work. As described above, the training data sorting unit 25 performs the processing included in the training data sorting step.

This makes it possible to efficiently select training data for constructing learned data. In addition, it is possible to prevent unfavorable data from being selected as training data. As a result, it is possible to shorten the time until the learning model outputs as intended by the operator.

As described above, the training data sorting device 2 of the present embodiment sorts training data for constructing a learning model by machine learning from the collected data collected by the data collecting device. The training data sorting device 2 includes a data evaluation model 20, a data evaluation unit 21, a storage unit 22, and a training data sorting unit 25. The data evaluation model 20 is constructed by machine learning on at least a part of the collected data or by machine learning on data different from the collected data. The data evaluation unit 21 evaluates the input collected data using the data evaluation model 20. The storage unit 22 stores the evaluated data, which is the collected data evaluated by the data evaluation unit 21. The training data selection unit 25 selects training data for constructing a learning model from the evaluated data stored by the storage unit 22 according to the instruction of the worker presented with the evaluation result of the data evaluation unit 21.

Thereby, by selecting the training data from the collected data using the data evaluation model 20, it is possible to easily prepare the training data consisting of preferable data for machine learning. As a result, the time required to build the learning model can be shortened.

Further, the training data sorting device 2 of the present embodiment includes a presentation device 23 and an input device 24. The presenting device 23 presents the evaluation result of the data evaluation unit 21 to the operator. The input device 24 receives an operator's instruction regarding whether or not to select the evaluated data as training data. The training data selection unit 25 selects training data for constructing a learning model based on the instructions input to the input device 24.

As a result, the training data is selected based on the instructions of a human being (preferably the operator himself / herself), so that the training data can be made into a more appropriate collection of data. Further, by referring to the evaluation result by the data evaluation model 20, the operator can easily determine whether or not the collected data should be used as training data.

Further, in the training data sorting device 2 of the present embodiment, the collected data includes time series information of measured values based on the detected values detected by at least one of the sensors mounted on the robot system 1. Using the data evaluation model 20, the data evaluation unit 21 evaluates the collected data for each partial time-series information that is time-series information corresponding to a part of the time-series information of the detected values.

As a result, the collected data is evaluated for each appropriate unit, so that it becomes easy to grasp a series of operations as if the basic operations are arranged in an appropriate order. By using this evaluation result, the selection of training data becomes more accurate. Further, by using the part corresponding to the basic operation as the unit for selecting the training data, machine learning can be performed while efficiently using the collected data.
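Evaluating per partial time series presupposes cutting the time series into parts. The sketch below uses fixed-length, non-overlapping windows, which is an assumption for illustration (the text only speaks of predetermined time ranges); `split_into_partial_series` and the sample values are hypothetical.

```python
from typing import Sequence


def split_into_partial_series(values: Sequence[float], window: int) -> list[list[float]]:
    """Cut a time series into consecutive windows of `window` samples.

    The last window may be shorter when the length is not a multiple of `window`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [list(values[i:i + window]) for i in range(0, len(values), window)]


# Hypothetical operation-force samples.
force = [0.0, 0.1, 0.4, 0.9, 1.0, 0.8, 0.2]
partial = split_into_partial_series(force, window=3)
```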

Further, in the training data sorting device 2 of the present embodiment, the data evaluation model 20 is constructed so as to output, when partial time series information is input, an evaluation value corresponding to each of a plurality of reference operations, which are subdivided operations of the operator. When the best of the evaluation values output by the data evaluation model 20 for the plurality of reference operations in response to the input partial time series information is better than a threshold value, the data evaluation unit 21 attaches to the collected data a label indicating that the partial time series information is evaluated to correspond to the reference operation having that best evaluation value, and stores the collected data in the storage unit 22 as evaluated data.
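The threshold rule above can be sketched as follows, assuming that a larger evaluation value is better (if the evaluation value were a distance, the comparison would be reversed). The function `assign_label`, the dictionary of evaluation values, and the threshold of 0.7 are hypothetical.

```python
from typing import Mapping, Optional


def assign_label(evaluation_values: Mapping[int, float],
                 threshold: float) -> Optional[int]:
    """Return the label of the best-scoring reference operation,
    or None when even the best score does not exceed the threshold.

    Assumes a larger evaluation value means a better match.
    """
    best_label = max(evaluation_values, key=lambda k: evaluation_values[k])
    if evaluation_values[best_label] > threshold:
        return best_label
    return None


# Hypothetical evaluation values per reference operation (labels 1 to 4).
labeled = assign_label({1: 0.2, 2: 0.9, 3: 0.4, 4: 0.1}, threshold=0.7)
unlabeled = assign_label({1: 0.2, 2: 0.5, 3: 0.4, 4: 0.1}, threshold=0.7)
```

A result of `None` corresponds to a portion that receives no numerical label and is later shown to the operator as unlabeled.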

Thereby, the data given a good evaluation by the data evaluation unit 21 can be easily distinguished.

Further, in the training data sorting device 2 of the present embodiment, the evaluated data is presented, with the label attached as the evaluation result, for selection by the operator.

As a result, the worker can easily confirm the data given a good evaluation by the data evaluation unit 21.

Further, in the training data sorting device 2 of the present embodiment, the evaluated data is presented for selection by the operator in such a way that each range of the partial time series information to which a label is given as the evaluation result can be distinguished.

This allows the operator to easily confirm which part of the time-series information indicating a series of operations has a good evaluation.

Further, in the training data sorting device 2 of the present embodiment, the evaluated data is presented for selection by the operator including the ranges in which the best evaluation value among the evaluation values output by the data evaluation model 20 for each of the plurality of reference operations is not better than the threshold value.

This allows the operator to confirm the range that has not been given a good evaluation. Therefore, for example, it can be used as a clue for the operator to verify which part of his / her series of operations is not good.

Further, in the training data sorting device 2 of the present embodiment, the operator can designate a range in which the best evaluation value among the evaluation values output by the data evaluation model 20 for each of the plurality of reference operations is not better than the threshold value, and attach to it a label indicating an operation that is not included in the plurality of reference operations.

This makes it possible to introduce new standard operations and select training data.

Further, in the training data sorting device 2 of the present embodiment, the evaluated data is visually presented for selection by the operator in the form of a graph of the values detected by the sensor or of information based on those values.

This makes it easier for workers to check the evaluated data.

Further, in the training data sorting device 2 of the present embodiment, the training data sorting unit 25 can select training data from the evaluated data for each range of the partial time-series information to which a label is given as the evaluation result.

As a result, a part of the evaluated data can be easily extracted and selected as training data, so that machine learning can be performed while efficiently using the collected data.

Although the preferred embodiment of the present invention has been described above, the above configuration can be changed as follows, for example.

The presentation device 23 is not limited to a visual display. For example, the evaluation of the operation information can also be presented to the worker audibly, using sound effects that differ according to how good the evaluation value is, or haptically, as feedback such as vibration applied to the operation device 12.

By operating the input device 24, the operator may also input instruction information regarding the quality of each piece of collected data stored in the storage unit 22 into the history of the collected data evaluated by the data evaluation unit 21.

In the above-described embodiment, the worker determines whether or not to adopt the evaluated data as training data with the support of the evaluation result presented by the data evaluation unit 21. However, the acceptance or rejection of the evaluated data as training data may be determined automatically by a program (including the case of using artificial intelligence) instead of by the worker. In this case, the presentation device 23 for presenting to the operator and the input device 24 for inputting the operator's instruction can be omitted.

In the learning device 3, after constructing a learning model by machine learning the training data selected by the training data sorting device 2, this learning model can be used as the data evaluation model 20.

The robot 11 may be configured not only as an industrial robot but also as a medical robot or the like.

The training data sorting device 2 may select not only training data for constructing a learning model for controlling a robot, but also training data for constructing a learning model for automatic operation of a vehicle or automatic operation of a plant.

The data collecting device 13 may be provided in the training data sorting device 2 instead of the robot system 1.

The data evaluation model 20 of the training data sorting device 2 evaluates the collected data. However, the data evaluation model 20 may be used to evaluate the output of the learning model constructed by machine learning the training data selected by the training data selection device 2.

For example, when collecting data, the information obtained by the robot 11 side may be presented in real time to the operator who performs remote control. The information presented to the operator in this way can be a target to be collected by the data collection device 13.

As an example of presenting information to the operator, when collecting data, the robot control device 10 may drive the robot 11 in response to the operator's operation of the operation device 12, while driving the operation device 12 so as to transmit to the operator the reaction force that the robot 11 receives from its surroundings. As a result, an interactive operation is realized, and the operator can remotely control the robot 11 in real time by using the operation device 12 while feeling the force sense pseudo-presented through the operation device 12.

As another example of presenting information to the operator, when collecting data, the image of the camera 11c included in the robot system 1 may be displayed in real time on an appropriate display arranged near the operator.

1 Robot system (machine to be controlled)
2 Training data sorting device
20 Data evaluation model
21 Data evaluation unit
22 Storage unit
23 Presentation device (evaluation presentation unit)
24 Input device (instruction reception unit)
25 Training data sorting unit

Claims

A training data sorting device that sorts training data for constructing a learning model by machine learning from collected data collected by a data collecting device, the training data sorting device comprising:
a data evaluation model constructed by machine learning on at least a part of the collected data or by machine learning on data different from the collected data;
a data evaluation unit that evaluates the input collected data using the data evaluation model;
a storage unit that stores evaluated data, which is collected data evaluated by the data evaluation unit; and
a training data sorting unit that sorts, from the evaluated data stored by the storage unit, training data for constructing the learning model, either according to an instruction of an operator to whom the evaluation result of the data evaluation unit has been presented or automatically based on the evaluation result.
The training data sorting device according to claim 1, further comprising:
an evaluation presentation unit that presents the evaluation result of the data evaluation unit to the operator; and
an instruction receiving unit that receives an instruction from the operator regarding whether or not to select the evaluated data as the training data,
wherein the training data sorting unit sorts training data for constructing the learning model based on the instruction input to the instruction receiving unit.
The training data sorting device according to claim 1 or 2, wherein
the collected data includes time series information of measured values based on detected values obtained by at least one of the sensors mounted on the controlled machine, and
the data evaluation unit uses the data evaluation model to evaluate the collected data for each piece of partial time series information, which is time series information corresponding to a part of the time series information of the detected values.
The training data sorting device according to claim 3, wherein
the data evaluation model is constructed so as to output, when the partial time series information is input, an evaluation value corresponding to each of a plurality of reference operations, which are subdivided operations of the operator, and
when the best evaluation value among the evaluation values output by the data evaluation model for each of the plurality of reference operations in response to the input partial time series information is better than a threshold value, the data evaluation unit attaches to the collected data correspondence information indicating that the partial time series information is evaluated to correspond to the reference operation having the best evaluation value, and stores the collected data in the storage unit as the evaluated data.
The training data sorting device according to claim 4, wherein
the evaluated data, with the correspondence information attached as the evaluation result, is presented for selection by the operator or is used for automatic selection of the evaluated data.
The training data sorting device according to claim 5, wherein
the evaluated data is presented for selection by the operator in such a way that each range of the partial time series information to which the correspondence information is given as the evaluation result can be distinguished.
The training data sorting device according to claim 6, wherein
the evaluated data is presented for selection by the operator including the ranges in which the best evaluation value among the evaluation values output by the data evaluation model for each of the plurality of reference operations is not better than the threshold value.
The training data sorting device according to claim 7, wherein
the training data sorting device is configured such that the operator can designate a range in which the best evaluation value among the evaluation values output by the data evaluation model for each of the plurality of reference operations is not better than the threshold value, and thereby give the range correspondence information indicating an operation that is not included in the plurality of reference operations.
The training data sorting device according to any one of claims 6 to 8, wherein
the evaluated data is visually presented for selection by the operator in a form in which the value detected by the sensor or information based on the value is represented by a graph.
The training data sorting device according to any one of claims 6 to 9, wherein
the training data sorting unit is capable of selecting training data from the evaluated data for each range of the partial time series information to which the correspondence information is given as the evaluation result.
A robot system comprising:
a learning model constructed by machine learning using the training data sorted by the training data sorting device according to any one of claims 1 to 10; and
a robot that performs work based on the output of the learning model.
A training data selection method for selecting training data for constructing a learning model by machine learning from collected data collected by a data collecting device, the method comprising:
a data evaluation step of evaluating the input collected data using a data evaluation model constructed by machine learning on at least a part of the collected data or by machine learning on data different from the collected data;
a storage step of storing evaluated data, which is collected data evaluated in the data evaluation step; and
a training data selection step of selecting, from the evaluated data stored in the storage step, training data for constructing the learning model, either according to an instruction of an operator to whom the evaluation result of the data evaluation step has been presented or automatically based on the evaluation result.
The training data selection method according to claim 12, wherein
in the data evaluation step, the collected data can be evaluated using the learning model constructed by machine learning on the training data selected in the training data selection step.