WO2018216493A1

WO2018216493A1 - Learning apparatus, learning control method, and program therefor

Info

Publication number: WO2018216493A1
Application number: PCT/JP2018/018142
Authority: WO
Inventors: Tanichi Ando
Original assignee: Omron Corporation
Priority date: 2017-05-26
Filing date: 2018-05-10
Publication date: 2018-11-29
Also published as: JP2018200537A; JP6863081B2

Abstract

In order to provide a technique for shortening the time required for a learning apparatus to achieve a learning purpose, without performing manual manipulation, a learning apparatus configured to learn control of an operation involved in a predetermined task includes: a learning data accepting unit configured to accept learning data containing a learning purpose; a neural network configured to perform learning based on the learning data; and an output unit configured to output a learning result obtained by the neural network, wherein the neural network performs a first learning process for achieving an initial stage of the learning purpose, performs a second learning process for learning control with which an operation involved in the learning is made non-continuable, based on a result of the first learning process, and performs a third learning process for achieving the learning purpose, with the control with which an operation involved in the learning is made non-continuable being excluded, based on a result of the second learning process.

Description

LEARNING APPARATUS, LEARNING CONTROL METHOD, AND PROGRAM THEREFOR

The present invention relates to a learning apparatus, a learning control method, and a program therefor.
(CROSS-REFERENCES TO RELATED APPLICATIONS)
This application claims priority to Japanese Patent Application No. 2017-104523 filed May 26, 2017, the entire contents of which are incorporated herein by reference.

Conventionally, artificial intelligence technology (hereinafter, referred to as “AI technology”) such as neural networks has been widely researched (see JP H6-289918A, for example). In particular, with the rise of AI technology called deep learning, for example, the recognition rates of techniques for recognizing targets based on images have improved rapidly in these years, and the recognition rate of classifying images is almost higher than that of humans. Deep learning technology is expected to be applied not only to image recognition but also to various other fields such as speech recognition, personal authentication, behavior prediction, summary writing, machine translation, monitoring, autonomous driving, failure prediction, sensor data analysis, music genre determination, content generation, and security systems.

In machine learning such as deep learning, machines can be trained to attain a predetermined ability. At this time, learning apparatuses that perform machine learning repeatedly perform learning operations until they attain a predetermined ability.

For example, JP H6-289918A discloses a learning control method for robots. In the learning control method described in JP H6-289918A, an input value that is to be supplied to a driving unit of a robot is modified based on a difference between a target motion targeted in a robot operation set in advance by a person and an actual motion when the robot actually operates.

JP H6-289918A

In learning apparatuses that control actuators based on information from many sensors, such as control of an engine and driving of an automobile, chemical plants, or the like, control and sensor information output affect each other, and thus more complex learning has to be performed in order to attain a control method. Accordingly, in learning apparatuses that perform such complex learning, it is not easy for a person to set a target value of the control amount in advance as in JP H6-289918A. On the other hand, in the case of training learning apparatuses without setting a target value, it is necessary to repeat trial and error processing for an extremely large number of times, which is inefficient.

It is an object of the present invention to shorten the time required for a learning apparatus to achieve a learning purpose, without performing manual manipulation.

An aspect of the present invention is directed to a learning apparatus configured to learn control of an operation involved in a predetermined task, the apparatus including: a learning data accepting unit configured to accept learning data containing a learning purpose; a neural network configured to perform learning based on the learning data; and an output unit configured to output a learning result obtained by the neural network, wherein the neural network performs a first learning process for achieving an initial stage of the learning purpose, performs a second learning process for learning control with which an operation involved in the learning is made non-continuable, based on a result of the first learning process, and performs a third learning process for achieving the learning purpose, with the control with which an operation involved in the learning is made non-continuable being excluded, based on a result of the second learning process.

With this configuration, learning of control with which the operation involved in the learning is made non-continuable is performed before a third learning process for achieving a learning purpose is performed. Accordingly, the apparatus itself can perform learning, with the control with which the operation is made non-continuable being excluded, without a person setting conditions that restrict control operations. Thus, it is possible to achieve the learning purpose in a shorter period of time.

It is possible that the output unit outputs the result of the second learning process. With this aspect, a learning result of the control with which the operation is made non-continuable can be used by other learning apparatuses as well.

It is possible that the learning apparatus is a learning apparatus configured to learn control of a series of operations involved in a predetermined task, the learning apparatus further includes a classifying unit configured to divide the task into a plurality of scenes, and, for each of the divided scenes, specify a partial operation that is to be performed in the scene, from among the series of operations, and the neural network performs the second learning process and the third learning process for each partial operation.

With this aspect, the learning apparatus can classify an operation involved in the learning into partial operations, which is classifying an operation into smaller units, according to the scenes, and perform learning for each classified partial operation. Accordingly, it is possible to achieve the learning purpose in a shorter period of time.

An aspect of the present invention is directed to an autonomous driving control learning apparatus configured to learn control of a series of operations involved in autonomous driving of a vehicle that does laps of a predetermined course, the apparatus including: a learning data accepting unit configured to accept learning data containing a learning purpose in which a purpose is to do a predetermined number of laps of the course within a predetermined period of time; a neural network configured to perform learning based on the learning data; and an output unit configured to output a learning result obtained by the neural network, wherein the neural network performs a first learning process for achieving running of one lap of the course by the vehicle, performs a second learning process for learning control with which an operation involved in the learning is made non-continuable, based on a result of the first learning process, and performs a third learning process for achieving the learning purpose, with the control with which an operation involved in the learning is made non-continuable being excluded, based on a result of the second learning process.

An aspect of the present invention is directed to a robot control learning apparatus configured to learn control of a series of operations involved in a task of holding a predetermined workpiece and stacking the workpiece at a placement position according to a shape of the workpiece, the apparatus including: a learning data accepting unit configured to accept learning data containing a learning purpose in which a purpose is to stack a predetermined number of the workpieces at the placement position within a predetermined period of time; a neural network configured to perform learning based on the learning data; and an output unit configured to output a learning result obtained by the neural network, wherein the neural network performs a first learning process for achieving stacking of one workpiece at the placement position, performs a second learning process for learning control with which an operation involved in the learning is made non-continuable, based on a result of the first learning process, and performs a third learning process for achieving the learning purpose, with the control with which an operation involved in the learning is made non-continuable being excluded, based on a result of the second learning process.

An aspect of the present invention is directed to a learning method for learning control of an operation involved in a predetermined task, the method including: accepting, by a computer, learning data containing a learning purpose; performing, by the computer, learning based on the learning data; and outputting, by the computer, a learning result obtained in the learning, wherein performing learning includes performing, by the computer, a first learning process for achieving an initial stage of the learning purpose, performing, by the computer, a second learning process for learning control with which an operation involved in the learning is made non-continuable, based on a result of the first learning process, and performing, by the computer, a third learning process for achieving the learning purpose, with the control with which an operation involved in the learning is made non-continuable being excluded, based on a result of the second learning process.

An aspect of the present invention is directed to a program for causing a computer configured to learn control of an operation involved in a predetermined task to execute: a accepting learning data containing a learning purpose; performing learning based on the learning data; and outputting a learning result obtained in the learning, wherein performing learning includes performing a first learning process for achieving an initial stage of the learning purpose, performing a second learning process for learning control with which an operation involved in the learning is made non-continuable, based on a result of the first learning process, and performing a third learning process for achieving the learning purpose, with the control with which an operation involved in the learning is made non-continuable being excluded, based on a result of the second learning process.

An aspect of the present invention is directed to an apparatus configured to perform a predetermined task, the apparatus comprising: a first sensor configured to sense information necessary for an operation of the apparatus to perform the task; an actuator; a second sensor configured to sense a change in a status of the apparatus caused by the actuator; a control unit configured to control the actuator based on sensor values output from the first sensor and the second sensor; and a storage unit configured to store a learning result obtained by the above-described learning apparatus, wherein the control unit determines a control amount that corresponds to the sensor values output from the first sensor and the second sensor, based on the learning result stored in the storage unit.

According to the present invention, it is possible to provide a technique for shortening the time required for a learning apparatus to achieve a learning purpose, without performing manual manipulation.

FIG. 1 is a block diagram showing a schematic configuration of a learning apparatus in a first embodiment. FIG. 2 is a schematic diagram showing a course for autonomous driving of a vehicle that is controlled by the learning apparatus in the first embodiment. FIG. 3 is a flowchart showing the outline of the processing performed by the learning apparatus in the first embodiment. FIG. 4 is a block diagram showing a detailed configuration of the learning apparatus in the first embodiment. FIG. 5 is a flowchart showing details of the processing performed by the learning apparatus in the first embodiment. FIG. 6 is a flowchart showing details of the processing performed by the learning apparatus in the first embodiment. FIG. 7 is a flowchart showing details of the processing performed by the learning apparatus in the first embodiment. FIG. 8 is a flowchart showing details of the processing performed by the learning apparatus in the first embodiment. FIG. 9 is a diagram showing an example of the hardware configuration of the learning apparatus in the first embodiment. FIG. 10 is a block diagram showing a schematic configuration of a learning apparatus in a second embodiment.

First Embodiment
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. Note that the same constituent elements are denoted by the same reference numerals, and a description thereof may not be repeated. The following embodiments are merely illustrative for the sake of explanation, and are not intended to limit the present invention thereto. Various modifications can be made without departing from the gist thereof.

1. System Outline
Hereinafter, the outline of the system in this embodiment will be described with reference to FIGS. 1 to 3. FIG. 1 is a block diagram showing a schematic configuration of a learning apparatus 1 according to this embodiment. The learning apparatus 1 learns a predetermined task. For example, the learning apparatus 1 according to this embodiment is mounted in a vehicle controlled to drive autonomously (hereinafter, alternatively referred to simply as a “vehicle”) 90, and learns control of the vehicle 90 for autonomous driving along a predetermined course (see FIG. 2). At this time, the learning apparatus 1 is provided with learning data, for example, from an operator or the like. The learning data is data containing, for example, a learning purpose and learning requirements as follows.

Learning Purpose
- Do ten laps of a course within predetermined period of time and reach goal
Learning Requirements
- Do not slide off track
- Running direction is clockwise
- Reach goal
- “Do one lap of course and reach goal” in initial stage level

Note that the task is a matter that is required to be achieved through an operation involved in the learning (“operation involved in the learning” in this embodiment refers to various types of control necessary for autonomous driving of the vehicle 90, and may be considered as being an operation that is performed by the vehicle 90 through the various types of control), and refers to doing laps of a course in this embodiment. Furthermore, the learning purpose is a standard that is to be achieved by the task, and refers to “do ten laps of a course within predetermined period of time and reach goal” as described above in this embodiment. Thus, in this embodiment, it is considered that making the task performable is provided as a learning requirement, in learning in the initial stage level.

Furthermore, in the description below, the learning apparatus 1 is described as an apparatus including a computer such as a PC (personal computer) or a server apparatus, but there is no limitation to this, and it may be realized by, for example, any built-in device that has a processor, a RAM, and a ROM. Furthermore, the configuration of elements installed in the apparatuses is not limited to those realized by software. The apparatuses may have a configuration realized by hardware. For example, a later-described neural network 22 may be configured by an electronic circuit such as a custom LSI (large-scale integration) or an FPGA (field-programmable gate array).

As shown in FIG. 1, the learning apparatus 1 includes a control unit 10, a machine learning unit 20, an operation classifying unit 30, and a storage unit 40.

In the vehicle 90, the control unit 10 is connected to a control sensor 91, an actuator 92, and a status detection sensor 93 provided outside the learning apparatus 1. The control unit 10 controls the actuator 92 in response to output from the control sensor 91 and the status detection sensor 93, thereby performing autonomous driving of the vehicle 90.

The control sensor 91 is a sensor group for controlling the autonomous driving of the vehicle 90. For example, the control sensor 91 is configured by a sensor for sensing obstacles outside the vehicle or a sensor for detecting the state of the road surface such as a vehicle-mounted camera or a laser, for example. Meanwhile, the status detection sensor 93 is a sensor group for detecting the control status of the vehicle 90 in an autonomous driving state. For example, the status detection sensor 93 is configured by a vibration sensor, a noise sensor, a fuel consumption sensor, a vehicle speed sensor, an acceleration sensor, a yaw rate sensor, or the like.

The actuator 92 is controlled by the control unit 10 for autonomous driving of the vehicle 90. The actuator 92 is configured by, for example, an accelerator actuator, a brake actuator, a steering actuator, or the like. The accelerator actuator controls the vehicle driving force by controlling the throttle opening degree in response to a control signal from the control unit 10. The brake actuator controls the braking force on vehicle wheels by controlling the operation amount of the brake pedal in response to a control signal from the control unit 10. The steering actuator performs control of vehicle steering action by controlling driving of a steering assistance motor of an electronic power steering system in response to a control signal from the control unit 10.

Next, the procedure in which the learning apparatus 1 performs learning will be roughly described with reference to FIG. 3. The processing in each step will be described later in detail. FIG. 3 is a flowchart showing the outline of the processing flow when the learning apparatus 1 performs learning. First, as a learning initial stage (S1), learning is performed for the purpose of making the task performable (i.e., making an operation that satisfies the learning requirements in the initial stage performable). The learning apparatus 1 in this embodiment is provided with “do one lap of course and reach goal” as a learning requirement in the initial stage.

After the purpose in the initial stage level has been cleared, operation classification (S2) is performed. In this stage, the content of the learning performed in the learning initial stage S1 is analyzed, so that the task is divided into a plurality of portions based on predetermined parameter (hereinafter, a portion obtained by dividing a task is alternatively referred to as a “scene”), and, in each divided scene, an operation that is to be performed in that scene (hereinafter, alternatively referred to as a “partial operation”) is specified from among the series of operations involved in the task. The predetermined parameter for dividing the task is, for example, a displacement amount in the operation involved in the learning of the task, or the environment in which the operation involved in the learning of the task is performed (the time elapsed from when the task is started, the position from where the task is started, etc.). In this embodiment, the position from where the task is started (the environment in which the operation involved in the learning of the task is performed) is used as the predetermined parameter. That is to say, in this embodiment, the learning apparatus 1 divides the task into scenes based on the position in a course, and classifies the series of operations involved in the learning into scenes based on an operation performed in units of a course corresponding to each divided scene. The efficiency of learning can be increased by performing the learning in units of partial operations classified according to the scenes. In this embodiment, increasing the efficiency of learning may refer to, for example, shortening the time required from when the learning is started to when the learning purpose is achieved.

In the next step after the operations have been classified, learning of control with which the learning is made non-continuable (S3) is performed for each classified partial operation. The state in which the learning is made non-continuable refers to a state in which the task is made non-continuable. For example, if the learning by the learning apparatus 1 is control of a predetermined apparatus, the state in which the learning is made non-continuable refers to a case in which an operation of a predetermined apparatus targeted for control has stopped, or a case in which the predetermined apparatus has broken down and is not operating. In this embodiment, the state in which the learning is made non-continuable refers to, for example, a state in which the vehicle has slid off the track, in which the vehicle has crashed against a wall or the like and is not moving, in which the vehicle has broken down, or the like. If the learning of control with which the learning is made non-continuable is performed in advance, learning can be performed without making the learning non-continuable when performing learning of optimal control in subsequent steps. Accordingly, the efficiency of the learning can be further increased.

In a learning final stage (S4), the learning is optimized. In this stage, in a state where partial operations classified into scenes and learned are combined, learning for optimally performing the operation from start to end is performed. In this embodiment, learning through doing ten laps of a course within a predetermined period of time and reaching a goal is performed as the learning in the final stage.

2. Detailed Processing
Next, the processing of the learning apparatus 1 in each step will be described in detail with reference to FIGS. 4 to 8. FIG. 4 is a block diagram showing a detailed configuration of the learning apparatus 1 according to this embodiment. As shown in FIG. 4, the machine learning unit 20 includes a learning data input/output unit 21, a neural network 22, and a learning result output unit 23. The operation classifying unit 30 includes a control data extracting unit 31 and an operation classifying result extracting unit 32.
Hereinafter, the processing of each unit will be described in detail in each step of FIG. 3.

2-1. Learning Initial Stage
FIG. 5 is a flowchart showing details of the processing flow in the learning initial stage in step S1 shown in FIG. 3. First, in the learning initial stage (first learning process), the learning data input/output unit 21 accepts learning data (S101). The learning data is data containing, for example, a learning purpose and learning requirements, as described above.

In the next step (S102), machine learning is performed. In this embodiment, conditions that restrict individual control operations are not specified in advance, and thus learning of the control operation is performed by the learning apparatus 1 itself. Specifically, the control unit 10 operates the actuator 92 by setting a random control amount thereto. At this time, it is natural that the vehicle 90 cannot drive along the course, and thus the vehicle 90 drives in a haphazard way while sliding off the track, for example. The control unit 10 reads output (hereinafter, alternatively referred to as “sensor values”) from the control sensor 91 and the status detection sensor 93 for the control amount given at random, and stores the data (the control amount and the sensor values) in the storage unit 40. The neural network 22 refers to the storage unit 40 and reads the stored control amount and sensor values, and performs learning of a control operation that matches the learning requirements through deep learning (S102).

In the learning requirements, “do one lap of course and reach goal” is set as the purpose in the initial stage level. Accordingly, in the learning apparatus 1, for example, when it is determined based on output from the control sensor 91 that the vehicle has done one lap of the course and reached the goal, the machine learning is judged to have reached the initial stage level (S103: Y), and the learning in the initial stage is ended.

2-2. Operation Classification
FIG. 6 is a flowchart showing details of the processing flow in the operation classification in step S2 shown in FIG. 3. First, when performing the operation classification processing, the control data extracting unit 31 extracts the sensor value of the control sensor 91 when the learning initial stage ends and the control amount of the actuator 92 and the sensor value of the status detection sensor 93 corresponding thereto, from the storage unit 40 (S201). The control data extracting unit 31 inputs the extracted values to the neural network 22 as learning data.

Next, the neural network 22 performs machine learning based on the learning data input by the control data extracting unit 31 (S202). At this time, the neural network 22 divides the course-running operation into a predetermined number of divided scenes.

Hereinafter, the processing in which the neural network 22 classifies the course-running operation into scenes will be described in more detail. The neural network 22 classifies the course-running operation into scenes based on scene vectors and operation vectors. The scene vectors express a scene of the task that is performed by the vehicle 90. The scene vectors are acquired, for example, from sensor values (e.g., a position (or distance) from the start point, and a direction from the start point) that are output by the control sensor 91. For example, assuming the xy coordinates taking the start point as the origin, the scene vector at a point l can be expressed as (l_x, _y).

Meanwhile, the operation vectors express the control status of the driving vehicle 90. The operation vectors are acquired, for example, from sensor values (e.g., velocity, acceleration, angular velocity, angular acceleration, etc.) that are output by the status detection sensor 93. For example, the operation vector at the point l can be expressed as (v_l, a_l) using a velocity v and an acceleration a at the point l.

The neural network 22 divides the task into scenes based on the scene vector (l_{x, y}), and learns, for each divided scene, operation classification that is to be learned in that scene based on the operation vector (v_l, a_l). Accordingly, the learning apparatus 1 can learn a partial operation that corresponds to a scene, by judging in which scene the learning apparatus 1 is currently present. For example, the neural network 22 focuses on the position that is expressed by the scene vector as well as the point at which the operation vector changes, thereby finding acceleration, deceleration, change of direction, and the like of the operation of the vehicle 90, so that the series of operations can be classified into operations corresponding to the scenes based on the change point. Also, for example, the neural network 22 can learn the operation classification based on the similarity levels of the operation vectors.

In the example of the course shown in FIG. 2, the task is divided into scenes corresponding to five courses (a) to (e). The partial operations classified into scenes are, for example, as follows.
Scene (a): first straight partial operation (e.g., control of deceleration timing, driving position, and the like when reaching the subsequent first corner)
Scene (b): first corner partial operation (e.g., control of steering at a corner, acceleration timing when entering the second straight, and the like)
Scene (c): second straight partial operation (e.g., control of deceleration timing, driving position, and the like when reaching the subsequent second corner)
Scene (d): second corner partial operation (e.g., control of steering at a corner, acceleration timing, when entering the third straight and the like)
Scene (e): third straight partial operation (e.g., control of acceleration and the like when entering the first straight)

Note that the neural network 22 can preferably sort the divided scenes in the time-series order.

The operation classifying result extracting unit 32 extracts the partial operation classification learned by the neural network 22, and stores it in the storage unit 40 (S203).

2-3. Learning of Control With Which Learning Is Made Non-Continuable
FIG. 7 is a flowchart showing details of the processing flow in the learning of control with which the learning is made non-continuable (second learning process) in step S3 shown in FIG. 3. First, the learning data input/output unit 21 selects a partial operation from among the partial operations classified in the processing in step S2, referring to the storage unit 40, and extracts the control amount for the actuator 92 necessary for this partial operation. Furthermore, the learning data input/output unit 21 performs control in the extracted control amount, referring to the storage unit 40, and judges whether or not the learning has been made non-continuable as a result of the control, for example, based on output from the status detection sensor 93 and the like. The learning data input/output unit 21 reads the extracted control amount and information as to whether or not the learning has been made non-continuable as a result of the control, as learning data, and gives them to the neural network 22 as learning data. The neural network 22 performs learning through deep learning based on the given learning data (S301).

At this time, the learning result output unit 23 can output a learning result of the control with which the learning is made non-continuable. Accordingly, the neural network 22 can, for example, accept control with which the learning is made non-continuable, as learning data, from another learning apparatus 1' that has a similar configuration, and perform an additional learning process (S302). Thus, it is possible to perform a more efficient learning process. An efficient learning process refers to, for example, learning in which the time required from when the learning is started to when the learning purpose is achieved is short. Note that the processing in step S302 is not essential.

The learning apparatus 1 performs the processing in step S301 (and S302) on all classified partial operations (S303).

Although not essential, after learning the control with which the learning is made non-continuable, for all classified partial operations, the learning apparatus 1 may perform learning again through the series of operations (S304). Accordingly, faster course-running control can be performed.

In this manner, the learning apparatus 1 according to this embodiment performs learning of control with which the learning is made non-continuable, for classified partial operations, and thus it is possible to perform learning while avoiding such control in the subsequent learning. Accordingly, it is possible to perform a more efficient learning process.

2-4. Optimization Learning
FIG. 8 is a flowchart showing details of the processing flow in the optimization learning (third learning process) in step S4 shown in FIG. 3. In the optimization learning, the learning that has been performed in the steps up to S3 is optimized, and thus learning for achieving the learning purpose (“do ten laps of a course within predetermined period of time and reach goal” in this embodiment) given as learning data when the learning was started is performed. In the optimization learning, it is possible to perform the learning, with the control with which the learning is made non-continuable, learned in step S3, being excluded. At this time, the learning data input/output unit 21 refers to the storage unit 40 and extracts learning data (which is set by an operator) input in the learning initial stage (S1 in FIG. 3). Furthermore, the learning data input/output unit 21 extracts a status of the neural network 22 after the learning of control with which the learning is made non-continuable, referring to the storage unit 40. The learning data input/output unit 21 sets the extracted data to the control unit 10.

The control unit 10 outputs the control amount for the actuator 92, based on the set data described above, and acquires sensor values of the control sensor 91 and the status detection sensor 93 corresponding thereto. The control unit 10 stores the control amount and the sensor values output therefor, in the storage unit 40.

The neural network 22 reads the control amount and the sensor values stored by the control unit 10 in the above-described processing, and performs learning through deep learning (S401). Accordingly, the neural network 22 can more efficiently learn a control operation that matches the learning requirements from the start to the end of the operation (i.e., from the start to the goal of the course), in a state where control with which the learning is made non-continuable has been learned. The processing in step S401 is repeatedly performed until the entire learning is optimized (S402). A result of the optimization learning is extracted by the learning result output unit 23, and stored in the storage unit 40. Accordingly, in the optimization learning, it is possible to perform learning, with the control with which the learning is made non-continuable being excluded.

In this manner, with the learning apparatus 1 according to this embodiment, the learning apparatus 1 itself can classify an operation involved in the learning into partial operations, and perform learning. Accordingly, individual optimization can be performed for each classified operation, and the learning can be performed more efficiently (i.e., in a shorter period of time). Moreover, with the learning apparatus 1 according to this embodiment, when learning a partial operation, first, control with which the learning is made non-continuable is learned. Accordingly, the learning can be efficiently performed without a person setting detailed conditions in advance for each operation.

Hardware Configuration
Hereinafter, an example of the hardware configuration in a case where the learning apparatus 1 described above is realized by a computer 800 will be described with reference to FIG. 9. Note that the configuration of each apparatus may be realized by a plurality of separate devices.

As shown in FIG. 9, the computer 800 includes a processor 801, a memory 803, a storage apparatus 805, an input interface unit (input I/F unit) 807, a data interface unit (data I/F unit) 809, a communication interface unit (communication I/F unit) 811, and a display apparatus 813.

The processor 801 controls various types of processing in the computer 800 by executing programs stored in the memory 803. For example, as a result of the processor 801 executing programs stored in the memory 803, the control unit 10, the machine learning unit 20, the operation classifying unit 30, and the like of the learning apparatus 1 can be realized.

The memory 803 is a storage medium, for example, such as a RAM (random access memory). The memory 803 temporarily stores program code of programs that are executed by the processor 801, and data that is required at the time when the programs are executed.

The storage apparatus 805 is, for example, an auxiliary memory such as a hard disk drive (HDD) or a solid state drive, or a non-volatile storage medium such as a flash memory. The storage apparatus 805 stores various programs for realizing the operating system or the above-described configurations. These programs and data are loaded onto the memory 803 as necessary, and referred to by the processor 801. For example, the storage unit 40 described above is realized by the storage apparatus 805.

The input I/F unit 807 is a device for accepting input from an administrator. Specific examples of the input I/F unit 807 include a keyboard, a mouse, a touch panel, various sensors, a wearable device, and the like. The input I/F unit 807 may be connected to the computer 800, for example, via an interface such as a USB (universal serial bus).

The data I/F unit 809 is a device for inputting data from the outside of the computer 800. Specific examples of the data I/F unit 809 include drive devices and the like for reading data stored in various storage media. The data I/F unit 809 may be provided external to the computer 800. In that case, the data I/F unit 809 is connected to the computer 800, for example, via an interface such as a USB.

The communication I/F unit 811 is a device for performing wired or wireless data communication with an apparatus outside the computer 800 via the Internet N. The communication I/F unit 811 may be provided external to the computer 800. In that case, the communication I/F unit 811 is connected to the computer 800, for example, via an interface such as a USB.

The display apparatus 813 is a device for displaying various types of information. Specific examples of the display apparatus 813 include a liquid crystal display, an organic EL (electro-luminescence) display, a wearable display, and the like. The display apparatus 813 may be provided external to the computer 800. In that case, the display apparatus 813 is connected to the computer 800, for example, via a display cable or the like.

Second Embodiment
In the first embodiment, an example was described in which the learning apparatus 1 is used for the vehicle 90 controlled to drive autonomously. However, the apparatus to which the learning apparatus 1 is applied is not limited to the example shown in the first embodiment, and the learning apparatus 1 can be applied to various apparatuses. In this embodiment, an example will be described in which the learning apparatus 1 is applied to the control of a robot whose task is to perform a pick-and-place operation. In the second embodiment, mainly differences from the first embodiment will be described.

First, differences from the first embodiment in terms of the system configuration according to this embodiment will be described with reference to FIG. 10. The configuration of the learning apparatus 1 is as in the first embodiment. On the other hand, the configuration outside the learning apparatus 1 in this embodiment is such that a control sensor 91' is configured by a sensor group for performing a pick-and-place operation. Specifically, it is configured by a workpiece detection sensor (image sensor), a robot holding force sensor, or the like. Furthermore, the control sensor 91' includes an image recognition algorithm, and can recognize the shape of a workpiece that it is holding. The other portions of the configuration outside the learning apparatus 1 are as described in the first embodiment.

Next, differences between learning according to this embodiment and learning according to the first embodiment will be described.
The pick-and-place operation that is a task according to this embodiment refers to an operation that is performed following the procedure below.
1. Recognize workpiece shape and hold the workpiece
2. Lift workpiece held thereby
3. Move lifted workpiece to predetermined position according to workpiece shape
4. Stack workpiece in tube according to shape

In learning of robot control according to this embodiment, a given learning purpose and learning requirements are as follows.

Learning Purpose
- Stack ten workpieces in each tube with an inlet (in the shape of a circle, a quadrangle, and a triangle) according to workpiece shape within a predetermined period of time, through pick-and-place operation, from a container containing, in a mixed manner, workpieces having three different shapes (e.g., workpieces in three shapes consisting of a cylinder, a quadratic prism, and a triangular prism)
Learning Requirements
- Do not place workpieces at a position other than predetermined position
- Stack ten workpieces in tubes according to workpiece shapes
- “Stack one workpiece in proper workpiece-shaped tube” in initial stage level

In this embodiment, the task is to stack workpiece in tube according to shape. Furthermore, in this embodiment, the pick-and-place operation that is to be learned may be classified into partial operations according to scenes following a procedure similar to that in the first embodiment in which the task is divided into scenes based on the course for driving of the vehicle 90, and partial operations are classified based on the scenes. For example, in this embodiment, the task is divided based on a displacement amount in the operation involved in the learning of the task, into a scene corresponding to an operation of holding a workpiece, a scene corresponding to an operation of carrying a workpiece, and a scene corresponding to an operation of stacking a workpiece. The pick-and-place operation is classified into partial operations according to the divided scenes.

Furthermore, in this embodiment, the state in which the learning is made non-continuable refers to, for example, a state in which a workpiece cannot be placed into a tube. Accordingly, in the learning stage of control with which the learning is made non-continuable, the control that is to be learned is, for example, as follows.
- Wrong placement position (workpiece shape and tube inlet shape are different)
- Wrong stacking orientation of workpiece (workpiece shape orientation and tube inlet shape orientation are different)

With the learning apparatus 1 according to this embodiment, if the learning of control with which the learning is made non-continuable is performed in advance, learning of properly recognizing workpiece shapes and tube shapes and orientations when holding workpieces can be performed in advance. Accordingly, making the learning non-continuable can be avoided in the learning in the final stage, and thus the efficiency of learning can be further increased. That is to say, the time required to achieve the learning purpose can be shortened.
The other portions of the configuration are similar to those in the first embodiment.

Above, embodiments of the present invention have been described. The foregoing embodiments are for the purpose of facilitating understanding of the present invention, and are not to be interpreted as limiting the present invention. The invention can be altered and improved without departing from the gist thereof. For example, the steps in the above-described processing flows can be partially omitted, be rearranged in any desired order, or be executed in parallel, as long as doing so does not cause conflict in the processing content.

In the foregoing embodiments, examples were described in which the system according to the present invention is used to manage abilities acquired by machines according to AI technology such as deep learning, but the present invention is not limited thereto, and can be applied to a wide variety of fields. Examples thereof include distinguishing non-defective products from defective products, foods, machine parts, chemical products, drugs, and the like in various fields including industrial fields, fishery fields, agricultural fields, forestry fields, service industries, and medical and health fields. Furthermore, the present invention can be applied to cases where AI technology is applied to products in embedding fields, systems such as social systems using IT techniques, analysis of big data, classifying processing in a wide variety of control apparatuses, and the like.

Note that, in this specification, a “portion”, “unit” or “procedure” does not merely mean a physical configuration, and there are also cases where processing that is performed by a “portion” is realized by software. Furthermore, processing that is performed by one “portion”, “unit”, “procedure”, or apparatus may be performed by two or more physical configurations or apparatuses, and processing that is performed by two or more “portions”, “units”, “procedures”, or apparatuses may be performed by one physical unit or apparatus.

Note that part of or the entirety of the foregoing embodiments may be described as in Additional Remarks below, but there is no limitation to this.

Additional Remark 1
A learning apparatus including at least one hardware processor,
wherein the hardware processor
accepts learning data containing a learning purpose,
performs learning based on the learning data, and
outputs a learning result obtained by a neural network,
the performing learning includes performing a first learning process for achieving an initial stage of the learning purpose, performing a second learning process for learning control with which an operation involved in the learning is made non-continuable, based on a result of the first learning process, and performing a third learning process for achieving the learning purpose, with the control with which an operation involved in the learning is made non-continuable being excluded, based on a result of the second learning process.

Additional Remark 2
A learning method that is performed by at least one hardware processor for executing a step of performing learning, the method including:
a step of accepting learning data containing a learning purpose;
a step of performing learning based on the learning data;
a step of outputting a learning result obtained by a neural network,
wherein the step of performing learning includes a step of performing a first learning process for achieving an initial stage of the learning purpose, performing a second learning process for learning control with which an operation involved in the learning is made non-continuable, based on a result of the first learning process, and performing a third learning process for achieving the learning purpose, with the control with which an operation involved in the learning is made non-continuable being excluded, based on a result of the second learning process.

Claims

A learning apparatus configured to learn control of an operation involved in a predetermined task, the apparatus comprising:
a learning data accepting unit configured to accept learning data containing a learning purpose;
a neural network configured to perform learning based on the learning data; and
an output unit configured to output a learning result obtained by the neural network,
wherein the neural network
performs a first learning process for achieving an initial stage of the learning purpose, performs a second learning process for learning control with which an operation involved in the learning is made non-continuable, based on a result of the first learning process, and performs a third learning process for achieving the learning purpose, with the control with which an operation involved in the learning is made non-continuable being excluded, based on a result of the second learning process.
The learning apparatus according to claim 1, wherein the output unit outputs the result of the second learning process.
The learning apparatus according to claim 1,
wherein the learning apparatus is a learning apparatus configured to learn control of a series of operations involved in a predetermined task,
the learning apparatus further comprises a classifying unit configured to divide the task into a plurality of scenes, and, for each of the divided scenes, specify a partial operation that is to be performed in the scene, from among the series of operations, and
the neural network performs the second learning process and the third learning process for each partial operation.
An autonomous driving control learning apparatus configured to learn control of a series of operations involved in autonomous driving of a vehicle that does laps of a predetermined course, the apparatus comprising:
a learning data accepting unit configured to accept learning data containing a learning purpose in which a purpose is to do a predetermined number of laps of the course within a predetermined period of time;
a neural network configured to perform learning based on the learning data; and
an output unit configured to output a learning result obtained by the neural network,
wherein the neural network performs a first learning process for achieving running of one lap of the course by the vehicle, performs a second learning process for learning control with which an operation involved in the learning is made non-continuable, based on a result of the first learning process, and performs a third learning process for achieving the learning purpose, with the control with which an operation involved in the learning is made non-continuable being excluded, based on a result of the second learning process.
A robot control learning apparatus configured to learn control of a series of operations involved in a task of holding a predetermined workpiece and stacking the workpiece at a placement position according to a shape of the workpiece, the apparatus comprising:
a learning data accepting unit configured to accept learning data containing a learning purpose in which a purpose is to stack a predetermined number of the workpieces at the placement position within a predetermined period of time;
a neural network configured to perform learning based on the learning data; and
an output unit configured to output a learning result obtained by the neural network,
wherein the neural network performs a first learning process for achieving stacking of one workpiece at the placement position, performs a second learning process for learning control with which an operation involved in the learning is made non-continuable, based on a result of the first learning process, and performs a third learning process for achieving the learning purpose, with the control with which an operation involved in the learning is made non-continuable being excluded, based on a result of the second learning process.
A learning method for learning control of an operation involved in a predetermined task, the method comprising:
accepting, by a computer, learning data containing a learning purpose;
performing, by the computer, learning based on the learning data; and
outputting, by the computer, a learning result obtained in the learning,
wherein performing learning includes performing, by the computer, a first learning process for achieving an initial stage of the learning purpose, performing, by the computer, a second learning process for learning control with which an operation involved in the learning is made non-continuable, based on a result of the first learning process, and performing, by the computer, a third learning process for achieving the learning purpose, with the control with which an operation involved in the learning is made non-continuable being excluded, based on a result of the second learning process.
A program for causing a computer configured to learn control of an operation involved in a predetermined task to execute:
accepting learning data containing a learning purpose;
performing learning based on the learning data; and
outputting a learning result obtained in the learning,
wherein performing learning includes performing a first learning process for achieving an initial stage of the learning purpose, performing a second learning process for learning control with which an operation involved in the learning is made non-continuable, based on a result of the first learning process, and performing a third learning process for achieving the learning purpose, with the control with which an operation involved in the learning is made non-continuable being excluded, based on a result of the second learning process.
An apparatus configured to perform a predetermined task, the apparatus comprising:
a first sensor configured to sense information necessary for an operation of the apparatus to perform the task;
an actuator;
a second sensor configured to sense a change in a status of the apparatus caused by the actuator;
a control unit configured to control the actuator based on sensor values output from the first sensor and the second sensor; and
a storage unit configured to store a learning result obtained by the learning apparatus according to any one of claims 1 to 3,
wherein the control unit determines a control amount that corresponds to the sensor values output from the first sensor and the second sensor, based on the learning result stored in the storage unit.