WO2020158217A1

WO2020158217A1 - Information processing device, information processing method, and information processing program

Info

Publication number: WO2020158217A1
Application number: PCT/JP2019/049300
Authority: WO
Inventors: 井手　直紀
Original assignee: ソニー株式会社
Priority date: 2019-02-01
Filing date: 2019-12-17
Publication date: 2020-08-06

Abstract

This information processing device comprises: an acquisition part that acquires method information that indicates a learning method for machine learning; and a determination part that, on the basis of the method information acquired by the acquisition part, determines a supply method for data for the machine learning.

Description

Information processing apparatus, information processing method, and information processing program

The present disclosure relates to an information processing device, an information processing method, and an information processing program.

Information processing using machine learning is utilized in various technical fields, and tools for designing networks such as neural networks have been provided.

International Publication No. 2017/154284

According to the conventional technology, the network structure related to the neural network according to the environment is searched.

However, the conventional technology is not always able to appropriately determine the method of supplying machine learning data according to the learning method. For example, since the method for supplying network-generated data differs depending on each learning method such as supervised learning and unsupervised learning, it is only necessary to search the network structure according to the environment to supply the data according to the learning method. It is difficult to deal with. Therefore, there is also a problem that the convenience of the user is not high because, for example, a user who desires to generate a network sets parameters used at the time of learning according to a learning method.

Therefore, the present disclosure proposes an information processing device, an information processing method, and an information processing program capable of appropriately determining a machine learning data supply method according to a learning method.

In order to solve the above problems, an information processing device according to an aspect of the present disclosure is based on an acquisition unit that acquires method information indicating a learning method related to machine learning, and the method information acquired by the acquisition unit. And a determining unit that determines a method of supplying the machine learning data.

It is a figure showing an example of information processing concerning an embodiment of this indication. It is a figure showing an example of information processing concerning an embodiment of this indication. It is a figure showing an example of composition of an information processor concerning an embodiment of this indication. It is a figure showing an example of the corresponding information storage part concerning the embodiment of this indication. 5 is a flowchart showing a procedure of information processing according to the embodiment of the present disclosure. 8 is a flowchart showing learning such as supervised learning according to the present disclosure. It is a figure which shows an example of the data set of supervised learning concerning this indication. It is a figure showing an example of a data set of unsupervised learning concerning this indication. It is a figure showing an example of a mini-batch of semi-teacher learning concerning this indication. It is a figure which shows an example of the network structure of the metric learning which concerns on this indication. It is a figure showing an example of a mini-batch of metric learning concerning this indication. It is a figure which shows an example of the input screen of the metric learning which concerns on this indication. 9 is a flowchart showing learning of metric learning according to the present disclosure. It is a figure which shows another example of the network configuration of the metric learning which concerns on this indication. It is a figure which shows another example of the mini-batch of the metric learning which concerns on this indication. It is a figure which shows another example of the input screen of the metric learning which concerns on this indication. 9 is a flowchart showing another learning of metric learning according to the present disclosure. 8 is a flowchart showing learning in the case of hard negative mining according to the present disclosure. It is a figure which shows an example of the network structure of the meta learning which concerns on this indication. It is a figure which shows an example of the mini-batch of the meta learning which concerns on this indication. It is a figure which shows an example of the input screen of the meta learning which concerns on this indication. 9 is a flowchart illustrating learning of meta learning according to the present disclosure. It is a figure which shows an example of the input screen of the transfer learning which concerns on this indication. It is a figure which shows an example of the network design screen of the transfer learning which concerns on this indication. It is a figure which shows the structural example of the information processing system which concerns on the modification of this indication. It is a figure which shows the structural example of the information processing apparatus which concerns on the modification of this indication. It is a figure showing other examples of composition of an information processor concerning a modification of this indication. It is a figure which shows the example of the display screen of the network design support of this indication. It is a figure which shows the example of the provision aspect of the information processing which concerns on the modification of this indication. It is a hardware block diagram which shows an example of the computer which implement|achieves the function of an information processing apparatus.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. Note that the information processing apparatus, the information processing method, and the information processing program according to the present application are not limited to this embodiment. Further, in each of the following embodiments, the same reference numerals are given to the same portions, and the overlapping description will be omitted.

The present disclosure will be described in the following item order.
1. Embodiment 1-1. Overview of information processing according to embodiments of the present disclosure 1-2. Configuration of information processing apparatus according to embodiment 1-3. Information processing procedure according to embodiment 1-4. Supervised learning, unsupervised learning 1-5. Semi-teacher learning, weak teacher learning 1-6. Metric learning 1-6-1. Shamisen Network 1-6-2. Triplet network 1-7. Meta learning 1-8. Transfer learning 2. Other Embodiments 2-1. Modification 1 (other configuration examples)
2-2. Modification 2 (estimation of learning method)
2-3. Modification 3 (Use within the development framework)
2-4. Modification 4 (provided as a standalone application)
3. Hardware configuration

[1. Embodiment]
[1-1. Outline of information processing according to embodiment of the present disclosure]
FIG. 1 is a diagram illustrating an example of information processing according to an embodiment of the present disclosure. The information processing according to the embodiment of the present disclosure is realized by the information processing device 100 illustrated in FIG. 1.

The information processing device 100 is an information processing device that executes information processing according to the embodiment. The information processing apparatus 100 provides (presents) a plurality of learning methods related to machine learning to a user, and based on information indicating a learning method selected by the user (hereinafter also referred to as “method information”), learning data It is an information processing apparatus that determines a supply method. In the embodiment, a case will be described as an example in which information is provided to a user by a technique related to a GUI (Graphical User Interface) such as the screen IM1 shown in FIG. 2 and the like. Note that the information processing apparatus 100 provides information to users by various technologies such as CUI (Character User Interface) and TUI (Text User Interface) as well as GUI as long as the processing of the embodiment can be realized. You may go.

With reference to FIG. 1, the information processing apparatus 100 uses a GUI to display information indicating a plurality of learning methods to a user (hereinafter referred to as “user X”) who desires to execute machine learning (hereinafter referred to as “machine learning MLX”). The display shows the case where the data supply method of the machine learning MLX is determined based on the one learning method selected by the user X. First, the information processing apparatus 100 provides the user X with information indicating a plurality of learning methods. For example, the information processing apparatus 100 displays a plurality of learning methods on the display (corresponding to the output unit 13 in FIG. 3). In the example of FIG. 1, the information processing apparatus 100 displays information indicating eight learning methods LT1 to LT8 corresponding to each of the eight learning types, as shown in the provision list LS1. For example, the provision list LS1 may be a UI (User Interface) for selecting a learning type (learning method) from a pull-down menu. For example, the information processing apparatus 100 transitions to a detailed setting screen (screen IM1 in FIG. 2 or screen IM4 in FIG. 12) according to the user's selection of a learning method, which will be described in detail later.

In the example of FIG. 1, the information processing apparatus 100 displays the provision list LS1 including the learning method LT1 that is supervised learning. The information processing apparatus 100 also displays the provision list LS1 including the learning method LT2 that is unsupervised learning. Further, the information processing apparatus 100 displays the provision list LS1 including the learning method LT3 that is the semi-teacher learning. The information processing apparatus 100 also displays the provision list LS1 including the learning method LT4 that is transfer learning. The information processing apparatus 100 also displays the provision list LS1 including the learning method LT5 that is weak teacher learning. The information processing apparatus 100 also displays the provision list LS1 including the learning method LT6 that is metric learning. The information processing apparatus 100 also displays the provision list LS1 including the learning method LT7 that is meta-learning. In addition, the information processing apparatus 100 displays the provision list LS1 including the learning method LT8 that is the fusion learning. The details of the learning methods LT1 to LT8 will be described later. The learning methods LT1 to LT8 described above are examples, and the information processing apparatus 100 may display the provision list LS1 including various learning methods without being limited to the learning methods LT1 to LT8.

Then, the user X who confirms the provision list LS1 selects the learning method of the machine learning MLX from the plurality of learning methods LT1 to LT8 provided by the information processing apparatus 100 (step S11). For example, the user X specifies one learning method out of the plurality of learning methods LT1 to LT8. In the example of FIG. 1, the user X selects the learning method LT1 that is learning with a teacher from among the plurality of learning methods LT1 to LT8. As a result, the information processing apparatus 100 acquires method information indicating the learning method LT1 that is supervised learning (step S12).

The information processing apparatus 100 determines the method of supplying the data for machine learning MLX based on the method information indicating the learning method LT1 that is supervised learning (step S13). For example, the information processing apparatus 100 determines input data, parameters, and the like used for supervised learning as a method of supplying data for machine learning MLX based on method information indicating a learning method LT1 that is supervised learning. For example, the information processing apparatus 100 uses input information used for machine learning MLX by using correspondence information indicating correspondence between a plurality of learning methods and a plurality of supply methods, as shown in the correspondence information storage unit 141 in FIG. Determine parameters etc.

The information processing apparatus 100 determines input data, parameters, etc. used for supervised learning as a method of supplying data for machine learning MLX based on comparison between method information indicating supervised learning and correspondence information. The information processing apparatus 100 uses the information corresponding to the supervised learning among the correspondence information to determine the input data, the parameters, and the like used for the supervised learning as the method for supplying the data of the machine learning MLX.

The information processing apparatus 100 uses the information corresponding to the supervised learning among the correspondence information, and as shown in the data supply information DM1, inputs the data and parameters used in the machine learning MLX, and supplies the machine learning data. To decide. The information processing apparatus 100 determines the input data used for the machine learning MLX using the information regarding the input corresponding to the learning with a teacher among the correspondence information. In the example of FIG. 1, the information processing apparatus 100 determines the input data used for the machine learning MLX as “labeled data”. Further, the information processing apparatus 100 determines the parameter used for the machine learning MLX by using the information regarding the parameter corresponding to the supervised learning among the correspondence information. The information processing apparatus 100 determines the parameter used for the machine learning MLX using the setting parameter information PINF1 shown in FIG. For example, the setting parameter information PINF1 includes a parameter used for supervised learning, specification of whether the user inputs the value of the parameter, recommended value of the parameter, and the like. In the example of FIG. 1, the information processing apparatus 100 determines the parameter used for the machine learning MLX to be the “repetition number” or the “batch size”.

As described above, the information processing apparatus 100 acquires the method information indicating the learning method designated by the user, and determines the method of supplying the machine learning data based on the method information, thereby responding to the learning method. A method of supplying machine learning data can be appropriately determined. As described above, the information processing apparatus 100 provides the data design and the optimization tool necessary for the learning algorithm design to the user (developer) who uses the machine learning, especially the deep learning, in the development of the functional module. Can be easily made available.

Recently, deep learning, which represents the overall technology for learning neural networks, has brought about a major revolution in improving the performance of functional modules that have various recognition functions such as object recognition, voice recognition, and action recognition. For example, the performance in “general object recognition”, which assigns a class of an object from an image showing the object, exceeds that of a person.

The above-mentioned "supervised learning" is a technique for learning a function for deriving a label from data by using a combination (pair) of data indicating a problem and a label indicating its answer. For example, in general object recognition, the data is a photograph of the object and the label is the type of the object. Such a functional design solution using supervised learning may be able to obtain a high-performance module if a sufficient data set, proper selection of a neural network, and knowledge of a basic learning algorithm are available.

Also, recently, with the establishment of supervised learning solutions, there are few labels that show problem data and answers, and learning in situations where it is not sufficient and acquisition of other than recognition modules are progressing. A typical example of learning in a situation where the label showing the answer is not sufficient is called "semi-teacher learning", and learning when the label is a hint rather than the correct answer to predict is called "weak teacher learning". To be done.

Also, a typical example of learning when the amount of data is small and insufficient is called "one shot learning (fushot learning)". The one-shot learning is called “meta-learning” which includes the learning method, and the learning algorithm of the learning algorithm learns the initial value of the model. There are various approaches to "meta-learning" such as those based on "metric learning" and those based on "transfer learning". "Metric learning" is a learning in which a plurality of data are labeled with a distance label (for example, the same class is close, and different classes are far), and a plurality of data with the label is used. .. “Transfer learning” is a technique for diverting a model learned in advance and learning a new data set. A representative example of acquisition other than the recognition module is called a "generation model". In the learning of the “generation model”, the likelihood of data, that is, the likelihood of data or the authenticity of data is learned. "Unsupervised learning" is often used for learning generative models.

As described above, the learning method (learning type) includes "semi-teacher learning", "weak teacher learning", "one-shot learning", "fushot learning", and "meta-learning" in addition to "supervised learning". , "Metric learning", "transfer learning", "learning generative model" and so on. For example, when using these learning methods, the user (developer) has not only knowledge of the network architecture of how to incorporate the "recognition module" into the network, but also how to supply data with that architecture. Knowledge of optimization (learning) is required. Further, the user (developer) may need to code various learning devices for actually supplying data and performing optimization.

For example, in the case of "semi-teacher learning", it is necessary to supply "labeled data" and "unlabeled data", and in "weak teacher learning", it is often necessary to set "transfer learning". In the case of “meta-learning” of “one-shot learning”, it is necessary to randomly select a small amount of learning data (support) and test data (query) called “episode”. In the case of "metric learning", it is necessary to generate a plurality of data and labels of distance information of the data. In the case of "transfer learning", it is necessary to transfer the learned model to the initial model.

Due to this situation, it is desired that anyone can easily develop a module using deep learning. In particular, in order to enable the user to learn by various learning methods, it is necessary to appropriately supply data and train the model.

Therefore, the information processing apparatus 100 automatically determines a data supply method based on the selected learning method by providing the user with a plurality of learning methods and allowing the user to select the learning method. As a result, the information processing apparatus 100 does not need knowledge of the data supply method in the learning method (learning algorithm), and can automatically determine the data supply method according to the problem (task) that the user wants to solve. In addition, the information processing apparatus 100 enables the user to select the learning method using the GUI, and thus can appropriately determine the machine learning data supply method according to the learning method.

As shown in FIG. 1, the information processing apparatus 100 has an interface that allows a user to browse a list of learning methods (learning algorithms) (provided list LS1) and select one from them. Further, the information processing apparatus 100 may automatically generate a source code including a learning method (learning algorithm) selected by the user or an execution code (memory).

The information processing apparatus 100 may provide the user with a list of data supply methods, not limited to the list of learning methods (learning algorithms) (provided list LS1). In this case, the information processing apparatus 100 has an interface that allows the user to browse the list of data supply methods associated with the learning method (learning algorithm) and select one from them. Further, the information processing apparatus 100 may automatically generate a source code including a learning method (learning algorithm) selected by the user or an execution code (memory).

In addition, the information processing apparatus 100 has an interface that allows the user to input the setting when one of the data supply means associated with the learning method (learning algorithm) is selected. The source code including the learned algorithm or the execution code (memory) may be automatically generated.

Here, the development of deep learning is divided into four stages: learning, test data collection, network design, learning algorithm construction, and learned module evaluation. The learning and test data collection is collection of learning data for learning parameters of a module such as a recognition module and test data for evaluating generalization performance indicating how well the learning data operates properly.

The network design also includes the design of final modules such as recognition modules, and the network design for learning that incorporates these modules for learning the final modules.

In addition, learning algorithm design is not limited to selecting a solver such as whether to use a simple gradient method or the latest method, but also includes determining a learning method and a data supply method suitable for a learning network. Be done.

Then, the learned module can be evaluated and finally shipped as a module by satisfying predetermined criteria such as certain accuracy and performance criteria.

The information processing apparatus 100 can provide a simple development environment to the user (developer) with respect to a part related to the learning algorithm design, among them, a learning method and a data supply method suitable for the learning network. It is for That is, the information processing apparatus 100 realizes the simplification of the data supply method regarding the learning method (learning algorithm) among these functions. The information processing apparatus 100 enables automatic and semi-automatic construction of various learning algorithms in the development environment of machine learning, particularly deep learning, by determining the above-described data providing method.

Also, for the deep learning development framework by GUI, for example, frameworks such as NeuralNetworkConsole and NeuralNetworkSAAS are provided. These frameworks design a neural network by combining blocks and set a learning algorithm on an integrated development environment having a GUI. For example, information processing by the information processing apparatus 100 as illustrated in FIG. 1 is applied to NeuralNetworkConsole. In the following, NeuralNetworkConsole is shown as an example of the deep learning integrated development environment, but the application is not limited to this application.

From here, using FIG. 2, an example of provision of information on the determined supply method to the user is shown. FIG. 2 is a diagram illustrating an example of information processing according to the embodiment of the present disclosure. Specifically, FIG. 2 shows a case where the information indicating the parameters determined by the information processing apparatus 100 in FIG. 1 is provided to the user X, and the user X is made to input the value of the parameter used for the machine learning MLX.

In FIG. 2, the information processing apparatus 100 generates a screen IM1 including parameters “repetition number”, “batch size”, etc. used for machine learning MLX. The information processing apparatus 100 generates a screen IM1 which is an input screen for inputting the value of the parameter used for the determined machine learning MLX. For example, the information processing apparatus 100 generates the screen IM1 by using various conventional techniques related to image generation. For example, the screen IM1 is a setting screen for a data supply method and the like of a learning method (learning algorithm) regarding the NeuralNetworkConsole.

In FIG. 2, the information processing apparatus 100 generates a screen IM1 including a character string “Max Epoch” indicating the parameter “repetition number” and an input field BX1 for inputting the value of the parameter “repetition number” in the central area AR1. Further, the information processing apparatus 100 generates a screen IM1 including a character string "Batch Size" indicating the parameter "batch size" and an input field BX2 for inputting the value of the parameter "batch size" in the area AR1.

Further, in FIG. 2, the information processing apparatus 100 determines the recommended values of the parameter “repetition number” and the parameter “batch size” recommended for the user X. For example, the information processing apparatus 100 uses the recommended value of each parameter stored in the storage unit 14 (see FIG. 3) to determine the recommended value of the parameter “number of repetitions” or the parameter “batch size”. For example, the information processing apparatus 100 may determine the recommended value of each parameter based on the past learning history. For example, the information processing apparatus 100 may determine the average of each parameter in past learning as the recommended value.

In the example of FIG. 2, the information processing device 100 determines the recommended value of the parameter “number of repetitions” to be “100”. The information processing apparatus 100 also determines the recommended value of the parameter “batch size” to be “64”. Then, the information processing apparatus 100 generates the screen IM1 in which the recommended value “100” of the parameter “repetition number” is arranged in the input field BX1 and the recommended value “64” of the parameter “batch size” is arranged in the input field BX2. ..

Then, the information processing apparatus 100 provides the screen IM1 to the user X. In the example of FIG. 2, in the information processing apparatus 100, the recommended value “100” of the parameter “repetition number” is arranged in the input field BX1, and the recommended value “64” of the parameter “batch size” is arranged in the input field BX2. The screen IM1 is displayed. The user X who confirms the screen IM1 inputs the value of each parameter while referring to the recommended value of each parameter. Then, when the value of the input field BX1 is changed by the user, the information processing apparatus 100 determines the changed value as the value of the parameter “number of repetitions”. When the value in the input field BX2 is not changed by the user, the information processing apparatus 100 determines the recommended change value “64” as the value of the parameter “batch size”.

As described above, the information processing apparatus 100 can allow the user to more easily perform the machine learning by providing the user with the determined parameter and the recommended value of the parameter to be input by the user. .. Further, the information processing apparatus 100 may provide a screen that allows the user to input data corresponding to the determined input data. In the case of the example of FIG. 1, the information processing apparatus 100 may provide the user X with a screen for inputting “labeled data”. The information processing apparatus 100 may provide the user X with a screen including information indicating that the type (structure) of the input data is “labeled data”. For example, the information processing apparatus 100 may provide the user X with a screen including a display of textual information such as “Please input data with label”.

Note that the information processing apparatus 100 may realize the above-described processing such as display and operation acceptance by a predetermined application. Further, the information processing apparatus 100 may acquire a script to be executed on a predetermined software application, and execute information processing such as information display and operation reception as described above by the control information such as the acquired script. .. For example, the control information corresponds to a program that realizes information processing such as information display and operation reception by the information processing apparatus 100 according to the embodiment. For example, CSS (Cascading Style Sheets), JavaScript (registered trademark), It is realized by HTML (Hyper Text Markup Language) or any language capable of describing information processing such as information display and operation reception by the information processing apparatus 100 described above.

Also, the information processing apparatus 100 may determine the parameter for which the user inputs a value based on the parameter information corresponding to the learning method. The information processing apparatus 100 may determine the parameter that causes the user X to input a value based on the information stored in the correspondence information storage unit 141 in FIG. For example, the information processing apparatus 100 may determine the parameter for which the user input is designated among the parameters stored in the correspondence information storage unit 141 in FIG. 4 as the parameter for allowing the user to input a value. For example, the information processing apparatus 100 may determine the parameter “repetition number” as a parameter that allows the user to input a value when the parameter “repetition number” for supervised learning is designated by the user. Further, the information processing apparatus 100 may determine the parameter “batch size” as a parameter that does not allow the user to input a value when the parameter “batch size” for supervised learning is not designated by the user.

[1-2. Configuration of Information Processing Device According to Embodiment]
Next, the configuration of the information processing apparatus 100, which is an example of the information processing apparatus that executes information processing according to the embodiment, will be described. FIG. 3 is a diagram illustrating a configuration example of the information processing device 100 according to the embodiment of the present disclosure.

As shown in FIG. 3, the information processing device 100 includes a communication unit 11, an input unit 12, an output unit 13, a storage unit 14, and a control unit 15. In the example of FIG. 3, the information processing apparatus 100 includes an input unit 12 (for example, a keyboard and a mouse) that receives various operations from an administrator of the information processing apparatus 100, and an output unit 13 (for example, a keyboard and a mouse) for displaying various information. , Liquid crystal display, etc.).

The communication unit 11 is realized by, for example, a NIC (Network Interface Card), a communication circuit, or the like. The communication unit 11 is connected to a network N (Internet or the like) by wire or wirelessly, and transmits/receives information to/from another device or the like via the network N.

The user inputs various operations to the input unit 12. The input unit 12 receives an input from the user. The input unit 12 receives the selection of the learning method by the user. The input unit 12 may receive various operations from the user via a keyboard, a mouse, or a touch panel provided in the information processing device 100. The input unit 12 receives input of parameter values by the user. The input unit 12 may accept a user's utterance as an input.

The output unit 13 outputs various information. The output unit 13 displays various information. The output unit 13 is a display device (display unit) such as a display, and displays various information. The output unit 13 outputs the learning method. The output unit 13 displays a plurality of learning methods. The output unit 13 outputs the providing method determined by the determination unit 152. The output unit 13 displays the providing method determined by the determination unit 152. The output unit 13 outputs (displays) the information generated by the generation unit 153. The output unit 13 outputs (displays) the information provided by the providing unit 154. The output unit 13 displays the screen IM1. The output unit 13 may have a function of outputting voice. For example, the output unit 13 may include a speaker that outputs sound.

The storage unit 14 is realized by, for example, a semiconductor memory device such as a RAM (Random Access Memory) or a flash memory (Flash Memory), or a storage device such as a hard disk or an optical disk. The storage unit 14 includes a correspondence information storage unit 141 and a data storage unit 142. Although illustration is omitted, the data storage unit 142 stores various data used for learning.

The correspondence information storage unit 141 stores various kinds of information regarding correspondence between a plurality of learning methods and a plurality of supply methods. FIG. 4 is a diagram illustrating an example of the correspondence information storage unit according to the embodiment of the present disclosure. FIG. 4 shows an example of the correspondence information storage unit 141 according to the embodiment. In the example of FIG. 4, the correspondence information storage unit 141 has items such as “type ID”, “learning type”, “input”, “output”, and “setting parameter information”.

"Type ID" indicates identification information for identifying the learning type (learning method). The "learning type" indicates a learning type (learning method) identified by the learning method. “Input” indicates the type of data used as input in the corresponding learning method. “Output” indicates the type of model (network) generated (learned) by the corresponding learning method. “Setting parameter information” indicates information on parameters set in the corresponding learning method. FIG. 4 shows an example in which conceptual information such as “PINF1” and “PINF2” is stored in the “setting parameter information”, but actually, information for specifying each parameter to be set and recommendation of the parameter are shown. Parameter information including a value and information indicating whether to allow the user to input the value of the parameter, or a file path name indicating the storage location is stored.

In the example of FIG. 4, the learning method LT1 corresponding to the learning type identified by the type ID “LT1” indicates that there is supervised learning. The input of the learning method LT1 is labeled data, and the output of the learning method LT1 is a recognition model. Further, the setting parameter information of the learning method LT1 indicates that it is the parameter information PINF1. For example, the parameter information PINF1 includes “repetition number” in the parameter, the recommended value thereof is “100”, and includes information indicating that the user inputs the value of the parameter. In addition, the parameter information PINF1 includes “batch size” in the parameter, the recommended value thereof is “64”, and includes information indicating that the user inputs the value of the parameter.

Note that the correspondence information storage unit 141 is not limited to the above, and may store various information according to the purpose.

Return to Figure 3 and continue the explanation. In the control unit 15, for example, a program (for example, an information processing program according to the present disclosure) stored in the information processing apparatus 100 by a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like is a RAM (Random Access Memory). ) Etc. are executed as a work area. The control unit 15 is a controller, and may be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

As illustrated in FIG. 3, the control unit 15 includes an acquisition unit 151, a determination unit 152, a generation unit 153, a provision unit 154, and a learning unit 155, and has functions and actions of information processing described below. Realize or execute. The internal configuration of the control unit 15 is not limited to the configuration shown in FIG. 3, and may be another configuration as long as it is a configuration for performing information processing described later.

The acquisition unit 151 acquires various types of information. The acquisition unit 151 acquires various types of information from an external information processing device. The acquisition unit 151 acquires various types of information from the storage unit 14. The acquisition unit 151 acquires the input information received by the input unit 12.

The acquisition unit 151 acquires method information indicating a learning method related to machine learning. The acquisition unit 151 acquires method information from a user who specifies a learning method. The acquisition unit 151 acquires method information indicating the learning method selected by the user from the plurality of learning methods. The acquisition unit 151 acquires method information indicating a learning method LT1 that is learning with a teacher.

The decision unit 152 makes various decisions. The determination unit 152 determines various information based on the information acquired by the acquisition unit 151. The determining unit 152 determines various information based on the information stored in the storage unit 14. The determination unit 152 makes various determinations. The determination unit 152 determines various information based on the information acquired by the acquisition unit 151. The determination unit 152 determines various information based on the information stored in the storage unit 14.

The deciding unit 152 decides a method for supplying data for machine learning based on the method information acquired by the acquiring unit 151. The determination unit 152 determines the supply method based on the comparison between the method information and the correspondence information indicating the correspondence between the plurality of learning methods and the plurality of supply methods. The determination unit 152 determines a parameter regarding a supply method that allows a predetermined user to input a value, based on the method information. The deciding unit 152 decides a parameter that allows a predetermined user to input a value based on the parameter information corresponding to the learning method.

The deciding unit 152 decides a recommended value of a parameter recommended to a predetermined user based on the method information. The determining unit 152 determines the recommended value based on the performance of the device that executes machine learning. The determination unit 152 determines the recommended value based on the network learned by machine learning.

For example, the determination unit 152 determines the recommended value to be higher as the performance of the device that executes machine learning is higher. For example, the determination unit 152 compares the performance of a device that executes machine learning (target device) with a reference value such as the average of the performance of devices that have performed machine learning in the past, and recommends that the performance of the target device is higher. The value may be determined high. For example, when the performance of the target device is 1.5 times the reference value, the determining unit 152 may increase the recommended value to 1.5 times higher than usual. For example, when the performance of the target device is 1.5 times the reference value, the determination unit 152 may use a value that is 1.5 times the recommended value of the parameter stored in the storage unit 14.

For example, the determination unit 152 determines a higher recommended value as the network learned by machine learning is simpler. For example, the determining unit 152 compares the scale (hierarchy, etc.) of the network (target network) learned by machine learning with a reference value such as the average of the scale (hierarchy, etc.) of the network learned by past machine learning. The smaller the size of the target network, the higher the recommended value may be determined. For example, when the scale of the target network is 0.5 times the reference value, the determining unit 152 may increase the recommended value to twice the normal value. For example, when the scale of the target network is 0.5 times the reference value, the determining unit 152 may use the doubled recommended value of the parameter stored in the storage unit 14.

The deciding unit 152 decides a method of supplying machine learning data by appropriately using various techniques. The determining unit 152 determines the input data used for the machine learning MLX by using the information regarding the input corresponding to the learning with a teacher among the correspondence information. The determination unit 152 determines the input data used for the machine learning MLX as “labeled data”. Further, the determining unit 152 determines the parameter used for the machine learning MLX using the information regarding the parameter corresponding to the supervised learning in the correspondence information. The deciding unit 152 decides the parameter used for the machine learning MLX to be the "repetition number" or "batch size".

The determining unit 152 determines a network to be learned by machine learning based on the supply method. The determining unit 152 determines a network to be learned by machine learning based on the parameter corresponding to the supply method. The determination unit 152 determines the network structure based on the supply data supplied by the supply method. The determining unit 152 determines the network structure based on the domain of the supplied data.

For example, the determination unit 152 determines a network to be learned by machine learning based on the parameter regarding the number of classes corresponding to the supply method. The determining unit 152 determines the network to be learned by machine learning based on the value of the parameter related to the number of classes. For example, the determination unit 152 determines a network including the number of partial networks corresponding to the value of the parameter of the number of support classes in meta learning. For example, the determining unit 152 determines a network including five partial networks (corresponding to “Likelihood Network” in FIG. 19) when the value of the parameter of the number of support classes in meta learning is “5”. For example, the determination unit 152 determines a network including the number of partial networks corresponding to the value of the parameter of the number of support classes in one-shot learning or fuse shot learning.

For example, the determining unit 152 determines a network to be learned by machine learning based on the domain of the supplied data. For example, the determining unit 152 may determine the network to be learned by machine learning based on whether the supplied data is an image (moving image, still image), voice, item, or text. For example, the determining unit 152 determines the domain of the supply data based on the file extension or the like.

For example, the determining unit 152 determines whether to use the linear layer in the network as the convolutional layer or the fully connected layer according to the domain of the supplied data. For example, when the domain of the supplied data is image or sound, the determining unit 152 determines a network having a linear layer as a convolutional layer. For example, when the domain of the supplied data is an item or a text, the determination unit 152 determines a network in which the linear layer is the fully connected layer.

The generation unit 153 performs various generations. The generation unit 153 generates various information based on the information acquired by the acquisition unit 151. The generation unit 153 generates various information based on the information stored in the storage unit 14.

The generation unit 153 registers various information based on the information acquired by the acquisition unit 151. The generation unit 153 registers the information acquired by the acquisition unit 151 in the storage unit 14. The generation unit 153 stores the learning-related information (data) acquired by the acquisition unit 151 in the data storage unit 142. The generation unit 153 generates an input screen for inputting the value of the parameter determined by the determination unit 152. The generation unit 153 generates the screen IM1 as shown in FIG.

Note that the generation unit 153 generates information (image) related to the screen such as the screen IM1 shown in FIG. 2 by appropriately using various conventional techniques related to the image. The generation unit 153 generates an image such as the screen IM1 shown in FIG. 2 by appropriately using various conventional techniques related to GUI. For example, the generation unit 153 generates an image such as the screen IM1 shown in FIG. 2 in CSS, Javascript (registered trademark), HTML, or any language capable of describing information processing such as information display and operation reception described above. May be.

The generation unit 153 generates the screen IM1 including the parameters “repetition number”, “batch size”, etc. used in the machine learning MLX. The generation unit 153 generates a screen IM1 which is an input screen for inputting the values of the parameters used in the machine learning MLX. For example, the generation unit 153 generates the screen IM1 using various conventional techniques related to image generation.

The generating unit 153 generates a screen IM1 including a character string "Max Epoch" indicating the parameter "repetition number" and an input field BX1 for inputting the value of the parameter "repetition number". Further, the generation unit 153 generates a screen IM1 including a character string “Batch Size” indicating the parameter “batch size” and an input field BX2 for inputting the value of the parameter “batch size”.

The providing unit 154 provides various information. The providing unit 154 provides various information based on the information determined by the determining unit 152. The providing unit 154 causes the output unit 13 to display various information. The providing unit 154 causes the output unit 13 to display the information determined by the determining unit 152. In this case, the providing unit 154 functions as a display control unit that controls the display by the output unit 13.

The providing unit 154 provides various information to an external information processing device. The providing unit 154 transmits various information to an external information processing device. The providing unit 154 provides the input screen generated by the generating unit 153. The providing unit 154 provides various information determined by the determining unit 152. The providing unit 154 provides the information indicating the network by the determining unit 152. The providing unit 154 provides the user with information indicating the network by the determining unit 152. The providing unit 154 causes the determining unit 152 to display information indicating the network on the output unit 13.

The providing unit 154 provides the screen IM1 to the user X. The providing unit 154 outputs to the output unit 13 the screen IM1 in which the recommended value “100” of the parameter “repetition number” is arranged in the input field BX1 and the recommended value “64” of the parameter “batch size” is arranged in the input field BX2. indicate.

The learning unit 155 performs various learning. The learning unit 155 learns various information based on the information acquired by the acquisition unit 151. The learning unit 155 learns various information based on the information stored in the storage unit 14. The learning unit 155 learns (generates) a model. The learning unit 155 learns the model based on the information acquired by the acquisition unit 151. The learning unit 155 learns the model based on the information stored in the storage unit 14. For example, the learning unit 155 learns network parameters.

The learning unit 155 learns a model using various machine learning technologies. The learning unit 155 learns the model based on the designated learning method (learning algorithm). The learning unit 155 learns the model based on the designated one learning method (learning algorithm) among the eight learning methods LT1 to LT8.

The learning unit 155 learns the model based on the information regarding the data supply method determined by the determination unit 152. The learning unit 155 learns the model based on the input data and parameters determined by the determination unit 152. The learning unit 155 learns the model corresponding to the machine learning MLX based on the value of the parameter “repetition number” and the value of the parameter “batch size” determined by the determination unit 152.

The learning unit 155 learns a model based on a learning method designated by a predetermined means and a data supply method corresponding to the learning method. When the learning method is the supervised learning, the learning unit 155 uses various conventional techniques of supervised learning to appropriately learn the model. If the learning method is unsupervised learning, the learning unit 155 uses various conventional techniques of unsupervised learning to learn the model. When the learning method is the semi-teacher learning, the learning unit 155 learns the model by appropriately using various conventional techniques of the semi-teacher learning. When the learning method is transfer learning, the learning unit 155 uses various conventional transfer learning techniques to learn the model. When the learning method is weak teacher learning, the learning unit 155 learns the model by appropriately using various conventional techniques of weak teacher learning. When the learning method is metric learning, the learning unit 155 learns a model by appropriately using various conventional techniques of metric learning. When the learning method is meta-learning, the learning unit 155 uses various conventional techniques of meta-learning as appropriate to learn the model. When the fuse shot learning is meta learning, the learning unit 155 uses various conventional techniques for fuse shot learning to learn the model.

[1-3. Information Processing Procedure According to Embodiment]
Next, a procedure of information processing according to the embodiment will be described with reference to FIG. First, the flow of learning processing according to the embodiment of the present disclosure will be described using FIG. 5. FIG. 5 is a flowchart showing a procedure of information processing according to the embodiment of the present disclosure.

As shown in FIG. 5, the information processing apparatus 100 acquires method information indicating a learning method (step S101). For example, the information processing apparatus 100 acquires method information indicating the learning method designated by the user.

The information processing apparatus 100 determines a method for supplying machine learning data based on the method information (step S102). For example, the information processing apparatus 100 determines the supply method based on the comparison between the method information and the correspondence information indicating the correspondence between the plurality of learning methods and the plurality of supply methods.

Then, the information processing apparatus 100 generates a screen for allowing the user to input the value of the parameter regarding the supply method (step S103). For example, the information processing apparatus 100 generates a screen in which recommended values of parameters regarding the supply method are arranged in the input field. Then, the information processing device 100 provides the generated screen (step S104). For example, the information processing device 100 displays the generated screen.

From here, we will explain each learning method in detail.

[1-4. Supervised learning, unsupervised learning]
First, in the case of either the supervised learning shown in FIGS. 1 and 2 or the unsupervised learning (hereinafter may be collectively referred to as “supervised learning, etc.”) with reference to FIGS. 6 to 8. The learning flow and data of will be explained. First, a learning flow such as supervised learning will be described with reference to FIG. FIG. 6 is a flowchart showing learning such as supervised learning according to the present disclosure.

As shown in FIG. 6, the information processing apparatus 100 prepares a data set (step S201). For example, the information processing apparatus 100 acquires a labeled data set or an unlabeled data set.

Then, the information processing apparatus 100 determines whether or not shuffle is designated (step S202). The shuffle here is a rearrangement of the order of labeled data or unlabeled data.

On the other hand, the information processing apparatus 100 shuffles the data set when the shuffle is designated (step S202: Yes) (step S203). If the shuffle is not designated (step S202: No), the information processing apparatus 100 performs the process of step S204.

Then, the information processing device 100 supplies data (step S204). Then, the information processing apparatus 100 determines whether the mini-batch size can be cut out (step S205). For example, one epoch is defined until a mini-batch cannot be cut out from the data set, that is, until all data in the data set is used for learning once. When the information processing apparatus 100 cannot cut out the mini-batch size (step S205: No), the information processing apparatus 100 returns to step S202 and repeats the process of providing again from the beginning, for example.

The information processing device 100 performs learning (step S206) when the mini-batch size can be cut out (step S205: Yes). The process of step S206 corresponds to a process of inputting mini-batch data to the neural network, performing loss calculation (Forward), error back propagation (Backward), and updating by the gradient method (Update).

Then, the information processing apparatus 100 determines whether it is a condition for continuing the algorithm (step S207). The termination condition of step S207 is the maximum number of iterations of mini-batch learning, the maximum number of epochs, or whether the loss value or the loss improvement amount exceeds a threshold value.

If the condition for continuing the algorithm is satisfied (step S207: Yes), the information processing apparatus 100 returns to step S206 and repeats the processing. On the other hand, when the condition for continuing the algorithm is not satisfied (step S207: No), the information processing apparatus 100 returns to step S203 and repeats the process.

For example, the information processing apparatus 100 acquires information regarding the cutout size (mini-batch size) and the convergence condition via the GUI. For example, the information processing apparatus 100 acquires the information regarding the cutout size (mini-batch size) and the convergence condition by using the technology of NeuralNetworkConsole.

Here, the data set for supervised learning will be described with reference to FIG. 7. FIG. 7 is a diagram showing an example of a data set for supervised learning according to the present disclosure.

The data DT1 which is the data set for supervised learning shown in FIG. 7 includes data (Data) and a label (Label). The label indicates the concept to be estimated from the data. For example, the label indicates the number given to the class, especially when the concept is divided into classes. The serial number (DataId) shown in FIG. 7 is an example, and the present invention is not limited to this.

For example, the information processing apparatus 100 cuts out a data block (mini-batch) from the data DT1 that is a data set for supervised learning as shown in FIG.

Next, the data set for unsupervised learning will be described using FIG. FIG. 8 is a diagram showing an example of a data set for unsupervised learning according to the present disclosure.

Data DT2, which is a data set for unsupervised learning, contains only data (Data). As described above, the unsupervised learning data set is composed of only data and does not use labels. The serial number (DataId) shown in FIG. 8 is an example, and the present invention is not limited to this.

For example, the information processing apparatus 100 cuts out a block of data (mini-batch) from the data DT2 that is a data set for unsupervised learning as shown in FIG. 8 and supplies it.

In the learning methods of supervised learning and unsupervised learning as described above, the information processing apparatus 100 performs learning by supplying a data batch (minibatch) as shown in FIG. 6, for example.

[1-5. Semi-teacher learning, weak teacher learning]
Next, the semi-teacher learning and the weak teacher learning will be described. Semi-teaching learning here is a typical example of learning in a situation where there are few labels. For example, semi-supervised learning is used when labeling is more difficult than data collection. In the semi-supervised learning, the labeled data and the unlabeled data are combined to learn a function for deriving a label from the data. The weak teacher learning here is used when it is difficult to give a label, but a relatively easy label that can be a hint of a formal label can be given. Weak teacher learning uses hinted data instead of labels to learn a function that derives a label from the data.

In “Semi-supervised learning,” both labeled and unlabeled datasets are mixed for learning. A data block (mini-batch) as shown in FIG. 9 is supplied from the labeled data set and the unlabeled data set. FIG. 9 is a diagram illustrating an example of a semi-batch learning mini-batch according to the present disclosure. Data DT3, which is a mini-batch in FIG. 9, is a combination of a data set for supervised learning and a data set for unsupervised learning.

The data supply flow for semi-supervised learning is the same as for supervised learning and unsupervised learning, but the information processing apparatus 100 independently performs mini-batch in supervised learning and unsupervised learning. In the semi-supervised learning, the mini-batch size of the labeled data set and the mini-batch size of the unlabeled data set do not necessarily have to be the same.

The setting of the learning method of semi-teacher learning can also be realized by setting the supply of labeled data and unlabeled data. Since the parameter settings for the supervised learning and the unsupervised learning described above can both be realized by the NeuralNetworkConsole, the setting of a plurality of separate data sets in the semi-supervised learning can also be realized similarly.

Regarding the above-mentioned supervised learning, unsupervised learning, and semi-supervised learning, the information processing apparatus 100 can provide an interface for easily switching the learning method to another learning method. Detailed settings on the GUI can be supported by expanding existing technologies such as Neural Network Console. For weak teacher learning, it is necessary to set "transfer learning" described later.

A representative model other than the recognition module is a generative model. The generation model is a function that generates data from random numbers, etc., and a special neural network structure called variational encoder (VAE: Variational Auto Encoder) or adversarial generation network (GAN: Generative Adversarial Network) is known. There is. Unsupervised learning is used for learning such a generative model, and is used for acquiring a method of expressing data-likeness only from data.

The VAE is disclosed in the following document, for example.
・Auto-Encoding Variational Bayes, Diederik P Kingma et al. <https://arxiv.org/abs/1312.6114>

Further, GAN is disclosed in the following document, for example.
・Generative Adversarial Networks, Ian J. Goodfellow et al. <https://arxiv.org/abs/1406.2661>

[1-6. Metric learning]
Next, metric learning will be described. Metric learning is a technique for learning a space that reflects the distance of concepts (classes). Specifically, in the metric learning, a distance relationship between the data is input as a plurality of data pairs and labels, and a function for projecting the data in a space reflecting the distance is learned. It is well known to use a structure called Siamese network (Shamises network) or Triplet network (triplet network). In this case, the information processing apparatus 100 determines the matching or non-matching of the classes of the two data belonging classes from the distance in the projected space.

The learning network of the feature extractor by metric learning is generated by combining a plurality of feature extractors (feature extraction nets) that share parameters and a distance calculation layer. For example, as networks for metric learning, there are known a Shamese network and a triplet network.

For example, when the user selects the metric learning which is the learning method LT6 from the provision list LS1, the information processing apparatus 100 acquires the method information indicating the learning method LT6 which is the metric learning. When the metric learning which is the learning method LT6 is selected, the information processing apparatus 100 provides the user with a network selection screen for selecting a network. For example, when the user selects metric learning, which is the learning method LT6, from the provision list LS1, the information processing apparatus 100 causes the user to select (designate) which of the Shamisen network and the triplet network is to be used. Provide the user with a network selection screen. For example, the information processing device 100 displays a network selection screen. When the user selects the Shamisen network, the information processing apparatus 100 acquires method information including selection information indicating that the Shamisen network has been selected. When the user selects the triplet network, the information processing apparatus 100 acquires method information including selection information indicating that the triplet network has been selected.

The Siamese network is disclosed in the following document, for example.
・Siamese Neural Networks for One-shot Image Recognition, Gregory Koch et al. <https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf>

The Triplet network is disclosed in, for example, the following document.
・Deep metric learning using Triplet network, Elad Hoffer et al. <https://arxiv.org/abs/1412.6622>

[1-6-1. Shamisen Network]
From here, the network configuration, the data set, the input screen, and the learning flow of the Shamisen network, which is an example of metric learning, will be described with reference to FIGS. For example, deep learning metric learning is used for learning the feature extractor. First, the network configuration for metric learning will be described with reference to FIG. FIG. 10 is a diagram illustrating an example of a metric learning network configuration according to the present disclosure.

The network MD4 shown in FIG. 10 is a typical configuration of a Shamisen network. The Shamisen network is composed of a feature extraction net that shares two parameters, a distance calculation layer and a loss calculation layer, and inputs two data and a label indicating the relationship between them, and inputs the feature amount of each of the two inputs. Outputs the loss calculated between the distance and the relationship label. For example, “x1” and “x2” in FIG. 10 correspond to two pieces of input data, and “y” in FIG. 10 corresponds to an input label.

In the learning by the Shamisen net, the information processing apparatus 100 supplies the data DT4, which is a block of data (mini-batch) as shown in FIG. FIG. 11 is a diagram illustrating an example of a metric learning mini-batch according to the present disclosure. Specifically, FIG. 11 is a diagram showing an example of a mini-batch for learning the Shamisen network.

As shown in the data DT4 of FIG. 11, the label is a label related to the distance between the two input data x1 and x2, unlike a normal class label. Regarding this distance label, as a simple method, if two data belong to the same class, "same" is used, and if two data belong to different classes, "different" is used. As a result, the information processing apparatus 100 can learn a function that projects data belonging to the “same” class closer in the feature space and data belonging to the “different” class far. For example, the items “Data1” and “Data2” in FIG. 11 correspond to two pieces of data, and the item “EqLabel” in FIG. 11 corresponds to a label. For example, if the data classes of the items “Data1” and “Data2” corresponding to the same DataID have the same class, the value of the item “EqLabel” is “0”. Further, for example, when the classes of the data of the item “Data1” and the data of the “Data2” corresponding to the same DataID are different, the value of the item “EqLabel” is “1”.

In order to realize this, the information processing device 100 is required to generate a label indicating “same” or “different” from a normal “class” label in terms of an algorithm. When the information processing apparatus 100 randomly selects two data classes, the ratio of the two data classes being “different” increases as the number of classes increases. Therefore, the information processing apparatus 100 needs to avoid these points.

Here, what is required for the algorithm related to learning the Shamisen network as shown in FIG. 13 is the coincidence probability α of two input classes of batch size. Therefore, for metric learning, it is sufficient if there is a UI for setting the batch size and the matching probability. Therefore, the information processing apparatus 100 generates a screen IM4 for inputting the value of the parameter “batch size” and the value of the parameter “match probability α” as shown in FIG. 12, and provides it to the user. FIG. 12 is a diagram illustrating an example of a metric learning input screen according to the present disclosure.

In FIG. 12, the information processing apparatus 100 generates a screen IM4 which is an input screen including an input field BX41 for inputting a value of the parameter “batch size” together with a character string “batch size” indicating the parameter “batch size”. The information processing apparatus 100 also generates a screen IM4 including a character string “match probability of input belonging class” indicating the parameter “match probability α” and an input field BX42 for inputting a value of the parameter “match probability α”. Then, the information processing apparatus 100 determines the value input by the user in the input field BX41 as the value of the parameter "batch size". Further, the information processing apparatus 100 determines the value input by the user in the input field BX42 as the value of the parameter “match probability α”.

For example, when the user selects the Shamisen network, the information processing apparatus 100 provides the user with a screen IM4 as shown in FIG. The information processing apparatus 100 determines the data supply method based on the method information indicating the learning method LT4 that is metric learning. For example, the information processing apparatus 100 determines, as a data supply method, input data, parameters, and the like used for metric learning of a Shamisen network, based on method information indicating a learning method LT4 that is metric learning. For example, the information processing apparatus 100 determines input data, parameters, and the like by using correspondence information indicating correspondence between a plurality of learning methods and a plurality of supply methods as shown in the correspondence information storage unit 141 in FIG.

In the example of FIG. 12, the information processing apparatus 100 determines the input data used for the metric learning of the Shamisen network as “data pair” and “relationship label”. Further, the information processing apparatus 100 determines the parameter used for the metric learning of the Shamisen network using the setting parameter information PINF4 shown in FIG. For example, the setting parameter information PINF4 includes a parameter used for metric learning of the Shamisen network, specification of whether the user inputs the value of the parameter, a recommended value of the parameter, and the like. The information processing apparatus 100 determines the parameters used for metric learning as “batch size” and “match probability α”.

Then, the information processing apparatus 100 executes the learning flow shown in FIG. 13 using the value of the parameter “batch size” and the value of the parameter “match probability α” input by the user. FIG. 13 is a flowchart showing learning of metric learning according to the present disclosure. Specifically, FIG. 13 is a flowchart showing learning of the Shamisen network according to the present disclosure.

As shown in FIG. 13, the information processing apparatus 100 prepares a data set (step S301). For example, the information processing apparatus 100 acquires a data set including data to which a label related to the distance between the two input data x1 and x2 is attached.

Then, the information processing apparatus 100 randomly selects one input class on one side (step S302). For example, the information processing apparatus 100 selects the input class on one side in a predetermined order.

Then, the information processing device 100 determines the other class (step S303). Then, the information processing apparatus 100 uses a random number to sample 0 or 1 with a predetermined probability α (step S304). For example, in the case of “0”, the information processing apparatus 100 selects the same class. In the case of “1”, the information processing apparatus 100 selects a different class. Then, the information processing apparatus 100 uses the obtained 0 or 1 as a label (distance label). The process of step S304 may be performed in step S303.

Then, the information processing apparatus 100 selects a sample of each class (step S305). For example, the information processing device 100 acquires the input data x1 and x2.

Then, the information processing apparatus 100 determines whether the batch size has been executed (step S306). If the information processing apparatus 100 has not executed the batch size (step S306: No), the information processing apparatus 100 returns to step S302 and repeats the processing.

The information processing apparatus 100, if executed for the batch size (step S306: Yes), performs learning (step S307). Then, the information processing apparatus 100 determines whether it is a condition for continuing the algorithm (step S308). The condition of step S308 is the number of repetitions, whether it is a convergence end condition, or the like.

If the condition for continuing the algorithm is satisfied (step S308: Yes), the information processing apparatus 100 returns to step S307 and repeats the processing. On the other hand, when the condition for continuing the algorithm is not satisfied (step S308: No), the information processing apparatus 100 returns to step S302 and repeats the processing.

[1-6-2. Triplet network]
From here, a network configuration, a data set, an input screen, and a learning flow of a triplet network, which is an example of metric learning, will be described with reference to FIGS. 14 to 18. For example, deep learning metric learning is used for learning the feature extractor. First, the network configuration for metric learning will be described with reference to FIG. FIG. 14 is a diagram illustrating another example of a metric learning network configuration according to the present disclosure.

The network MD5 shown in FIG. 14 is a typical configuration of a triplet network. The triplet network is composed of a feature extraction net that shares three parameters, a distance calculation layer, and a loss calculation layer. This network inputs three data and outputs loss. The three data are data called anchor (x_a), data called positive (x_p), and data called negative (x_n).

In the learning by the triplet network, the information processing apparatus 100 supplies the data DT5 which is a data batch (mini-batch) as shown in FIG. FIG. 15 is a diagram showing another example of the metric learning mini-batch according to the present disclosure. Specifically, FIG. 15 is a diagram illustrating an example of a mini-batch for learning a triplet network. For example, the item “Anchor” in FIG. 15 corresponds to anchor class data, the item “Positive” in FIG. 15 corresponds to positive class data, and the item “Negative” in FIG. 15 is negative class. Corresponding to. The anchor class and the positive class are the same class. Also, the anchor class and the negative class are different classes. For example, the data of the items “Anchor” and “Positive” corresponding to the same DataID are data belonging to the same class. Further, for example, the data of the items “Anchor” and “Negative” corresponding to the same DataID are data belonging to different classes.

As shown in data DT5 of FIG. 15, the triplet network mini-batch has no label and is composed of three data. For example, a mini-batch of a triplet network contains data corresponding to anchor classes, positive classes, and negative classes, respectively. The algorithm for sampling and learning the data DT5 as shown in FIG. 15 has a flow as shown in FIGS. 17 and 18 described later.

Here, I will explain the difference between distance learning by triplet network and distance learning by chamises network. In the case of a Shamisen network, the belonging classes of two input data tend to be biased toward different classes unless special measures are taken, and it is necessary to control the probability. On the other hand, the triplet network has a characteristic that a combination can be generated naturally. Further, as described above, the distance label is required in the Shamisen network, but the label is not required in the triplet, which is one difference.

Here, in the learning of the triplet network, it is necessary to select an algorithm shown in FIG. 17 and an algorithm called hard negative mining as shown in FIG. 18, and to make detailed settings in the case of hard negative mining. Therefore, for the learning of the triplet network, a UI for setting batch size, On/OFF switching of hard negative mining, and selection of class representative in the case of hard negative mining is sufficient. Therefore, the information processing apparatus 100 generates a screen IM5 as shown in FIG. 16 in which the value of the parameter “batch size” and the value of the parameter “hard negative mining” are input and the class representative point in the case of hard negative mining is selected. And provide it to the user. FIG. 16 is a diagram showing another example of the metric learning input screen according to the present disclosure.

In FIG. 16, the information processing apparatus 100 generates a screen IM5 which is an input screen including an input field BX51 for inputting a value of the parameter “batch size” together with a character string “batch size” indicating the parameter “batch size”. Further, the information processing apparatus 100 is a screen including an input field BX52 for inputting a value of the parameter "hard negative mining" together with a character string "hard negative mining (selectively distance neighboring classes)" indicating the parameter "hard negative mining". Generate IM5. The information processing apparatus 100 also generates a screen IM5 including a selection list CB5 which is a combo box for selecting the value of the parameter “class representative point” together with the character string “hard negative mining setting” indicating the parameter “class representative point”. .. For example, the selection list CB5 includes options corresponding to the parameter “class representative point” such as “random for each class”, “class center of gravity”, or “class median”. Then, the information processing apparatus 100 determines the value input by the user in the input field BX51 as the value of the parameter "batch size". The information processing apparatus 100 also determines the value input by the user in the input field BX52 as the value of the parameter “hard negative mining”. Further, the information processing apparatus 100 determines the value designated by the user in the selection list CB5 as the value of the parameter “class representative point”.

For example, when the user selects the triplet network, the information processing apparatus 100 provides the screen IM5 as shown in FIG. 16 to the user. The information processing apparatus 100 determines the data supply method based on the method information indicating the metric learning of the triplet network. For example, the information processing apparatus 100 determines input data, parameters, and the like used for metric learning of the triplet network as a data supply method based on method information indicating metric learning of the triplet network. For example, the information processing apparatus 100 determines input data, parameters, and the like by using the correspondence information corresponding to the metric learning of the triplet network stored in the correspondence information storage unit 141 of FIG.

In the example of FIG. 16, the information processing apparatus 100 determines the input data used for metric learning of the triplet network as “three data without labels”. Further, the information processing apparatus 100 determines the parameter used for the metric learning of the triplet network using the setting parameter information (setting parameter information PINFX) stored in the correspondence information storage unit 141 of FIG. For example, the setting parameter information PINFX includes a parameter used for metric learning of the triplet network, specification of whether the user inputs a value of the parameter, a recommended value of the parameter, and the like. The information processing apparatus 100 determines the parameters used for metric learning as "batch size", "hard negative mining", and "class representative point".

Then, the information processing apparatus 100 uses the value of the parameter “batch size”, the value of the parameter “hard negative mining”, and the value of the parameter “class representative point” input by the user to perform the learning illustrated in FIG. 17 or FIG. Run the flow. When the value of the parameter “hard negative mining” corresponds to OFF of the hard negative mining, the information processing apparatus 100 executes the learning flow of FIG. In this case, it is not necessary to select the parameter "class representative point".

FIG. 17 is a flowchart showing another learning of the metric learning according to the present disclosure. Specifically, FIG. 17 is a flowchart showing learning of a triplet network in the case of not performing hard negative mining according to the present disclosure.

As shown in FIG. 17, the information processing device 100 prepares a data set (step S401). Then, the information processing apparatus 100 randomly selects one anchor class and one positive class (step S402). The anchor class and the positive class are the same class.

Then, the information processing apparatus 100 selects one negative class so as to be different from the anchor class (step S403). The information processing apparatus 100 samples one data each from the anchor, positive class, and negative class (step S404). Then, the information processing device 100 performs learning (step S405). Then, the information processing apparatus 100 determines whether it is a condition for continuing the algorithm (step S406). The condition of step S406 is the number of repetitions, whether it is a convergence end condition, or the like.

If the condition for continuing the algorithm is satisfied (step S406: Yes), the information processing apparatus 100 returns to step S405 and repeats the processing. On the other hand, when the condition for continuing the algorithm is not satisfied (step S406: No), the information processing apparatus 100 returns to step S402 and repeats the processing.

Note that the algorithm corresponding to the flow shown in Fig. 17 allows the anchor/positive class to be selected as a negative class that is easily determined to be negative. Therefore, in the flow shown in FIG. 17, negative learning may be insufficient. Therefore, it is conceivable to selectively select a close negative as a negative class with respect to the anchor positive class, as in the processing called hard negative mining shown in FIG.

The information processing apparatus 100 executes the learning flow of FIG. 18 when the value of the parameter “hard negative mining” corresponds to ON of hard negative mining. FIG. 18 is a flowchart showing learning in the case of hard negative mining according to the present disclosure.

As shown in FIG. 18, the information processing apparatus 100 prepares a data set (step S451). Then, the information processing apparatus 100 uses the feature extractor to derive a representative point on the feature space for each class (step S452). For example, the information processing apparatus 100 obtains a representative point on the feature space for each class by using the feature extractor at the current time (processing time).

Then, the information processing apparatus 100 generates a list of pairs of classes having the closest feature points for each class (step S453). The information processing apparatus 100 randomly selects one anchor class and one positive class (step S454). The anchor class and the positive class are the same class.

Then, the information processing apparatus 100 sets the negative class to the class closest to the anchor class (step S455). The information processing apparatus 100 samples one data each from the anchor, positive class, and negative class (step S456).

Then, the information processing device 100 performs learning (step S457). Then, the information processing apparatus 100 determines whether it is a condition for continuing the algorithm (step S458). The condition of step S458 is the number of repetitions, whether it is a convergence end condition, or the like.

If the condition for continuing the algorithm is satisfied (step S458: Yes), the information processing apparatus 100 returns to step S457 and repeats the processing. On the other hand, if the condition for continuing the algorithm is not satisfied (step S458: No), the information processing apparatus 100 returns to step S452 and repeats the processing.

[1-7. Meta learning]
The meta-learning here corresponds to learning of a learning algorithm. That is, the concept of meta-learning is learning of a learning algorithm. Attention is focused on learning with a small amount of data, which is so-called one-shot learning or fushot learning. One-shot learning and fushot learning are learning with a small amount of data, and the meta-learning for these learning will be described below.

First, the one-shot learning and the fushot learning described here are typical examples of learning with a small amount of data. For example, in the one-shot learning or the fushot learning, the concept of the class is learned from a very small number of labeled data such as 1 to several samples, and the classification of the class is realized when the data is newly input. As described above, the one-shot learning (fuse shot learning) is merely a setting that the amount of data is small, and is not a learning technique. Although it is a small amount of data, normal supervised learning may be performed. However, in that case, there is a problem that the "concept" of the class cannot be statistically acquired due to too small amount of data, and generalization and overlearning occur. In order to avoid this, the model using "meta-learning" is often learned in advance.

≪Meta-learning here means learning that learns the learning algorithm itself. For example, in the meta-learning for one-shot learning, the one-shot learning “learning a small amount of learning data and classifying test data” algorithm itself is learned. The methods include "metric learning" based methods and "transfer learning" based (also called gradient based) methods.

In the metric learning-based meta-learning here, learning is performed using the learning data and the label at the time of one-shot learning setting, instead of the pair of data and the label for it. By projecting the learning data of the one-shot learning and the test data in this space and checking the distance, it is possible to estimate which class of the learning data the test data belongs to.

As described above, the one-shot learning and the fushot learning are learning when the learning data per class is extremely small, from 1 to several samples. The processing of the one-shot learning will be described. In the one-shot learning, the information processing apparatus 100 selects a class to be learned. Such classes are called support classes. Then, the information processing apparatus 100 selects one sample or several samples per class as a data set. Such a data set is called a support set. The information processing apparatus 100 uses the support set to construct a support class classification model. Such processing is called one-shot learning or fuse-shot learning.

The information processing device 100 selects one or more samples to be predicted. Such a sample is called a query set. Also, the query class is usually guaranteed to be one of the support classes. The information processing apparatus 100 classifies the query into any of the support classes.

As shown in the flow above, one-shot learning and future shot learning consist of a support set and a query set. Such support sets and query sets are called episodes in one-shot learning.

For example, the information processing apparatus 100 randomly selects some support classes. The number of support classes is called N-way. For example, if there are five support classes, it is called 5-way. The information processing apparatus 100 randomly selects a predetermined number of samples for each support class. The number of samples selected for each class is referred to as N-shot. For example, when the number of samples selected for each class is 1, it is called 1-shot.

The information processing device 100 randomly selects a query class from the support classes. Further, the information processing apparatus 100 randomly selects some samples from this query class. Then, the information processing apparatus 100 combines these to generate an episode.

From here, I will explain the meta-learning for one-shot learning. As described above, the one-shot learning is merely learning when there is a small amount of learning data, and various approaches such as using a neural net or hierarchical Bayes can be considered. However, no matter which approach is adopted, it is difficult to sufficiently reflect the statistical characteristics in the model created with only a small amount of training data. Therefore, in the model generated by the one-shot learning, there is a problem that it is not possible to properly classify slightly different data from the learning data. Such a problem is called "cannot generalize" or "over-learning occurs" because there is little data.

In order to solve the above problems, it is effective to acquire knowledge in advance. The knowledge referred to here is, for example, to learn in advance in one-shot classification of “handwritten characters” how to distinguish between a character and whether it is the same or different. The knowledge in this case corresponds to the knowledge of remembering the set of characters seen for the first time and classifying them the next time, on the assumption that they are characters.

For example, in one-shot learning meta-learning, knowledge can be acquired by preparing episodes of various handwritten characters and learning so that queries can be correctly classified with these episodes. In this case, the information processing apparatus 100 can estimate the support class of the query data for the support set of the handwritten character data set for the first time. Such an algorithm will be described in detail in FIG.

Regarding the one-shot learning (fushot learning), for example, the following documents are disclosed.
・Matching Networks for One Shot Learning, Oriol Vinyals et al. <https://arxiv.org/abs/1606.04080>
・Meta-Transfer Learning for Few-Shot Learning, Qianru Sun et al. <https://arxiv.org/abs/1812.02391>

Further, for meta-learning, for example, the following documents are disclosed.
・Deep Meta-Learning: Learning to Learn in the Concept Space, Fengwei Zhou et al. <https://arxiv.org/abs/1802.03596>
・Learning to Generalize: Meta-Learning for Domain Generalization, Da Li et al. <https://arxiv.org/abs/1710.03463>

Here, the network configuration of meta-learning, the data set, the input screen, and the learning flow will be described with reference to FIGS. 19 to 22. First, the network configuration for meta-learning will be described with reference to FIG. FIG. 19 is a diagram illustrating an example of a network configuration of meta learning according to the present disclosure.

The network MD6 shown in FIG. 19 shows an example of a meta-learning network for inputting episodes. The metric learning-based meta-learning constitutes a network such as the network MD6.

As shown in network MD6 of FIG. 19, metric-based meta-learning includes a network for inputting a query (xq) and a network for inputting support (xs). Then, the network MD6 inputs a query and support of the corresponding class for each support class and calculates a block like the likelihood of the query (also referred to as logit), and these likelihoods (logit). , A block for calculating the loss by inputting the label (y) of the support class.

Matching networks and prototype networks are typical networks for metric-based meta-learning. There is a difference between the matching network and the prototype network in the blocks that calculate the likelihood of queries and support. For example, in the case of a matching network, the information processing apparatus 100 calculates the distance to the support of the nearest neighbor (softmax) of the query in the feature space. On the other hand, in the case of the prototype network, the information processing apparatus 100 calculates the distance between the query and the average support (prototype). As the distance index, various indices such as cosine distance, Euclidean distance, or extension thereof may be used.

Metric-based meta-learning includes a matching network that includes learning of class classification by the neighborhood method (nearest neighbor method) as fushot learning, and a prototype network that includes learning of prototype classification as fushot learning. As a method inferred from it, there may be a method such as a voting network that includes learning of the majority method. In this case, the information processing apparatus 100 calculates the distance between the query and each support, and simply adds the distances to use as the likelihood (logit) of the support class for the query.

Note that when the number of shots is 1, the blocks for calculating the likelihood of the query and the support are the same, and the information processing apparatus 100 simply calculates the distance between the support and the query. This network configuration is equivalent to an extended version of the Triplet network called the n-pair network.

In the learning by meta-learning, the information processing apparatus 100 supplies the data DT6 which is a data batch (mini-batch) as shown in FIG. FIG. 20 is a diagram showing an example of a meta-learning mini-batch according to the present disclosure. Specifically, FIG. 20 is a diagram showing an example of a mini-batch of meta-learning using episodes.

Data DT6 in FIG. 20 shows the structure of the episode. The data DT6 of FIG. 20 shows an episode when the number of classes is 1, for simplification of description, but the number of classes is usually larger than 1, and each unit is repeated.

Here, the algorithm related to meta-learning as shown in FIG. 22 requires the number of support classes, the number of shots for each class (also simply referred to as “shot number”), and the number of queries for each class (simply “query”). It is also called the "number") and the number of repetitions of learning. Therefore, for meta-learning, a UI for setting the number of support classes, the number of shots, the number of queries, and the number of learning repetitions is sufficient.

Therefore, when the user selects "meta-learning", the transition of the input screen shown in Fig. 21 can be considered. FIG. 21 is a diagram illustrating an example of a meta-learning input screen according to the present disclosure. The information processing apparatus 100 displays a screen IM6 for inputting the value of the parameter "number of support classes", the value of the parameter "shot number", the value of the parameter "query number", and the value of the parameter "repetition number" as shown in FIG. Is generated and provided to the user.

In FIG. 21, the information processing apparatus 100 generates a screen IM6 which is an input screen including a character string “support class number” indicating the parameter “support class number” and an input field BX61 for inputting the value of the parameter “support class number”. To do. The information processing apparatus 100 also generates a screen IM6 including a character string “shot number/class” indicating the parameter “shot number” and an input field BX62 for inputting the value of the parameter “shot number”. The information processing apparatus 100 also generates a screen IM6 including a character string “query count/class” indicating the parameter “query count” and an input field BX63 for inputting the value of the parameter “query count”. The information processing apparatus 100 also generates a screen IM6 including a character string “learning repetition number” indicating the parameter “repetition number” and an input field BX64 for inputting the value of the parameter “repetition number”.

Then, the information processing apparatus 100 determines the value input by the user in the input field BX61 as the value of the parameter “number of support classes”. Further, the information processing apparatus 100 determines the value input by the user in the input field BX62 as the value of the parameter “number of shots”. In addition, the information processing apparatus 100 determines the value input by the user in the input field BX63 as the value of the parameter “query count”. Further, the information processing apparatus 100 determines the value input by the user in the input field BX64 as the value of the parameter “repetition number”.

For example, when the user selects the meta learning that is the learning method LT7 or the fushot shot learning that is the learning method LT8 in the provision list LS1, the information processing apparatus 100 provides the screen IM6 as shown in FIG. 21 to the user. .. For example, the information processing apparatus 100 acquires method information indicating the learning method LT7 that is meta-learning.

The information processing apparatus 100 determines the data supply method based on the method information indicating the learning method LT7 that is meta-learning. For example, the information processing apparatus 100 determines input data, parameters, and the like used for meta-learning as the data supply method based on the method information indicating the learning method LT7 that is meta-learning. For example, the information processing apparatus 100 determines input data, parameters, and the like by using correspondence information indicating correspondence between a plurality of learning methods and a plurality of supply methods as shown in the correspondence information storage unit 141 in FIG.

In the example of FIG. 21, the information processing apparatus 100 determines the input data used for meta-learning as “support data” and “query data”. Further, the information processing apparatus 100 determines the parameter used for meta-learning by using the setting parameter information PINF7 shown in FIG. For example, the setting parameter information PINF7 includes a parameter used for meta-learning, designation of whether to input the value of the parameter to the user, recommended value of the parameter, and the like. The information processing apparatus 100 determines the parameters used for meta-learning as “the number of support classes”, “the number of shots”, “the number of queries”, and “the number of repetitions”.

For example, the information processing apparatus 100 uses the value of the parameter “number of support classes”, the value of the parameter “shot number”, the value of the parameter “query number”, and the value of the parameter “repetition number” input by the user, as illustrated in FIG. The learning flow shown in is executed. FIG. 22 is a flowchart showing learning of meta learning according to the present disclosure.

As shown in FIG. 22, the information processing apparatus 100 prepares a data set (step S501). In this case, a data set with a large number of classes is desirable. Then, the information processing apparatus 100 randomly selects a predetermined number of support classes (step S502). For example, the information processing apparatus 100 selects a predetermined number of support classes greater than one.

Then, the information processing apparatus 100 randomly selects a query class from the support classes (step S503). The information processing apparatus 100 generates a query set that selects several samples from each support class (step S504).

Then, the information processing apparatus 100 selects several samples from each query class to generate a query set (step S505). The information processing apparatus 100 generates an episode in which the support set and the query set are combined (step S506). The information processing apparatus 100 inputs the support set and the query set into the meta learning network (step S507).

Then, the information processing device 100 performs meta learning (step S508). Then, the information processing apparatus 100 determines whether it is a condition for continuing the algorithm (step S509). The condition of step S509 is the number of repetitions, whether it is a convergence end condition, or the like.

If the condition for continuing the algorithm is satisfied (step S509: Yes), the information processing apparatus 100 returns to step S508 and repeats the processing. On the other hand, when the condition for continuing the algorithm is not satisfied (step S509: No), the information processing apparatus 100 returns to step S502 and repeats the processing.

[1-8. Transfer learning]
The transfer learning referred to here is a technique for learning a new data set by diverting a trained model. Transfer learning has a feature that knowledge obtained from pre-learned data can be diverted. In the case of a neural network, the information processing apparatus 100 regenerates the final fully connected layer of the neural network learned with a large amount of data (for example, ImageNet or the like) according to the class of the new learning data set, and the fully connected layer, Or re-learn the whole thing.

Regarding transfer learning, for example, the following disclosures are made.
・Transfer Learning-Machine Learning's Next Frontier, Sebastian Ruder <http://ruder.io/transfer-learning/>

Transfer learning-based meta-learning (also called gradient-based meta-learning) is "pre-learning of transfer learning" that allows one-shot learning to be performed by "one update" in transfer learning. is there. In this "pre-learning of transfer learning", unlike normal transfer learning, a randomized one-shot learning data set (learning data and test data for one-shot) is used to test with one parameter update from the learning data. Learn to classify data correctly.

When the user selects “transfer learning”, the transition input screen may be as shown in FIG. 23. FIG. 23 is a diagram illustrating an example of a transfer learning input screen according to the present disclosure. The information processing apparatus 100 generates a screen IM7 for inputting the value of the parameter “learned model path”, the value of the parameter “layer to be transferred”, and the value of the parameter “transfer model name” as shown in FIG. Provide to users.

In FIG. 23, the information processing apparatus 100 is an input screen including an input field BX71 for inputting the value of the parameter “learned model path” together with the character string “learned model Path” indicating the parameter “learned model path”. Generate IM7. The information processing apparatus 100 also generates a screen IM7 including a selection list CB7 that is a pull-down menu for selecting the value of the parameter “layer to be transferred” together with the character string “layer to be transferred” indicating the parameter “layer to be transferred”. To do. For example, the selection list CB7 in which the list is slid by the slide bar includes options for specifying layers (layers) such as "Convolution_5", "BatchNormalization_5", "Rule_5", and "Affine_6". The information processing apparatus 100 also generates a screen IM7 including a character string “transition model name” indicating the parameter “transition model name” and an input field BX72 for inputting the value of the parameter “transition model name”.

The input field BX71 corresponding to the "learned model path" here is a place where the folder or directory in which the learned model files are stored is input. In the selection list CB7 corresponding to the “layer to be transferred” here, the layers from the input of the learned model to the output are pull-down menus, which can be viewed via a slide bar or the like. If one of the pull-down menus in the selection list CB7 is selected, it is possible to instruct to cut out a module having this layer as the final layer.

Also, the input field BX72 corresponding to the "transition model name" here is a place for designating the name to be given to the module cut out via the selection list CB7 which is a pull-down menu. The network name input to the input field BX72 is linked with the module name in the network design portion, and the setting is made in a format fitted in that portion.

Then, the information processing apparatus 100 determines the value input by the user in the input field BX71 as the value of the parameter “learned model path”. Further, the information processing apparatus 100 determines the value designated by the user in the selection list CB7 as the value of the parameter “layer to be transferred”. In addition, the information processing apparatus 100 determines the value input by the user in the input field BX72 as the value of the parameter “transition model name”.

For example, when the user selects the transfer learning which is the learning method LT4 in the provision list LS1, the information processing apparatus 100 provides the screen IM7 as shown in FIG. 23 to the user. For example, the information processing apparatus 100 acquires method information indicating the learning method LT4 that is transfer learning.

The information processing apparatus 100 determines a data supply method based on method information indicating a learning method LT4 that is transfer learning. For example, the information processing apparatus 100 determines input data, parameters, and the like used for transfer learning as a data supply method based on method information indicating a learning method LT4 that is transfer learning. For example, the information processing apparatus 100 determines input data, parameters, and the like by using correspondence information indicating correspondence between a plurality of learning methods and a plurality of supply methods as shown in the correspondence information storage unit 141 in FIG.

In the example of FIG. 23, the information processing apparatus 100 determines the input data used for transfer learning as “preliminary learning model” and “labeled data”. Further, the information processing apparatus 100 determines the parameter used for transfer learning using the setting parameter information PINF4 shown in FIG. For example, the setting parameter information PINF4 includes a parameter used for transfer learning, specification of whether the user inputs a value of the parameter, a recommended value of the parameter, and the like. The information processing apparatus 100 determines the parameters used for transfer learning as “learned model path”, “layer to be transferred”, and “transfer model name”.

The information processing apparatus 100 may also provide the user with a network design screen as shown in FIG. The information processing device 100 generates a screen IM71 as shown in FIG. FIG. 24 is a diagram showing an example of a transfer learning network design screen according to the present disclosure. A screen IM71, which is a network design screen in FIG. 24, includes a plurality of blocks such as an input layer, a transfer module, an additional layer 1 (total coupling layer), and an additional layer 2 (loss). The location of the transfer module in the screen IM71 corresponds to the transfer layer specified in FIG. For example, the transfer module in the screen IM71 is given the name of the transfer model specified in FIG.

In the case of transfer learning, the learning data supply method is determined based on what you want to learn by setting transfer learning. For example, there are various patterns such as "transfer learning" and "supervised learning" or "semi-supervised learning". Therefore, the information processing apparatus 100 may determine the type of learning method after these settings are made.

[2. Other Embodiments]
The processing according to each of the above-described embodiments may be implemented in various different modes (modifications) other than each of the above-described embodiments. The information processing device that determines the data providing method is not limited to the example described above, and may have various modes. This point will be described with reference to FIGS. 25 to 30. Note that, in the following, description of the same points as those of the information processing apparatus 100 according to the embodiment will be appropriately omitted.

[2-1. Modification 1 (other configuration example)]
For example, in the above-described example, the information processing device that performs information processing is the information processing device 100, but the information processing device and the terminal device that displays the GUI may be separate entities. This point will be described with reference to FIGS. 25 and 26. FIG. 25 is a diagram illustrating a configuration example of the information processing system according to the modified example of the present disclosure. FIG. 26 is a diagram illustrating a configuration example of the information processing device according to the modified example of the present disclosure.

As shown in FIG. 25, the information processing system 1 includes a terminal device 10 and an information processing device 100A. The terminal device 10 and the information processing device 100A are communicably connected to each other via a network N in a wired or wireless manner. Note that the information processing system 1 illustrated in FIG. 25 may include a plurality of terminal devices 10 and a plurality of information processing devices 100A. In this case, the information processing apparatus 100A communicates with the terminal device 10 via the network N, provides information to the terminal device 10, and based on information such as parameters specified by the user via the terminal device 10, the model You may also study.

The terminal device 10 is an information processing device used by a user. The terminal device 10 is realized by, for example, a notebook PC (Personal Computer), a desktop PC, a smartphone, a tablet terminal, a mobile phone, a PDA (Personal Digital Assistant), or the like. The terminal device 10 may be any terminal device as long as it can display the information provided by the information processing device 100A.

Further, the terminal device 10 receives an operation by the user. In the example illustrated in FIG. 25, the terminal device 10 displays the information provided by the information processing device 100A on the screen. In addition, the terminal device 10 transmits information such as parameter values input by the user to the information processing device 100A.

The information processing apparatus 100A provides information to the terminal apparatus 10 and implements the same information processing as the information processing apparatus 100, except that the information processing apparatus 100A differs from the information processing apparatus 100 in that the values of the parameters acquired from the terminal apparatus 10 are used. ..

As shown in FIG. 26, the information processing device 100A includes a communication unit 11, a storage unit 14, and a control unit 15A. The communication unit 11 is connected to a network N (Internet or the like) by wire or wirelessly, and transmits/receives information to/from the terminal device 10 via the network N. In this case, the information processing apparatus 100A may not have the GUI function like the information processing apparatus 100. The information processing apparatus 100A may include an input unit (for example, a keyboard or a mouse) or an output unit (for example, a liquid crystal display) used by an administrator of the information processing apparatus 100A. Further, the information processing apparatus 100A may not have an input unit or an output unit for displaying various information when it does not accept various operations from an administrator or the like who manages the information processing apparatus 100A.

The control unit 15A is realized by, for example, a CPU, an MPU, or the like executing a program (for example, the information processing program according to the present disclosure) stored inside the information processing apparatus 100A using a RAM or the like as a work area. The control unit 15A is a controller, and may be realized by an integrated circuit such as an ASIC or FPGA.

As shown in FIG. 26, the control unit 15A includes an acquisition unit 151A, a determination unit 152, a generation unit 153, a provision unit 154A, and a learning unit 155, and the functions and actions of information processing described below. Realize or execute. The internal configuration of the control unit 15A is not limited to the configuration shown in FIG. 26, and may be another configuration as long as it is a configuration for performing information processing described later.

The acquisition unit 151A acquires various types of information, similar to the acquisition unit 151. The acquisition unit 151A acquires various kinds of information from the terminal device 10. The acquisition unit 151A acquires the parameter value from the terminal device 10. The acquisition unit 151A acquires various types of information from the storage unit 14.

The providing unit 154A provides various information similarly to the providing unit 154. The providing unit 154A provides various information to the terminal device 10. The providing unit 154A transmits various information to the terminal device 10. The providing unit 154A provides the terminal device 10 with the input screen generated by the generating unit 153.

[2-2. Modification 2 (estimation of learning method)]
In the example of FIG. 1, the case where the user specifies the learning method is shown, but the learning method may be estimated, and the data providing method may be determined based on the method information indicating the estimated learning method. This point will be described with reference to FIG. FIG. 27 is a diagram illustrating another configuration example of the information processing device according to the modified example of the present disclosure.

As shown in FIG. 27, the information processing device 100B has a communication unit 11, an input unit 12, an output unit 13, a storage unit 14, and a control unit 15B. In the example of FIG. 27, the information processing device 100B is different from the information processing device 100 in that the information processing device 100B includes an estimation unit 156.

Return to FIG. 27 and continue the explanation. The control unit 15B is realized by, for example, a CPU or the like executing a program (for example, an information processing program according to the present disclosure) stored inside the information processing apparatus 100B using a RAM or the like as a work area. Further, the control unit 15B may be realized by, for example, an integrated circuit such as ASIC or FPGA.

As illustrated in FIG. 27, the control unit 15B includes an acquisition unit 151B, a determination unit 152B, a generation unit 153, a provision unit 154, a learning unit 155, and an estimation unit 156, and information described below. Realize or execute processing functions and actions. The internal configuration of the control unit 15B is not limited to the configuration shown in FIG. 27, and may be another configuration as long as it is a configuration for performing information processing described later.

The acquisition unit 151B acquires method information indicating the learning method estimated by the estimation unit 156. The acquisition unit 151B acquires learning data used for machine learning. The acquisition unit 151B acquires information indicating a network learned by machine learning.

The decision unit 152B makes various decisions in the same manner as the decision unit 152. The determining unit 152B determines the machine learning data supply method based on the method information indicating the learning method estimated by the estimating unit 156. The determination unit 152B determines the machine learning data supply method based on the method information indicating the learning method acquired by the acquisition unit 151B.

The estimation unit 156 makes various estimations. The estimation unit 156 estimates various types of information based on the information acquired by the acquisition unit 151. The estimation unit 156 estimates various types of information based on the information stored in the storage unit 14.

The estimation unit 156 estimates the learning method based on the learning data. The estimation unit 156 estimates the learning method based on the information indicating the structure of the learning data. For example, the estimation unit 156 estimates the learning method based on the learning data set information. For example, the estimation unit 156 may estimate a domain based on the extension of a file included in the learning data set, and estimate that the learning method is based on the estimated domain.

The estimation unit 156 may estimate the learning method based on the learning data set acquired by the acquisition unit 151 and the information indicating the structure of the data set used for each learning method. The estimation unit 156 may estimate the learning method as unsupervised learning or triplet network metric learning when the learning data set does not include a label. For example, the estimation unit 156 may estimate the learning method as unsupervised learning when the learning data set does not include a label and has two data pairs. For example, the estimation unit 156 may estimate that the learning method is triplet network metric learning when the learning data set does not include a label and is a set of three data. The estimation unit 156 may estimate the learning method as semi-supervised learning when the learning data set includes both labeled and unlabeled data. When the learning data set includes a label, the estimation unit 156 may estimate the learning method to be supervised learning or Shame's network metric learning. For example, when the learning data set includes a label and the information indicates whether or not the label is in the same class, the estimation unit 156 may estimate the learning method as the metric learning of the Shamisen network. Note that the above is an example, and the estimation unit 156 may estimate the learning method by appropriately using various information.

The estimation unit 156 estimates the learning method based on the information indicating the network. For example, the estimation unit 156 estimates from the learning data set that the problem that the user is trying to solve is a regression problem, and estimates that the learning method is a learning method corresponding to the regression problem. For example, the estimation unit 156 may estimate that the problem that the user is trying to solve is a regression problem when the label of the learning data set is a real value.

For example, when the label type is a continuous value (when the label value is a decimal value), the estimation unit 156 estimates that the regression problem is about to be solved and selects the square error function as the loss function. For example, the estimation unit 156 estimates from the learning data set that the problem that the user is trying to solve is a classification problem, and estimates that the learning method is a learning method corresponding to the classification problem. For example, when the label of the learning data set has an integer value, the estimation unit 156 may estimate that the problem that the user is trying to solve is a classification problem. For example, the estimation unit 156 may estimate that the problem that the user is trying to solve is a multi-value classification problem when there are two or more integer values that represent the label of the learning data set. For example, the estimation unit 156 may estimate that the problem that the user is trying to solve is a binary classification problem when there are two types of integer values that represent the labels of the learning data set.

For example, the information regarding the learning data or the information indicating the network may be acquired through the screen as shown in FIG. FIG. 28 is a diagram illustrating an example of a display screen of network design support according to the present disclosure. For example, FIG. 28 may be a network design screen of NeuralNetworkConsole.

A toolbar in which buttons used for selecting a tool are arranged is displayed above the screen IM2 shown in FIG. 28, and a first area AR21 and a second area AR22 are provided below the toolbar.

The rectangular first area AR21 provided at the left end in FIG. 28 is an area used for selection of various components that configure the network. In the example of FIG. 28, each component is displayed for each category such as “IO”, “Loss”, “Parameter”, “Basic”, and “Pooling”.

For example, "Loss" components include "SquaredError", "HuberLoss", "AbsoluteError", etc. The components of "Parameter" include "Parameter", "Working Memory" and the like. Further, the "Basic" component includes "Affine", "Convolution", "Deconvolution", "Embed", and the like.

The second area AR22 is an area in which a network designed using the components shown in the first area AR21 is displayed. In the example of FIG. 28, the case where each component of “Input”, “Affine”, “Sigmoid”, and “BinaryCrossEntropy” is selected in order is shown, and blocks BK1 to BK4 representing each component are displayed side by side. The block BK1 shown in FIG. 28 corresponds to the input layer, the block BK2 corresponds to the linear layer, the block BK3 corresponds to the activation layer, and the block BK4 corresponds to the loss function layer. As described above, blocks BK1 to BK4 shown in FIG. 28 represent networks (learners) including an input layer, a linear layer, an activation layer, and a loss function layer. ‥

Then, when a learning data set is specified and the user instructs execution of learning, learning using this network is performed. In this way, the user can design the network by selecting the component from the first area AR21.

For example, the estimation unit 156 may estimate the learning method based on the network generated by the screen IM2 as shown in FIG. The estimation unit 156 may estimate the learning method based on the information indicating the network acquired by the acquisition unit 151 and the information indicating the type of the network learned by each learning method.

[2-3. Modification 3 (use within development framework)]
Further, in the above-described example, the GUI has been described as an example, but any providing mode may be used as long as a service related to information processing such as determination of a data providing method can be provided. For example, the information processing described above may be made available within a development framework using ordinary coding.

For example, a development environment (development framework) using a programming language such as C language or Python is provided with various development frameworks (also simply referred to as “framework”) such as Tensorflow. For example, in addition to Tensorflow, various frameworks such as Theano, Caffe, Torch, Chainer, Keras, CNTK, MxNet, PyTorch, NNabla are provided. These frameworks design neural networks and set learning by using programming languages.

In many cases, these frameworks have basic functions such as functions and solvers. As the learning samples actually used, samples in which a learning execution loop including a graph designing unit, an optimization setting unit, a data supply setting unit, and a performance evaluation unit are described are prepared.

In the example shown in FIG. 29, in addition to the above-mentioned functions, it relates to the data supply in the learning execution sample. For the data supply, many data supply setting classes are prepared. FIG. 29 is a diagram illustrating an example of a provision mode of information processing according to the modified example of the present disclosure. In the framework configuration ST2 shown in FIG. 29, a data provider group is provided for data supply.

Even at present, there is a development framework in which multiple classes are prepared for the data provider group (eg NNabla). However, those classes are only variations regarding hardware and system settings such as what kind of source to read data from, and whether to read from a file that can support the data supply method in distributed learning.

On the other hand, in the framework configuration ST2 of FIG. 29, as a data provider group, a data provider group specific to various learning algorithms is prepared as a lineup. The data providers of the data provider group may be classes to which learning algorithms such as meta learning data provider and metric learning data provider are explicitly added. Further, the data providers of the data provider group may be classes to which a data supply method such as a Shamisen net data generator, a triplet generator, an episode generator, etc. is added.

[2-4. Modification 4 (provided as a standalone application)]
You may provide only the function which performs the information processing regarding the above-mentioned data supply method. For example, the function of executing information processing relating to the data supply method may be provided as a single application without being plugged into an individual development environment.

As described above, the specific implementation of information processing regarding the determination of the data providing method is not limited to the addition of the UI and the function in the GUI development framework (Visual Programming environment) as shown in FIG. Specifically, as shown in FIG. 29, by adding a function (function) to an existing well-known development framework (such as sdk), or by providing it as a single application as described above, There may be various embodiments such as a developer independently constructing and implementing as a support tool for deep learning.

The various information processings described above enable deep learning developers to easily realize different data supply procedures according to various learning methods without deep knowledge. Here, the data supply portion has a structure independent of the structure unique to the learning of the neural network. For example, the data supply part has a variable structure having both data and gradient. Therefore, the deep learning development environment that can be used by the above-mentioned information processing is easy for intermediate programming people, such as a development environment or a framework that can be easily developed even by a beginner in programming such as GUI. It is useful not only for the development environment that can be done, but also for advanced users who want to build a deep learning development environment by themselves.

By realizing the data supply development mechanism for each of these learning methods as a stand-alone application or as a built-in application within a development application, module development by deep learning is further accelerated, and not only developers of deep learning modules, It is expected to bring a wide range of beneficial effects to users of the deep learning module and eventually to users.

Further, of the processes described in the above embodiments, all or part of the processes described as being automatically performed may be manually performed, or the processes described as being manually performed. All or part of the above can be automatically performed by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above documents and drawings can be arbitrarily changed unless otherwise specified. For example, the various information shown in each drawing is not limited to the illustrated information.

Also, each component of each device shown in the drawings is functionally conceptual, and does not necessarily have to be physically configured as shown. That is, the specific form of distribution/integration of each device is not limited to that shown in the figure, and all or part of the device may be functionally or physically distributed/arranged in arbitrary units according to various loads and usage conditions. It can be integrated and configured.

Also, the above-described respective embodiments and modified examples can be appropriately combined within a range in which the processing content is not inconsistent.

Also, the effects described in this specification are merely examples and are not limited, and there may be other effects.

[3. Hardware configuration]
The information devices such as the

information processing devices

100, 100A, and 100B according to the above-described embodiments are realized by the computer 1000 having the configuration illustrated in FIG. 30, for example. FIG. 30 is a hardware configuration diagram illustrating an example of a computer 1000 that realizes the functions of the information processing apparatuses such as the

information processing apparatuses

100, 100A, and 100B. Hereinafter, the information processing apparatus 100 according to the embodiment will be described as an example. The computer 1000 has a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600. The respective units of the computer 1000 are connected by a bus 1050.

The CPU 1100 operates based on a program stored in the ROM 1300 or the HDD 1400, and controls each part. For example, the CPU 1100 expands a program stored in the ROM 1300 or the HDD 1400 into the RAM 1200 and executes processing corresponding to various programs.

The ROM 1300 stores a boot program such as a BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 starts up, a program dependent on the hardware of the computer 1000, and the like.

The HDD 1400 is a computer-readable recording medium that non-temporarily records a program executed by the CPU 1100, data used by the program, and the like. Specifically, the HDD 1400 is a recording medium that records an information processing program according to the present disclosure, which is an example of the program data 1450.

The communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the Internet). For example, the CPU 1100 receives data from another device or transmits the data generated by the CPU 1100 to another device via the communication interface 1500.

The input/output interface 1600 is an interface for connecting the input/output device 1650 and the computer 1000. For example, the CPU 1100 receives data from an input device such as a keyboard or a mouse via the input/output interface 1600. The CPU 1100 also transmits data to an output device such as a display, a speaker, a printer, etc. via the input/output interface 1600. The input/output interface 1600 may also function as a media interface for reading a program or the like recorded in a predetermined recording medium (medium). Examples of media include optical recording media such as DVD (Digital Versatile Disc) and PD (Phase change rewritable Disk), magneto-optical recording media such as MO (Magneto-Optical disk), tape media, magnetic recording media, and semiconductor memory. Is.

For example, when the computer 1000 functions as the information processing device 100 according to the embodiment, the CPU 1100 of the computer 1000 realizes the functions of the control unit 15 and the like by executing the information processing program loaded on the RAM 1200. Further, the HDD 1400 stores the information processing program according to the present disclosure and the data in the storage unit 14. Note that the CPU 1100 reads the program data 1450 from the HDD 1400 and executes the program data, but as another example, these programs may be acquired from another device via the external network 1550.

Note that the present technology may also be configured as below.
(1)
An acquisition unit that acquires method information indicating a learning method related to machine learning,
Based on the method information acquired by the acquisition unit, a determination unit that determines a method of supplying the machine learning data,
An information processing apparatus including.
(2)
The determination unit is
Determining the supply method based on a comparison between the method information and correspondence information indicating correspondence between a plurality of learning methods and a plurality of supply methods,
The information processing device according to (1) above.
(3)
The determination unit is
Determining a parameter related to the supply method that allows a predetermined user to input a value, based on the method information,
The information processing apparatus according to (1) or (2) above.
(4)
The determination unit is
Determining, based on parameter information corresponding to the learning method, the parameter that allows the predetermined user to input a value;
The information processing device according to (3).
(5)
A generation unit that generates an input screen for inputting the value of the parameter determined by the determination unit,
With
The information processing apparatus according to (3) or (4) above.
(6)
A providing unit that provides the input screen generated by the generating unit,
With
The information processing device according to (5) above.
(7)
The determination unit is
Determining a recommended value of the parameter recommended to the predetermined user based on the method information,
The information processing apparatus according to any one of (3) to (6) above.
(8)
The determination unit is
Determining the recommended value based on the performance of the device that executes the machine learning,
The information processing device according to (7).
(9)
The determination unit is
Determining the recommended value based on the network learned by the machine learning,
The information processing device according to (7) or (8).
(10)
The acquisition unit is
Obtaining the method information from a user who specifies the learning method,
The information processing apparatus according to any one of (1) to (9) above.
(11)
The acquisition unit is
Acquiring the method information indicating the learning method selected by the user from a plurality of learning methods,
The information processing device according to (10).
(12)
An estimation unit that estimates the learning method,
Equipped with
The acquisition unit is
Acquiring the method information indicating the learning method estimated by the estimation unit,
The information processing apparatus according to any one of (1) to (9) above.
(13)
The acquisition unit is
Acquire the learning data used for the machine learning,
The estimation unit is
Estimating the learning method based on the learning data,
The information processing device according to (12).
(14)
The estimation unit is
Estimating the learning method based on information indicating the structure of the learning data,
The information processing device according to (13).
(15)
The determination unit is
Determining a network to be learned by the machine learning based on the supply method,
The information processing apparatus according to any one of (1) to (14) above.
(16)
The determination unit is
Determining a network to be learned by the machine learning based on a parameter corresponding to the supply method,
The information processing apparatus according to any one of (1) to (15) above.
(17)
The determination unit is
Determining the structure of the network based on the supply data supplied by the supply method,
The information processing device according to (15) or (16).
(18)
The determination unit is
Determining the structure of the network based on the domain of the supply data,
The information processing device according to (17).
(19)
Acquire method information indicating the learning method related to machine learning,
Based on the acquired method information, determine the method of supplying the machine learning data,
An information processing method for performing processing.
(20)
Acquire method information indicating the learning method related to machine learning,
Based on the acquired method information, determine the method of supplying the machine learning data,
An information processing program that causes processing to be executed.

100, 100A, 100B Information processing device 11 Communication unit 12 Input unit 13 Output unit (display)
14 storage unit 141 correspondence information storage unit 142 data storage unit 15, 15A, 15B control unit 151, 151A, 151B acquisition unit 152, 152B determination unit 153

generation unit

154, 154A provision unit 155 learning unit 156 estimation unit

Claims

An acquisition unit that acquires method information indicating a learning method related to machine learning,
Based on the method information acquired by the acquisition unit, a determination unit that determines a method of supplying the machine learning data,
An information processing apparatus including.
The determination unit is
Determining the supply method based on a comparison between the method information and correspondence information indicating correspondence between a plurality of learning methods and a plurality of supply methods,
The information processing apparatus according to claim 1.
The determination unit is
Determining a parameter related to the supply method that allows a predetermined user to input a value, based on the method information,
The information processing apparatus according to claim 1.
The determination unit is
Determining, based on parameter information corresponding to the learning method, the parameter that allows the predetermined user to input a value;
The information processing device according to claim 3.
A generation unit that generates an input screen for inputting the value of the parameter determined by the determination unit,
With
The information processing device according to claim 3.
A providing unit that provides the input screen generated by the generating unit,
With
The information processing device according to claim 5.
The determination unit is
Determining a recommended value of the parameter recommended to the predetermined user based on the method information,
The information processing device according to claim 3.
The determination unit is
Determining the recommended value based on the performance of the device that executes the machine learning,
The information processing device according to claim 7.
The determination unit is
Determining the recommended value based on the network learned by the machine learning,
The information processing device according to claim 7.
The acquisition unit is
Obtaining the method information from a user who specifies the learning method,
The information processing apparatus according to claim 1.
The acquisition unit is
Acquiring the method information indicating the learning method selected by the user from a plurality of learning methods,
The information processing device according to claim 10.
An estimation unit that estimates the learning method,
Equipped with
The acquisition unit is
Acquiring the method information indicating the learning method estimated by the estimation unit,
The information processing apparatus according to claim 1.
The acquisition unit is
Acquire the learning data used for the machine learning,
The estimation unit is
Estimating the learning method based on the learning data,
The information processing apparatus according to claim 12.
The estimation unit is
Estimating the learning method based on information indicating the structure of the learning data,
The information processing device according to claim 13.
The determination unit is
Determining a network to be learned by the machine learning based on the supply method,
The information processing apparatus according to claim 1.
The determination unit is
Determining a network to be learned by the machine learning based on a parameter corresponding to the supply method,
The information processing apparatus according to claim 1.
The determination unit is
Determining the structure of the network based on the supply data supplied by the supply method,
The information processing device according to claim 15.
The determination unit is
Determining the structure of the network based on the domain of the supplied data,
The information processing apparatus according to claim 17.
Acquire method information that indicates the learning method related to machine learning,
Determining a method of supplying the machine learning data based on the acquired method information,
An information processing method for performing processing.
Acquire method information that indicates the learning method related to machine learning,
Determining a method of supplying the machine learning data based on the acquired method information,
An information processing program that causes processing to be executed.