WO2016189675A1 - Neural network learning device and learning method - Google Patents

Neural network learning device and learning method

Info

Publication number
WO2016189675A1
Authority
WO
WIPO (PCT)
Prior art keywords
learning
training data
neural network
unit
divided
Prior art date
Application number
PCT/JP2015/065159
Other languages
French (fr)
Japanese (ja)
Inventor
Yasuyuki Kudo
Junichi Miyakoshi
Original Assignee
Hitachi, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd.
Priority to JP2017520142A (JP6258560B2)
Priority to PCT/JP2015/065159
Publication of WO2016189675A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present invention relates to a learning device and a learning method for a neural network.
  • supervised machine learning in which the relationship between system inputs and outputs is modeled by a neural network (hereinafter abbreviated as NN) to predict responses to unknown inputs or classify patterns, is widely used.
  • NN: neural network
  • Supervised machine learning requires a learning phase: multiple training data samples, each consisting of an input and an output, are prepared, and the various parameters of the NN are adjusted until the relationship between the inputs and outputs of all training data satisfies a predetermined standard.
  • Patent Document 1 describes a method in which NNs are separated and merged after each NN performs desired learning.
  • Patent Document 2 describes another method of solving the same problem.
  • This publication describes a method for preferentially learning non-conforming training data in which the relationship between input and output does not satisfy the standard.
  • In the method described in Patent Document 1, the training data to be learned by the two divided NNs are, for example, the alphabet characters A to L and M to Z, respectively. That is, this method presupposes that the user classifies the training data to be input in advance. It is therefore difficult to apply when the user does not know how to classify the training data, for example when training data having the same output must be divided into two. Furthermore, the method is highly effective when the learning times for A to L and for M to Z are equal, but its effect diminishes when the two learning times differ greatly. In general, the learning time depends strongly on the characteristics of the training data, so a large benefit is difficult to obtain consistently.
  • The method of Patent Document 2, on the other hand, preferentially learns non-conforming training data and adjusts the various parameters of the NN. As a result, in the subsequent stage of re-learning all training data, some of the training data that previously conformed may become non-conforming. This tendency is pronounced when, for example, the characteristics of the conforming and non-conforming training data differ greatly, and much time is then considered necessary for re-learning.
  • The present invention has been made in view of the above problems. Its purpose is to provide a learning device and a learning method that can stably shorten the NN learning time even for training data that is difficult for a user to classify or training data whose characteristics differ greatly.
  • The present application includes a plurality of means for solving the above problems. One example is a learning device comprising: a neural network dividing unit that divides a neural network into a plurality of neural networks; a training data dividing unit that divides training data consisting of a plurality of samples used for learning the neural network into a plurality of training data sets; a divided neural network learning unit that uniquely assigns one of the plurality of neural networks to each of the plurality of training data sets and executes learning with the assigned neural network; a neural network integration unit that integrates the plurality of neural networks whose learning results satisfy a predetermined condition, thereby generating an integrated neural network; and an integrated neural network learning unit that executes learning with the integrated neural network using the training data before division.
  • the NN learning time can be stably reduced even for training data that is difficult for the user to classify or training data having greatly different characteristics. Further, it is possible to minimize the resources necessary for forming the NN.
  • Example 1 shows a method of shortening the learning time by dividing the NN into two parts, learning using different training data, and then re-learning all training data with the integrated NN.
  • This method will be described with reference to FIGS. 1 to 6.
  • FIG. 1 is a functional block diagram of the information processing apparatus according to the first embodiment, which is roughly composed of an NN learning device and a storage device.
  • 101 is an NN learning device
  • 102 is an NN dividing unit
  • 103 is a training data dividing unit
  • 104 to 105 are divided NN learning units
  • 106 is an NN integrating unit
  • 107 is an integrated NN learning unit.
  • Reference numeral 108 denotes a storage device
  • 109 denotes an NN information storage unit
  • 110 denotes a division information storage unit
  • 111 denotes a training data storage unit
  • 112 denotes a learning result storage unit. All information in these storage units can be read and written by the user.
  • the NN learning device 101 includes a processor and a memory as a hardware configuration. Various functions of the NN learning device 101 can be realized by executing a program stored in the memory by the processor.
  • 201 is NN division
  • 202 is training data division
  • 203 is division NN learning
  • 204 is setting change
  • 205 is NN integration
  • 206 is integrated NN learning.
  • First, the NN division 201 is executed. This processing is realized by the NN dividing unit 102: the NN given from the NN information storage unit 109 is divided into two based on the division information given from the division information storage unit 110.
  • The division information given from the division information storage unit 110 specifies the ratio and the number of parts into which the NN is divided. In the first embodiment, it is assumed to contain the instruction "divide the NN equally into two, assign one part to the divided NN learning unit 104 and the other to the divided NN learning unit 105".
  • 301 is an input block
  • 302 to 303 are first layer feature extraction blocks
  • 304 to 305 are second layer feature extraction blocks
  • 306 is a link between feature extraction blocks
  • 307 is an output block.
  • The NN of the first embodiment is assumed to be a convolutional NN, and one feature extraction block is formed of a convolution layer (units labeled c) and a pooling layer (units labeled p). Each feature extraction block therefore has the function of independently extracting features of the training data. Considering this function, dividing the NN in units of feature extraction blocks is efficient, and as a result the re-learning time after NN integration can be shortened.
  • the NN division processing 201 can be realized by the configuration and operation described above.
  • Next, training data division 202 is executed. This process is realized by the training data dividing unit 103: the training data given from the training data storage unit 111 is divided into two based on the division information given from the division information storage unit 110.
  • the division information given from the division information storage unit 110 includes information on how to group and divide the training data.
  • In the first embodiment, it is assumed that the division information contains the instruction "divide the training data into two equal halves by sample number, assign the first half to the divided NN learning unit 104 and the second half to the divided NN learning unit 105".
  • An image of the training data given from the training data storage unit 111 is shown in FIG. 5.
  • As shown in FIG. 5, one sample of the training data consists of a set of input vector elements iv1 to iv4 and output vector elements ov1 to ov2, and 400 samples are prepared. Therefore, when the training data is divided into two according to the above instruction, it is split into the two groups of sample numbers 1 to 200 and 201 to 400. In the first embodiment, it is assumed that the value of the input vector changes continuously with the sample number. In this case, grouping the first and second halves of the sample numbers increases the likelihood that the characteristics of the training data of the two groups differ greatly.
  • the divided NN learning 203 is executed next. This process is realized by the divided NN learning units 104 to 105, and learning of the divided NN is performed in parallel using the divided training data.
  • 601 is an initial setting
  • 602 is a head training data input
  • 603 is an NN calculation
  • 604 is an error calculation
  • 605 is training data update
  • 606 is parameter adjustment
  • 607 is a result list generation.
  • initial setting 601 is first executed. This process mainly involves initial setting of the link coupling coefficient and bias value.
  • As a predetermined value, a random number from −1 to +1, for example, is set as the initial value.
  • the leading training data input 602 is executed next. Specifically, the training data sample number 1 shown in FIG. 5 is selected, and the input vector is transferred to the input block 401 of the divided NN and the output vector is transferred to the output block 403.
  • Next, the NN operation 603 is executed. This process corresponds to the "forward operation" of the convolutional NN: convolution (filtering), pooling (subsampling), activation, and other processing are applied to the input training data, and finally the output value of each unit in the output block 403 is calculated.
  • The error calculation 604 compares this calculation result with the output vector of the training data and computes the error. Thereafter, training data update 605 is executed; in the first embodiment this corresponds to selecting the training data of the next sample number. The NN operation 603 and error calculation 604 are then executed again on that training data, and this series of processes is repeated until all training data have been processed. Thereafter, if the total error is less than or equal to a predetermined standard, learning is judged to have succeeded, result list creation 606 is executed, and learning ends.
  • the result list includes information indicating success or failure of learning, error for each training data, latest information on various parameter setting values related to the divided NN, and the like.
  • Otherwise, parameter adjustment 607 is executed. This process corresponds to the "backward operation" of the convolutional NN: using the backpropagation method, the error gradient at the output block 403 is propagated back toward the input block 401. This makes it possible to correct the coupling coefficient and bias value of each link so that the error becomes smaller.
  • the “forward calculation” using the training data is executed again, and the parameter adjustment 607 is repeated until the total sum of errors is below the reference. If the processing time exceeds a predetermined standard during this operation, it is determined that learning has failed, result list creation 606 is performed, and learning ends.
  • If any divided NN fails to learn, setting change 204 is executed.
  • In the first embodiment this process is performed by the user, for example to change the initial setting 601 of a divided NN, to increase its number of layers or units, or to divide the divided NNs and training data further.
  • To this end, the contents of the NN information storage unit 109 and the division information storage unit 110 are corrected. Learning of the divided NNs is then executed again, and this series of processes is repeated until all divided NNs learn successfully.
  • When all divided NNs have learned successfully, NN integration 205 is executed next. This processing is realized by the NN integration unit 106. First, the duplicated input blocks 401 and 402 and the duplicated output blocks 403 and 404 are each merged back into a single common block. Then the link 306 deleted in the NN division processing is restored. With this processing the NN can be integrated.
  • After the divided NNs are integrated, integrated NN learning 206 is executed. This process is realized by the integrated NN learning unit 107, and its content is almost the same as the divided NN learning 203. The difference is that the coupling coefficients and bias values adjusted by the divided NN learning 203 are used as the initial values of the corresponding links in the integrated NN. Since the restored link 306 has no learning result, its initial value is, for example, a random number from −1 to +1.
  • If learning of the integrated NN fails, setting change 204 is executed again, and this series of processing is repeated until the integrated NN learns successfully.
  • the latest information on various parameter setting values related to the integrated NN is transferred to the NN information storage unit 109 and used in the test phase after the learning phase.
  • As described above, the information processing apparatus of the first embodiment divides the convolutional NN into two in units of feature extraction blocks, trains the divided NNs in parallel using training data split into the first and second halves of the sample numbers, and then re-learns all training data with the integrated NN.
  • This makes it possible to generate divided NNs with high identification capability for training data whose input vector values change continuously with the sample number, and to shorten the re-learning time after NN integration. It is therefore possible to provide an information processing method and system that can shorten the NN learning time, which is the object of the present invention.
  • In the first embodiment the number of divisions of the NN and the training data is two, but the number of divisions is not limited to this and may be three or more.
  • Likewise, although the NN and the training data are divided into equal parts, the division is not limited to this and may be uneven.
  • Furthermore, the number of training data samples, the number of input/output vector elements, and so on can be set freely.
  • Example 2 shows a method for appropriately dividing training data when the characteristics of the training data are unknown.
  • FIG. 7 is a functional block diagram of the information processing apparatus of the second embodiment, in which 701 is a division information storage unit and 702 is an NN analysis unit. The other parts perform the same processing as the parts shown in FIG. 1 of the first embodiment and are therefore given the same reference numerals.
  • The characteristic of the processing of the second embodiment is that a training data analysis process is newly added to the training data division 202 shown in FIG. 2.
  • the data division 202 is realized by the NN analysis unit 702 illustrated in FIG. 7, and divides the training data provided from the training data storage unit 111 based on the division information provided from the division information storage unit 701.
  • The division information given from the division information storage unit 701 specifies the ratio and the number of parts into which the training data is divided. In the second embodiment, it is assumed to contain the instruction "divide the training data equally into two, assign one part to the divided NN learning unit 104 and the other to the divided NN learning unit 105".
  • Next, the NN analysis unit 702 executes an analysis for dividing the training data equally into two.
  • In the second embodiment, so-called clustering, which classifies the training data by closeness in Euclidean distance, is used as this analysis method.
  • A representative algorithm is, for example, the k-means method.
  • The clustering analysis divides the training data into two groups and also generates information on which group the training data of each sample number belongs to.
  • As described above, the information processing apparatus of the second embodiment divides the convolutional NN into two, trains the divided NNs in parallel using training data divided into two by clustering analysis, and then re-learns all training data with the integrated NN. This makes it possible to generate divided NNs with high identification capability even for training data whose characteristics are unknown, and to shorten the re-learning time after NN integration. It is therefore possible to provide an information processing method and system that can stably shorten the NN learning time even for training data that is difficult for the user to classify or training data whose characteristics differ greatly, which is the object of the present invention.
  • As in the first embodiment, the number and balance of the NN and training data divisions, the number of training data samples, the number of input/output vector elements, and so on can be set freely. Furthermore, the first and second embodiments can be switched as different processing modes. This makes it possible to stably shorten the NN learning time for training data with more diverse characteristics.
  • Example 3 shows another method for appropriately dividing training data when the characteristics of the training data are unknown.
  • This method will be described with reference to FIGS. 8 and 9.
  • FIG. 8 is a functional block diagram of the information processing apparatus of the third embodiment, in which 801 is a division information storage unit and 802 is a division adjustment unit. The other parts perform the same processing as the parts shown in FIG. 1 of the first embodiment and are therefore given the same reference numerals.
  • The characteristic of the processing of the third embodiment is that the training data is adaptively classified by using the NN itself as a classifier. Details of this processing are described below with reference to the flowchart of FIG. 9, in which 901 is initial NN learning, 902 is non-conforming training data extraction, 903 is NN addition, 904 is additional NN learning, 905 is NN integration, 906 is integrated NN learning, and 907 is result list creation.
  • the initial NN learning 901 is executed.
  • This process is almost the same as the divided NN learning 203 of the first embodiment, but is characterized in that all training data is learned by one divided NN, for example the upper divided NN in FIG. 4.
  • This can be realized by devising the division information output by the division adjustment unit 802.
  • For example, the division information output to the NN dividing unit 102 is set to "divide the NN equally into two and assign one part to the divided NN learning unit 104", and the division information output to the training data dividing unit 103 is set to "assign all training data to the divided NN learning unit 104".
  • If the initial NN learning 901 succeeds, result list creation 907 is executed and learning ends; if it fails, non-conforming training data extraction 902 is executed.
  • This process is realized by the division adjustment unit 802: the error calculated in the initial NN learning 901 is checked for each training data sample, and training data whose error is greater than or equal to a predetermined standard are extracted as non-conforming training data.
  • After the non-conforming training data are extracted, NN addition 903 and additional NN learning 904 are executed. Prior to this, it is checked whether the resources necessary for forming an NN can be secured, that is, whether there is an unlearned divided NN that can be added; if no NN can be added, result list creation 907 is executed and learning ends.
  • The processing of NN addition 903 and additional NN learning 904 is almost the same as the divided NN learning 203 of the first embodiment, but is characterized in that the non-conforming training data is learned by a different divided NN, for example the lower divided NN in FIG. 4.
  • This can be realized by updating the division information output by the division adjustment unit 802.
  • For example, the division information output to the NN dividing unit 102 is set to "assign the remaining one of the two divided NNs to the divided NN learning unit 105", and the division information output to the training data dividing unit 103 is set to "assign the non-conforming training data to the divided NN learning unit 105".
  • If learning of the additional NN fails, non-conforming training data extraction 902 is executed again, and this series of processes, in which non-conforming training data is learned by adding an NN, is repeated until resources can no longer be secured.
  • Next, NN integration 905 and integrated NN learning 906 are executed. These processes are the same as the NN integration 205 and the integrated NN learning 206 of the first embodiment.
  • Note that in the third embodiment an unlearned divided NN may remain; such an unlearned divided NN is not integrated, and only the learned divided NNs (additional NNs) are integrated.
  • If learning of the integrated NN fails, non-conforming training data extraction 902 is performed again, and this series of processes, in which non-conforming training data is learned by adding an NN, is repeated until integrated NN learning succeeds or resources can no longer be secured.
  • the latest information on various parameter setting values related to the integrated NN is transferred to the NN information storage unit 109 and used in the test phase after the learning phase.
  • the result list generated by the result list creation 907 includes information indicating success or failure of learning, error for each training data, latest information on various parameter setting values related to the divided NN, and the like. These are transferred to the learning result storage unit 112 and used for analysis of the learning result of the user.
  • As described above, the information processing apparatus of the third embodiment divides the convolutional NN into two, classifies the training data into conforming and non-conforming data using one of the divided NNs, learns the non-conforming training data with an added divided NN, and then re-learns all training data with the integrated NN. This makes it possible to generate divided NNs with high identification capability even for training data whose characteristics are unknown, and to shorten the re-learning time after NN integration. It is therefore possible to provide an information processing method and system that can stably shorten the NN learning time even for training data that is difficult for the user to classify or training data whose characteristics differ greatly, which is the object of the present invention.
  • In addition, since NNs are added adaptively according to the learning results, the resources necessary for forming the NN can be saved. Furthermore, compared with the clustering of training data shown in the second embodiment, the method of the third embodiment is more direct, so the NN learning time can be stably shortened for training data with even more diverse characteristics.
  • As in the previous embodiments, the number and balance of the NN and training data divisions, the number of training data samples, the number of input/output vector elements, and so on can be set freely. Further, the first to third embodiments can be switched as different processing modes, which makes it possible to stably shorten the NN learning time for training data with various characteristics.
  • In the third embodiment, all training data is learned in the initial NN learning 901 and non-conforming training data is then extracted, but the present invention is not limited to this.
  • FIG. 10 shows the result of actually learning NN using this method. It can be seen that the present invention succeeds in learning with a very small number of learning repetitions compared to a method that does not perform divided learning.
  • Embodiment 4 shows a method for more efficiently realizing the division and addition of NNs in the information processing apparatuses of Embodiments 1 to 3.
  • The characteristic of the processing of the fourth embodiment is that the training data is classified according to the magnitude of the error obtained by the error calculation 604, and the NN division and addition policy is determined according to the result.
  • This method will be described with reference to FIG. 11.
  • FIG. 11 is an example of a list of errors calculated by the error calculation 604. This error list is generated by the divided NN learning units 104 and 105. As shown in FIG. 11, the training data are, for example, sorted in descending order of error and then classified into levels according to the magnitude of the error (a small sketch of this sorting and classification is given at the end of this section). In FIG. 11, the number of samples is reduced to 20 to simplify the description.
  • As described above, the information processing apparatus of the fourth embodiment determines the NN division and addition policy according to the magnitude of the error of the training data, in addition to the configurations and operations described in the first to third embodiments. As a result, the success rate of NN learning can be further increased.
  • Although the number of error levels shown in FIG. 11 is four, it is not limited to this, and should be set appropriately in consideration of resource conditions and the like.
  • Embodiment 5 shows a method for minimizing resources necessary for forming an NN in the information processing apparatuses of Embodiments 1 to 4.
  • the characteristic of the processing of the fifth embodiment is that the NN scale is reduced and re-learning is performed on the NN that has succeeded in learning, and if the learning succeeds again, excess NN resources are accumulated.
  • FIG. 12 is a flowchart for realizing the resource minimum processing, where 1201 is NN scale reduction, 1202 is reduced NN learning, and 1203 is surplus resource accumulation. These processes are realized by the divided NN learning units 104 and 105.
  • the NN scale reduction 1201 receives the divided NN, the initial NN, or the additional NN that has been successfully learned, and reduces the scale of the NN by a predetermined amount.
  • As a method for reducing the NN scale, reduction in units of intermediate layers, feature extraction blocks, or individual units, for example, can be considered.
  • the reduced NN learning 1202 is executed. This process is the same as the divided NN learning 203 of the first embodiment, and the description thereof is omitted.
  • the surplus resource accumulation 1203 is executed.
  • The NN resources removed by the NN scale reduction 1201 are accumulated as surplus resources. Thereafter, the NN scale reduction 1201 is executed again, and this series of processing is repeated until learning of the reduced NN fails.
  • the accumulated surplus resources are transferred to the NN information storage unit 109, and the resource minimization process is completed.
  • the accumulated surplus resources are recycled when a new NN needs to be added.
  • As described above, the information processing method and system of the fifth embodiment perform, in addition to the configurations and operations shown in the first to fourth embodiments, a resource minimization process that minimizes the resources necessary for forming the NN.
  • the learning time of the NN can be shortened with fewer resources. Further, with limited resources, it is possible to stably shorten the learning time of the NN for training data with more diverse characteristics.
  • In the embodiments described above, the NN is assumed to be a convolutional NN, but the present invention is not limited to this and can be applied to other types of NN.
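
As an illustration only (not part of the original disclosure), the sorting and level classification described for the fourth embodiment could be sketched in Python as follows; the equal-width binning rule and the random example errors are assumptions, not the patent's exact procedure.

```python
import numpy as np

def classify_by_error(per_sample_error, n_levels=4):
    """Sketch of the error list of FIG. 11: sort the training data in
    descending order of error and bin them into error-magnitude levels."""
    per_sample_error = np.asarray(per_sample_error)
    order = np.argsort(per_sample_error)[::-1]            # sample indices, largest error first
    edges = np.linspace(per_sample_error.min(),
                        per_sample_error.max(), n_levels + 1)
    levels = np.digitize(per_sample_error, edges[1:-1])    # level 0 (small) .. n_levels-1 (large)
    return order, levels

# 20 samples as in FIG. 11, with hypothetical error values
errors = np.random.default_rng(0).uniform(0.0, 1.0, size=20)
order, levels = classify_by_error(errors)
```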

Abstract

There have been cases in which a large amount of learning time is required when training data that is difficult for a user to classify or training data having different characteristics is to be learned in a neural network (NN) learning phase. The present invention divides training data in accordance with the characteristics of the training data, causes the divided training data to be learned in separated NNs, and thereafter causes the entirety of the training data to be relearned in an integrated NN.

Description

Neural network learning device and learning method
The present invention relates to a learning device and a learning method for a neural network.
So-called supervised machine learning, in which the relationship between the inputs and outputs of a system is modeled by a neural network (hereinafter abbreviated as NN) in order to predict responses to unknown inputs or to classify patterns, is widely used. In general, supervised machine learning requires a learning phase: multiple training data samples, each consisting of an input and an output, are prepared, and the various parameters of the NN are adjusted until the relationship between the inputs and outputs of all training data satisfies a predetermined standard.
Here, when the training data have many input/output vector elements or when the number of training data samples is large, the problem becomes complicated and adjusting the various parameters often takes a long time. Patent Document 1 describes one method of solving this problem, in which the NN is separated, each NN performs the desired learning, and the NNs are then merged. Patent Document 2 describes another method of solving the same problem, in which non-conforming training data, whose input/output relationship does not satisfy the standard, are learned preferentially.
Patent Document 1: Japanese Translation of PCT International Application Publication No. H7-502357
Patent Document 2: Japanese Unexamined Patent Application Publication No. H5-197700
In the method described in Patent Document 1, the training data to be learned by the two divided NNs are, for example, the alphabet characters A to L and M to Z, respectively. That is, this method presupposes that the user classifies the training data to be input in advance. It is therefore difficult to apply when the user does not know how to classify the training data, for example when training data having the same output must be divided into two. Furthermore, the method is highly effective when the learning times for A to L and for M to Z are equal, but its effect diminishes when the two learning times differ greatly. In general, the learning time depends strongly on the characteristics of the training data, so a large benefit is difficult to obtain consistently.
The method described in Patent Document 2, on the other hand, preferentially learns non-conforming training data and adjusts the various parameters of the NN. As a result, in the subsequent stage of re-learning all training data, some of the training data that previously conformed may become non-conforming. This tendency is pronounced when, for example, the characteristics of the conforming and non-conforming training data differ greatly, and much time is then considered necessary for re-learning.
The present invention has been made in view of the above problems. Its purpose is to provide a learning device and a learning method that can stably shorten the NN learning time even for training data that is difficult for a user to classify or training data whose characteristics differ greatly.
In order to solve the above problems, for example, the configurations described in the claims are adopted. The present application includes a plurality of means for solving the above problems. One example is a learning device comprising: a neural network dividing unit that divides a neural network into a plurality of neural networks; a training data dividing unit that divides training data consisting of a plurality of samples used for learning the neural network into a plurality of training data sets; a divided neural network learning unit that uniquely assigns one of the plurality of neural networks to each of the plurality of training data sets and executes learning with the assigned neural network; a neural network integration unit that integrates the plurality of neural networks whose learning results satisfy a predetermined condition, thereby generating an integrated neural network; and an integrated neural network learning unit that executes learning with the integrated neural network using the training data before division.
According to the present invention, in the learning phase of a neural network, the NN learning time can be stably shortened even for training data that is difficult for the user to classify or training data whose characteristics differ greatly. In addition, the resources necessary for forming the NN can be minimized.
FIG. 1 is a block diagram explaining the configuration of an information processing apparatus.
FIG. 2 is a flowchart explaining the operation of the information processing apparatus.
FIG. 3 is a block diagram explaining the configuration of a neural network.
FIG. 4 is a block diagram explaining the configuration of a neural network.
FIG. 5 is a table explaining training data given to a neural network.
FIG. 6 is a flowchart explaining the operation of the information processing apparatus.
FIG. 7 is a block diagram explaining the configuration of an information processing apparatus.
FIG. 8 is a block diagram explaining the configuration of an information processing apparatus.
FIG. 9 is a flowchart explaining the operation of the information processing apparatus.
FIG. 10 is a time-series graph explaining the effect of the information processing apparatus.
FIG. 11 is a table explaining a method of classifying training data.
FIG. 12 is a flowchart explaining the operation of the information processing apparatus.
Hereinafter, embodiments will be described with reference to the drawings.
Embodiment 1 shows a method of shortening the learning time by dividing the NN into two, training each part with different training data, and then re-learning all training data with the integrated NN. This method will be described below with reference to FIGS. 1 to 6.
FIG. 1 is a functional block diagram of the information processing apparatus of the first embodiment, which is broadly composed of an NN learning device and a storage device. In FIG. 1, 101 is the NN learning device, 102 is an NN dividing unit, 103 is a training data dividing unit, 104 and 105 are divided NN learning units, 106 is an NN integration unit, and 107 is an integrated NN learning unit. Reference numeral 108 denotes a storage device, 109 an NN information storage unit, 110 a division information storage unit, 111 a training data storage unit, and 112 a learning result storage unit; all information in these storage units can be read and written by the user. Although not shown, the NN learning device 101 includes a processor and a memory as its hardware configuration, and the various functions of the NN learning device 101 are realized by the processor executing a program stored in the memory.
Next, the operation of the information processing apparatus of the first embodiment will be described with reference to the flowchart of FIG. 2. In FIG. 2, 201 is NN division, 202 is training data division, 203 is divided NN learning, 204 is setting change, 205 is NN integration, and 206 is integrated NN learning.
When learning the NN, NN division 201 is executed first. This processing is realized by the NN dividing unit 102: the NN given from the NN information storage unit 109 is divided into two based on the division information given from the division information storage unit 110. The NN division method is as follows. First, the division information given from the division information storage unit 110 specifies the ratio and the number of parts into which the NN is divided. In the first embodiment, it is assumed to contain the instruction "divide the NN equally into two, assign one part to the divided NN learning unit 104 and the other to the divided NN learning unit 105". Next, FIG. 3 shows an image of the NN given from the NN information storage unit 109. In FIG. 3, 301 is an input block, 302 and 303 are first-layer feature extraction blocks, 304 and 305 are second-layer feature extraction blocks, 306 is a link between feature extraction blocks, and 307 is an output block. The NN of the first embodiment is assumed to be a convolutional NN, and one feature extraction block is formed of a convolution layer (units labeled c) and a pooling layer (units labeled p). Each feature extraction block therefore has the function of independently extracting features of the training data. Considering this function, dividing the NN in units of feature extraction blocks is efficient, and as a result the re-learning time after NN integration can be shortened. FIG. 4 is an example in which the NN of FIG. 3 is divided into two equal parts according to this idea. Because the division is into two equal parts, the network is basically separated into an upper part and a lower part in units of feature extraction blocks; the input block and the output block, being common blocks, are duplicated and then separated. The links between blocks shown in FIG. 3 are also inherited, but the links between the divided NNs (hereinafter, divided NNs), that is, the link 306 shown in FIG. 3, are deleted. With the configuration and operation described above, the NN division processing 201 can be realized.
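As an illustration only (not part of the original disclosure), the division of a convolutional NN in units of feature extraction blocks could be sketched in Python roughly as follows; the PyTorch framework, the class name DividedNN, and the layer sizes are assumptions chosen to match the 4-element input and 2-element output vectors of FIG. 5.

```python
import torch
import torch.nn as nn

def feature_block(in_ch, out_ch):
    # One feature extraction block = convolution layer (c units) + pooling layer (p units).
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool1d(2),
    )

class DividedNN(nn.Module):
    """One of the two divided NNs: its own stack of feature extraction blocks
    plus copies of the (shared) input and output blocks."""
    def __init__(self, in_len=4, n_out=2):
        super().__init__()
        self.features = nn.Sequential(
            feature_block(1, 4),   # first-layer feature extraction block
            feature_block(4, 8),   # second-layer feature extraction block
        )
        # after two poolings the length-4 input is reduced to length 1
        self.output = nn.Linear(8 * (in_len // 4), n_out)  # output block

    def forward(self, x):          # x: (batch, 1, 4)
        h = self.features(x)
        return self.output(h.flatten(1))

# NN division 201: the cross links (306) are dropped simply by building the two
# halves as independent networks assigned to learning units 104 and 105.
divided_nn_104 = DividedNN()
divided_nn_105 = DividedNN()
```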
Next, training data division 202 is executed. This process is realized by the training data dividing unit 103: the training data given from the training data storage unit 111 is divided into two based on the division information given from the division information storage unit 110. The training data division method is as follows. First, the division information given from the division information storage unit 110 specifies how the training data is grouped and divided. In the first embodiment, it is assumed to contain the instruction "divide the training data into two equal halves by sample number, assign the first half to the divided NN learning unit 104 and the second half to the divided NN learning unit 105". Next, FIG. 5 shows an image of the training data given from the training data storage unit 111. As shown in FIG. 5, one sample of the training data consists of a set of input vector elements iv1 to iv4 and output vector elements ov1 to ov2, and 400 samples are prepared. Therefore, when the training data is divided into two according to the above instruction, it is split into the two groups of sample numbers 1 to 200 and 201 to 400. In the first embodiment, it is assumed that the value of the input vector changes continuously with the sample number. In this case, grouping the first and second halves of the sample numbers increases the likelihood that the characteristics of the training data of the two groups differ greatly. By training on these groups separately, divided NNs with higher discrimination ability can be generated, and the re-learning time after NN integration can also be shortened. Naturally, the value of the input vector may not change continuously with the sample number; division methods for that case are described in the second and subsequent embodiments.
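A minimal sketch of this training data division by sample number, assuming stand-in random data with the shapes of FIG. 5 (400 samples, 4 input elements, 2 output elements):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the training data of FIG. 5:
# 400 samples, input vectors iv1..iv4 and output vectors ov1..ov2.
inputs = rng.normal(size=(400, 4)).astype(np.float32)
outputs = rng.normal(size=(400, 2)).astype(np.float32)

# Training data division 202: the first half of the sample numbers goes to
# divided NN learning unit 104, the second half to unit 105.
half = len(inputs) // 2
data_104 = (inputs[:half], outputs[:half])    # samples 1..200
data_105 = (inputs[half:], outputs[half:])    # samples 201..400
```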
When the division of the NN and the training data is complete, divided NN learning 203 is executed next. This process is realized by the divided NN learning units 104 and 105, and the divided NNs are trained in parallel using the divided training data. Taking the upper divided NN in FIG. 4 as an example, the operation is described with reference to the flowchart of FIG. 6. In FIG. 6, 601 is initial setting, 602 is first training data input, 603 is NN operation, 604 is error calculation, 605 is training data update, 606 is parameter adjustment, and 607 is result list creation. When learning a divided NN, initial setting 601 is executed first. This process mainly initializes the coupling coefficients and bias values of the links; as predetermined values, random numbers from −1 to +1, for example, are set as initial values. When the initial setting is complete, first training data input 602 is executed next. Specifically, sample number 1 of the training data shown in FIG. 5 is selected, its input vector is transferred to the input block 401 of the divided NN, and its output vector is transferred to the output block 403. Next, NN operation 603 is executed. This process corresponds to the "forward operation" of the convolutional NN: convolution (filtering), pooling (subsampling), activation, and other processing are applied to the input training data, and finally the output value of each unit in the output block 403 is calculated. The error calculation 604 compares this calculation result with the output vector of the training data and computes the error. Thereafter, training data update 605 is executed; in the first embodiment this corresponds to selecting the training data of the next sample number. NN operation 603 and error calculation 604 are then executed again on that training data, and this series of processes is repeated until all training data have been processed. Thereafter, if the total error is less than or equal to a predetermined standard, learning is judged to have succeeded, result list creation 606 is executed, and learning ends.
The result list includes information indicating success or failure of learning, the error for each training data sample, the latest values of the various parameter settings of the divided NN, and the like. These are transferred to the learning result storage unit 112 and used, for example, for the user's analysis of the learning results. On the other hand, if the total error exceeds the standard, parameter adjustment 607 is executed. This process corresponds to the "backward operation" of the convolutional NN: using the backpropagation method, the error gradient at the output block 403 is propagated back toward the input block 401. This makes it possible to correct the coupling coefficient and bias value of each link so that the error becomes smaller. After parameter adjustment 607, the "forward operation" using the training data is executed again, and parameter adjustment 607 is repeated until the total error falls below the standard. If the processing time exceeds a predetermined standard during this operation, learning is judged to have failed, result list creation 606 is performed, and learning ends.
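The divided NN learning loop described above (forward operation, error calculation over all samples, backpropagation-based parameter adjustment, success judged by a total-error standard and failure by a processing-time limit) could be sketched as follows; the function name, the squared-error measure, the SGD optimizer, and the concrete thresholds are assumptions, not the patent's specification.

```python
import time
import torch

def train_divided_nn(model, inputs, outputs, err_standard=1.0,
                     time_limit_s=60.0, lr=0.01):
    """Sketch of divided NN learning 203 for one learning unit.

    Repeats the forward operation (603) and error calculation (604) over all
    samples, followed by the backward operation / parameter adjustment, until
    the total error falls below the standard (success) or the processing time
    exceeds the limit (failure)."""
    x = torch.as_tensor(inputs).unsqueeze(1)           # (N, 1, 4) for Conv1d
    y = torch.as_tensor(outputs)                       # (N, 2)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    start = time.monotonic()
    while True:
        pred = model(x)                                # forward operation
        per_sample_err = ((pred - y) ** 2).sum(dim=1)  # error per training sample
        total_err = per_sample_err.sum()
        if total_err.item() <= err_standard:           # learning succeeded
            return True, per_sample_err.detach()
        if time.monotonic() - start > time_limit_s:    # learning failed
            return False, per_sample_err.detach()
        opt.zero_grad()
        total_err.backward()                           # backward operation (backpropagation)
        opt.step()                                     # parameter adjustment

# The two learning units could then be run in parallel processes or threads, e.g.:
# ok_104, err_104 = train_divided_nn(divided_nn_104, *data_104)
# ok_105, err_105 = train_divided_nn(divided_nn_105, *data_105)
```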
If any divided NN fails to learn as a result of training the divided NNs, setting change 204 is executed. In the first embodiment this process is performed by the user, who corrects the contents of the NN information storage unit 109 and the division information storage unit 110, for example to change the initial setting 601 of a divided NN, to increase its number of layers or units, or to divide the divided NNs and training data further. Learning of the divided NNs is then executed again, and this series of processes is repeated until all divided NNs learn successfully.
When all divided NNs have learned successfully, NN integration 205 is executed next. This processing is realized by the NN integration unit 106. First, the duplicated input blocks 401 and 402 and the duplicated output blocks 403 and 404 are each merged back into a single common block. Then the link 306 deleted in the NN division processing is restored. With this processing the NN can be integrated.
After the divided NNs are integrated, integrated NN learning 206 is executed. This process is realized by the integrated NN learning unit 107, and its content is almost the same as the divided NN learning 203. The difference is that the coupling coefficients and bias values adjusted by the divided NN learning 203 are used as the initial values of the corresponding links in the integrated NN. Since the restored link 306 has no learning result, its initial value is, for example, a random number from −1 to +1.
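As a simplified illustration of this step at the level of a single fully connected layer (the patent itself works with convolutional feature extraction blocks), the learned weights of the two divided NNs can seed the diagonal blocks of the integrated weight matrix, while the restored cross links, which have no learning result, start from random values in [-1, +1]; the matrix shapes below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def integrate_layer(w_upper, w_lower):
    """NN integration 205 / initialisation for integrated NN learning 206,
    sketched for one layer's weight matrix: learned weights of the two divided
    NNs form the diagonal blocks, restored cross links (306) are random."""
    out_u, in_u = w_upper.shape
    out_l, in_l = w_lower.shape
    w = rng.uniform(-1.0, 1.0, size=(out_u + out_l, in_u + in_l))  # restored links
    w[:out_u, :in_u] = w_upper          # learned block of the upper divided NN
    w[out_u:, in_u:] = w_lower          # learned block of the lower divided NN
    return w

# Hypothetical learned weights of one layer of each divided NN
w_upper = rng.normal(size=(3, 4))
w_lower = rng.normal(size=(3, 4))
w_integrated = integrate_layer(w_upper, w_lower)   # shape (6, 8)
```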
If learning of the integrated NN fails, setting change 204 is executed again, and this series of processing is repeated until the integrated NN learns successfully. The latest values of the various parameter settings of the integrated NN are transferred to the NN information storage unit 109 and used in the test phase that follows the learning phase.
As described above, the information processing apparatus of the first embodiment divides the convolutional NN into two in units of feature extraction blocks, trains the divided NNs in parallel using training data split into the first and second halves of the sample numbers, and then re-learns all training data with the integrated NN. This makes it possible to generate divided NNs with high identification capability for training data whose input vector values change continuously with the sample number, and to shorten the re-learning time after NN integration. It is therefore possible to provide an information processing method and system that can shorten the NN learning time, which is the object of the present invention.
In the first embodiment the number of divisions of the NN and the training data is two, but the number of divisions is not limited to this and may be three or more. Likewise, although the NN and the training data are divided into equal parts, the division is not limited to this and may be uneven. Furthermore, the number of training data samples, the number of input/output vector elements, and so on can be set freely.
Embodiment 2 shows a method for appropriately dividing the training data when the characteristics of the training data are unknown. This method will be described below with reference to the newly added FIG. 7. FIG. 7 is a functional block diagram of the information processing apparatus of the second embodiment, in which 701 is a division information storage unit and 702 is an NN analysis unit. The other parts perform the same processing as the parts shown in FIG. 1 of the first embodiment and are therefore given the same reference numerals.
The characteristic of the processing of the second embodiment is that a training data analysis process is newly added to the training data division 202 shown in FIG. 2. The training data division 202 is realized by the NN analysis unit 702 shown in FIG. 7, which divides the training data given from the training data storage unit 111 based on the division information given from the division information storage unit 701. Here, the division information given from the division information storage unit 701 specifies the ratio and the number of parts into which the training data is divided. In the second embodiment, it is assumed to contain the instruction "divide the training data equally into two, assign one part to the divided NN learning unit 104 and the other to the divided NN learning unit 105". Next, the NN analysis unit 702 executes an analysis for dividing the training data equally into two. In the second embodiment, so-called clustering, which classifies the training data by closeness in Euclidean distance, is used as this analysis method; a representative algorithm is, for example, the k-means method. The clustering analysis divides the training data into two groups and also generates information on which group the training data of each sample number belongs to.
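A minimal sketch of this clustering analysis using scikit-learn's k-means, assuming stand-in data; note that plain k-means does not guarantee that the two groups have exactly equal size, so an exactly even split would need an additional balancing step.

```python
import numpy as np
from sklearn.cluster import KMeans

# Clustering analysis in the NN analysis unit 702: group the training data
# by closeness in Euclidean distance (k-means with two clusters).
inputs = np.random.default_rng(0).normal(size=(400, 4))   # stand-in for FIG. 5 inputs

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(inputs)

group_104 = np.flatnonzero(labels == 0)   # sample numbers assigned to learning unit 104
group_105 = np.flatnonzero(labels == 1)   # sample numbers assigned to learning unit 105
```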
 Except for the training data division 202 described above, the configuration and operation are the same as in Embodiment 1, so their description is omitted.
 As described above, the information processing apparatus of Embodiment 2 divides the convolutional NN into two, trains the divided NNs in parallel using training data divided into two by clustering analysis, and then retrains the integrated NN with all of the training data. This makes it possible to generate divided NNs with high discrimination ability for training data whose characteristics are unknown, and to shorten the relearning time after NN integration. It therefore provides an information processing method and system that achieve the object of the present invention: stably shortening the NN learning time even for training data that the user finds difficult to classify or training data whose characteristics differ greatly.
 As in Embodiment 1, the number of divisions and the balance of the NN and the training data, the number of training data samples, the number of elements of the input/output vectors, and so on can be set freely. Furthermore, Embodiment 1 and Embodiment 2 can be switched as different processing modes, which makes it possible to stably shorten the NN learning time for training data with a wider variety of characteristics.
 Embodiment 3 shows another method for appropriately dividing the training data when its characteristics are unknown. This method is described below with reference to the newly added FIGS. 8 and 9. FIG. 8 is a functional block diagram of the information processing apparatus of Embodiment 3, in which 801 is a division information storage unit and 802 is a division adjustment unit. The other units perform the same processing as the units shown in FIG. 1 of Embodiment 1 and are therefore given the same reference numerals.
 The processing of Embodiment 3 is characterized in that the NN itself is used as a classifier to classify the training data adaptively. The details of this processing are described below using the flowchart of FIG. 9, in which 901 is initial NN learning, 902 is non-conforming training data extraction, 903 is NN addition, 904 is additional NN learning, 905 is NN integration, 906 is integrated NN learning, and 907 is result list creation.
 In learning the NN, initial NN learning 901 is executed first. This processing is almost the same as the divided NN learning 203 of Embodiment 1, but is characterized in that all of the training data is learned by a single divided NN, for example the upper divided NN in FIG. 4. This can be realized by devising the division information output by the division adjustment unit 802: for example, the division information output to the NN division unit 102 is set to "divide the NN equally into two and assign one of them to the divided NN learning unit 104", and the division information output to the training data division unit 103 is set to "assign all of the training data to the divided NN learning unit 104".
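 As a minimal sketch only (the record layout and identifiers are assumptions, not part of the disclosure), the division information that the division adjustment unit 802 outputs for the initial NN learning 901 could be represented as simple records:

```python
# Illustrative sketch of division information for initial NN learning 901.
# Keys and unit identifiers are hypothetical.
nn_division_info = {
    "n_parts": 2,                                   # divide the NN equally into two
    "assignments": {"part_0": "divided_NN_learning_unit_104"},
}
training_data_division_info = {
    "assignments": {"all_samples": "divided_NN_learning_unit_104"},  # all data to one unit
}
```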
 If the learning succeeds as a result of the initial NN learning 901, result list creation 907 is executed and the learning ends; if it fails, non-conforming training data extraction 902 is executed. This processing is realized by the division adjustment unit 802, which examines, for each training data sample, the error calculated in the preceding initial NN learning 901 and extracts the training data whose error is equal to or greater than a predetermined criterion as non-conforming training data. After the non-conforming training data has been extracted, NN addition 903 and additional NN learning 904 are executed; prior to this, it is checked whether the resources required to form an NN can be secured, that is, whether an unlearned divided NN that can be added exists, and if no NN can be added, result list creation 907 is executed and the learning ends.
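 A minimal sketch of the non-conforming training data extraction 902, assuming per-sample errors from the preceding learning pass and a hypothetical resource check, might be:

```python
# Illustrative sketch of non-conforming training data extraction 902:
# keep samples whose error is at or above the predetermined criterion.
def extract_nonconforming(errors, criterion):
    """errors: iterable of (sample_id, error) pairs from the previous pass."""
    return [sid for sid, err in errors if err >= criterion]

# Usage (resource check before NN addition 903 / additional NN learning 904):
# nonconforming = extract_nonconforming(error_list, criterion=0.1)
# if not unlearned_divided_nn_available():   # hypothetical resource check
#     create_result_list(); return
```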
 The processing of the NN addition 903 and the additional NN learning 904 is almost the same as the divided NN learning 203 of Embodiment 1, but is characterized in that the non-conforming training data is learned by a divided NN other than the one above, for example the lower divided NN in FIG. 4. This can be realized by updating the division information output by the division adjustment unit 802: for example, the division information output to the NN division unit 102 is set to "assign the remaining one of the two divided NNs to the divided NN learning unit 105", and the division information output to the training data division unit 103 is set to "assign the non-conforming training data to the divided NN learning unit 105". If the learning of the additional NN fails, non-conforming training data extraction 902 is executed again, and the sequence of adding an NN and learning the non-conforming training data is repeated until the learning of an additional NN succeeds or resources can no longer be secured.
 If the learning of the additional NN succeeds, NN integration 905 and integrated NN learning 906 are executed. These processes are the same as the NN integration and integrated NN learning 206 of Embodiment 1. In Embodiment 3, unlearned divided NNs may also exist, but they are not integrated; only the divided NNs that have been trained (including the additional NNs) are integrated. If the learning fails when the integrated NN is trained with all of the training data, non-conforming training data extraction 902 is executed again, and the sequence of adding an NN and learning the non-conforming training data is repeated until the learning of the integrated NN succeeds or resources can no longer be secured. As in Embodiment 1, the latest values of the various parameters of the integrated NN are transferred to the NN information storage unit 109 and used in the test phase after the learning phase. The result list generated by result list creation 907 includes, in addition to information indicating the success or failure of the learning, the error for each training data sample and the latest values of the various parameters of the divided NNs; these are transferred to the learning result storage unit 112 and used, for example, for analysis of the learning results by the user.
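 Putting steps 901 to 907 together, a hedged sketch of the adaptive loop of Embodiment 3 could look like the following; train, integrate and create_result_list are hypothetical stand-ins for the units described above, and train is assumed to return a success flag plus one error per sample it was trained on.

```python
# Illustrative sketch of the adaptive classification loop of Embodiment 3.
def embodiment3_learning(training_data, divided_nns, criterion):
    learned = []
    nn = divided_nns.pop(0)
    ok, errors = train(nn, training_data, criterion)             # 901 initial NN learning
    learned.append(nn)
    data = training_data
    while not ok:
        data = [s for s, e in zip(data, errors) if e >= criterion]  # 902 extraction
        if not divided_nns:                                       # no NN can be added
            break
        nn = divided_nns.pop(0)                                   # 903 NN addition
        ok, errors = train(nn, data, criterion)                   # 904 additional NN learning
        if not ok:
            continue                                              # extract again, add again
        learned.append(nn)
        merged = integrate(learned)                               # 905 NN integration
        ok, errors = train(merged, training_data, criterion)      # 906 integrated NN learning
        data = training_data
    return create_result_list(ok, errors, learned)                # 907 result list creation
```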
 As described above, the information processing apparatus of Embodiment 3 divides the convolutional NN into two, uses one divided NN to classify the training data into conforming and non-conforming data, relearns the non-conforming training data with the other divided NN, and then retrains the integrated NN with all of the training data. This makes it possible to generate divided NNs with high discrimination ability for training data whose characteristics are unknown, and to shorten the relearning time after NN integration. It therefore provides an information processing method and system that achieve the object of the present invention: stably shortening the NN learning time even for training data that the user finds difficult to classify or training data whose characteristics differ greatly. Furthermore, because NNs are added adaptively according to the learning results, the resources required to form NNs can be saved. In addition, compared with the method of clustering the training data shown in Embodiment 2, the method of Embodiment 3 is more direct, so the NN learning time can be stably shortened for training data with a wider variety of characteristics.
 As in Embodiment 1, the number of divisions and the balance of the NN and the training data, the number of training data samples, the number of elements of the input/output vectors, and so on can be set freely. Furthermore, Embodiments 1 to 3 can be switched as different processing modes, which makes it possible to stably shorten the NN learning time for training data with an even wider variety of characteristics.
 In Embodiment 3, all of the training data is learned in the initial NN learning 901 and the non-conforming training data is then extracted, but the method is not limited to this. For example, it is also possible to extract part of the training data at random, pre-train the initial NN on it, and then learn all of the training data and extract the non-conforming training data. This processing can be expected to generate an initial NN with higher discrimination ability. FIG. 10 shows the result of actually training an NN using this method; compared with a method that does not perform divided learning, the present invention succeeds in learning with an extremely small number of learning iterations.
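 A sketch of this variation, assuming a hypothetical train function and an arbitrary pre-training fraction, might be:

```python
# Illustrative sketch: pre-train the initial NN on a random subset of the
# training data, then run the full initial NN learning 901 on all of it.
import random

def pretrain_initial_nn(nn, training_data, criterion, fraction=0.1, seed=0):
    rng = random.Random(seed)
    subset = rng.sample(training_data, max(1, int(len(training_data) * fraction)))
    train(nn, subset, criterion)                 # pre-learning on the random subset
    return train(nn, training_data, criterion)   # initial NN learning on all data
```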
 Embodiment 4 shows a method for realizing the division and addition of NNs more efficiently in the information processing apparatuses of Embodiments 1 to 3. The processing of Embodiment 4 is characterized in that the training data is classified according to the magnitude of the error obtained in the error calculation 604, and the policy for dividing and adding NNs is decided according to the result. This method is described below with reference to the newly added FIG. 11.
 FIG. 11 is an example of the list of errors calculated in the error calculation 604. This error list is produced by the divided NN learning units 104 and 105; as shown in FIG. 11, the training data is sorted, for example, in descending order of error and further classified by error magnitude. In FIG. 11, the number of samples is reduced to 20 to simplify the explanation.
 As an example of using the error list of FIG. 11, an application to the NN addition 903 of Embodiment 3 can be considered. When the error levels of the non-conforming training data vary widely, as in the case of FIG. 11, relearning with a single additional NN is likely to fail. In such a case, the probability of successful learning is considered to increase if three additional NNs are prepared and the training data with large, medium, and small errors are learned separately. In this way, obtaining information on the magnitude of the error makes it possible to decide the NN addition policy efficiently. This idea is also extremely effective for the setting change 204 by the user shown in Embodiments 1 and 2.
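 A hedged sketch of building such an error list, with assumed bin edges and the four-level classification of FIG. 11, could be:

```python
# Illustrative sketch of the error list of FIG. 11: sort samples by error
# (largest first) and classify them into error levels. The bin edges and the
# level names are assumptions for illustration.
def build_error_list(errors, edges=(0.5, 0.2, 0.05)):
    """errors: dict mapping sample_id -> error from error calculation 604."""
    def level(err):
        if err >= edges[0]:
            return "large"
        if err >= edges[1]:
            return "medium"
        if err >= edges[2]:
            return "small"
        return "conforming"
    ranked = sorted(errors.items(), key=lambda kv: kv[1], reverse=True)
    return [(sid, err, level(err)) for sid, err in ranked]

# One additional NN can then be prepared per non-conforming level,
# as discussed above for the NN addition 903.
```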
 As described above, the information processing apparatus of Embodiment 4, in addition to the configuration and operation shown in Embodiments 1 to 3, decides the policy for dividing and adding NNs according to the magnitude of the error of the training data, which makes it possible to further increase the success rate of NN learning. The number of error level classifications shown in FIG. 11 is four, but it is not limited to this, and it is desirable to set it appropriately in consideration of the resource situation and the like.
 Embodiment 5 shows a method for minimizing the resources required to form NNs in the information processing apparatuses of Embodiments 1 to 4. The processing of Embodiment 5 is characterized in that an NN whose learning has succeeded is reduced in scale and retrained, and when the learning succeeds again, the surplus NN resources are accumulated. This method is described below with reference to the newly added FIG. 12. FIG. 12 is a flowchart for realizing this resource minimization processing, in which 1201 is NN scale reduction, 1202 is reduced NN learning, and 1203 is surplus resource accumulation. These processes are realized by the divided NN learning units 104 and 105.
 First, the NN scale reduction 1201 takes as input a divided NN, initial NN, or additional NN whose learning has succeeded and reduces the scale of the NN by a predetermined amount. The NN scale can be reduced, for example, in units of intermediate layers, feature extraction blocks, or individual units. Next, reduced NN learning 1202 is executed; this processing is the same as the divided NN learning 203 of Embodiment 1, so its description is omitted.
 If the learning of the reduced NN succeeds, surplus resource accumulation 1203 is executed. In this processing, the NN resources removed by the NN scale reduction 1201 are accumulated as surplus resources. The NN scale reduction 1201 is then executed again, and this sequence of processing is repeated until the learning of a reduced NN fails.
 When the learning of a reduced NN fails, it is determined that the NN scale cannot be reduced any further, the accumulated surplus resources are transferred to the NN information storage unit 109, and the resource minimization processing is completed. The accumulated surplus resources are recycled, for example, when a new NN needs to be added.
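 A minimal sketch of this resource minimization loop, with hypothetical shrink, train, and storage functions, might be:

```python
# Illustrative sketch of the resource minimization of Embodiment 5 (1201-1203).
def minimize_nn_resources(nn, training_data, criterion, step_units=1):
    surplus = 0
    while True:
        candidate, removed = shrink(nn, step_units)               # 1201 NN scale reduction
        ok, _ = train(candidate, training_data, criterion)        # 1202 reduced NN learning
        if not ok:                                                # cannot reduce any further
            break
        nn = candidate                                            # keep the smaller NN
        surplus += removed                                        # 1203 surplus resource accumulation
    store_surplus_resources(surplus)   # transfer to the NN information storage unit 109
    return nn
```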
 As described above, the information processing method and system of Embodiment 5 perform, in addition to the configuration and operation shown in Embodiments 1 to 4, resource minimization processing for minimizing the resources required to form NNs. This makes it possible to shorten the NN learning time with fewer resources and, under limited resources, to stably shorten the NN learning time for training data with a wider variety of characteristics.
 In Embodiments 1 to 5 above, the NN is a convolutional NN, but the invention is not limited to this and can be applied to other types of NN. When another type of NN is applied, however, it is desirable that the NN be relatively easy to divide, as a convolutional NN is.
101 NN learning device
102 NN division unit
103 training data division unit
104 divided NN learning unit
105 divided NN learning unit
106 NN integration unit
107 integrated NN learning unit
108 storage device
109 NN information storage unit
110 division information storage unit
111 training data storage unit
112 learning result storage unit
201 NN division
202 training data division
203 divided NN learning
204 setting change
205 NN integration
206 integrated NN learning
601 initial setting
602 first training data input
603 NN computation
604 error calculation
605 training data update
606 result list generation
607 parameter adjustment
701 division information storage unit
702 NN analysis unit
801 division information storage unit
802 division adjustment unit
901 initial NN learning
902 non-conforming training data extraction
903 NN addition
904 additional NN learning
905 NN integration
906 integrated NN learning
907 result list creation
1201 NN scale reduction
1202 reduced NN learning
1203 surplus resource accumulation

Claims (14)

  1.  A learning device that executes learning by a neural network, comprising:
     a neural network dividing unit that divides the neural network into a plurality of neural networks including a first and a second neural network;
     a training data dividing unit that divides training data consisting of a plurality of samples used for learning of the neural network into first and second training data;
     a divided neural network learning unit that executes learning by the first neural network using the first training data and executes learning by the second neural network using the second training data;
     a neural network integration unit that integrates the first and second neural networks after learning in the divided neural network learning unit has succeeded, to generate a third neural network; and
     an integrated neural network learning unit that executes learning by the third neural network using the training data before division.
  2.  The learning device according to claim 1, wherein
     the training data dividing unit divides the training data into the first and second training data based on Euclidean distances between the plurality of samples.
  3.  A learning device that executes learning by a neural network, comprising:
     a neural network dividing unit that divides the neural network into a plurality of neural networks;
     a training data dividing unit that divides training data consisting of a plurality of samples used for learning of the neural network into a plurality of training data;
     a divided neural network learning unit that uniquely assigns one of the plurality of neural networks to each of the plurality of training data and executes learning by the assigned neural network;
     a neural network integration unit that integrates the plurality of neural networks that have executed the learning, to generate an integrated neural network; and
     an integrated neural network learning unit that executes learning by the integrated neural network using the training data before division.
  4.  The learning device according to claim 3, wherein
     the training data dividing unit divides the training data according to a result of executing learning by one of the plurality of neural networks using the training data before division.
  5.  The learning device according to claim 4, wherein
     the result of executing the learning is an error between an output value included in the training data before division and a computation result of the integrated neural network.
  6.  The learning device according to claim 5, wherein
     the divided neural network learning unit determines whether the result of executing the learning using each of the plurality of training data satisfies a predetermined condition, and when it determines that the predetermined condition is not satisfied, executes learning by a different one of the divided neural networks.
  7.  The learning device according to claim 6, wherein
     the divided neural network learning unit, when it determines that the predetermined condition is satisfied, executes learning by a part of the assigned neural network.
  8.  The learning device according to claim 3, wherein
     the neural network is a convolutional neural network.
  9.  A learning method using a learning device that executes learning by a neural network, the method comprising:
     dividing the neural network into a plurality of neural networks;
     dividing training data consisting of a plurality of samples used for learning of the neural network into a plurality of training data;
     uniquely assigning one of the plurality of neural networks to each of the plurality of training data and executing learning by the assigned neural network;
     integrating the plurality of neural networks that have executed the learning to generate an integrated neural network; and
     executing learning by the integrated neural network using the training data before division.
  10.  The learning method according to claim 9, wherein
     the training data is divided according to a result of executing learning by one of the plurality of neural networks using the training data before division.
  11.  The learning method according to claim 10, wherein
     the result of executing the learning is an error between an output value included in the training data before division and a computation result of the integrated neural network.
  12.  The learning method according to claim 11, wherein
     it is determined whether the result of executing the learning using each of the plurality of training data satisfies a predetermined condition, and when it is determined that the predetermined condition is not satisfied, learning by a different one of the divided neural networks is executed.
  13.  The learning method according to claim 12, wherein
     when it is determined that the predetermined condition is satisfied, learning by a part of the assigned neural network is executed.
  14.  The learning method according to claim 9, wherein
     the neural network is a convolutional neural network.
PCT/JP2015/065159 2015-05-27 2015-05-27 Neural network learning device and learning method WO2016189675A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2017520142A JP6258560B2 (en) 2015-05-27 2015-05-27 Neural network learning apparatus and learning method
PCT/JP2015/065159 WO2016189675A1 (en) 2015-05-27 2015-05-27 Neural network learning device and learning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2015/065159 WO2016189675A1 (en) 2015-05-27 2015-05-27 Neural network learning device and learning method

Publications (1)

Publication Number Publication Date
WO2016189675A1 true WO2016189675A1 (en) 2016-12-01

Family

ID=57392959

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/065159 WO2016189675A1 (en) 2015-05-27 2015-05-27 Neural network learning device and learning method

Country Status (2)

Country Link
JP (1) JP6258560B2 (en)
WO (1) WO2016189675A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020094868A (en) * 2018-12-11 2020-06-18 トヨタ自動車株式会社 Full charge capacity learning device
JP2021507378A (en) * 2017-12-13 2021-02-22 アドバンスト・マイクロ・ディバイシズ・インコーポレイテッドAdvanced Micro Devices Incorporated Simultaneous training of functional subnetworks of neural networks
JP2022515302A (en) * 2019-11-25 2022-02-18 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Methods and equipment for training deep learning models, electronic devices, computer-readable storage media and computer programs
JP2022520912A (en) * 2020-01-22 2022-04-04 深▲チェン▼市商▲湯▼科技有限公司 Data processing methods, devices and chips, electronic devices, storage media
JP7453767B2 (en) 2019-09-25 2024-03-21 キヤノン株式会社 Information processing device, information processing method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102190100B1 (en) * 2018-12-27 2020-12-11 (주)아크릴 Method for training of an artificial neural network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07502357A (en) * 1991-12-27 1995-03-09 アール・アンド・ディー・アソシエイツ Fast converging projective neural network
JPH0934862A (en) * 1995-07-19 1997-02-07 Hitachi Ltd Pattern learning method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07502357A (en) * 1991-12-27 1995-03-09 アール・アンド・ディー・アソシエイツ Fast converging projective neural network
JPH0934862A (en) * 1995-07-19 1997-02-07 Hitachi Ltd Pattern learning method and device

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021507378A (en) * 2017-12-13 2021-02-22 アドバンスト・マイクロ・ディバイシズ・インコーポレイテッドAdvanced Micro Devices Incorporated Simultaneous training of functional subnetworks of neural networks
JP7246392B2 (en) 2017-12-13 2023-03-27 アドバンスト・マイクロ・ディバイシズ・インコーポレイテッド Simultaneous Training of Functional Subnetworks of Neural Networks
US11836610B2 (en) 2017-12-13 2023-12-05 Advanced Micro Devices, Inc. Concurrent training of functional subnetworks of a neural network
JP2020094868A (en) * 2018-12-11 2020-06-18 トヨタ自動車株式会社 Full charge capacity learning device
JP7453767B2 (en) 2019-09-25 2024-03-21 キヤノン株式会社 Information processing device, information processing method
JP2022515302A (en) * 2019-11-25 2022-02-18 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Methods and equipment for training deep learning models, electronic devices, computer-readable storage media and computer programs
JP7029554B2 (en) 2019-11-25 2022-03-03 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Methods and equipment for training deep learning models, electronic devices, computer-readable storage media and computer programs
JP2022520912A (en) * 2020-01-22 2022-04-04 深▲チェン▼市商▲湯▼科技有限公司 Data processing methods, devices and chips, electronic devices, storage media

Also Published As

Publication number Publication date
JP6258560B2 (en) 2018-01-10
JPWO2016189675A1 (en) 2017-08-17

Similar Documents

Publication Publication Date Title
JP6258560B2 (en) Neural network learning apparatus and learning method
US11741361B2 (en) Machine learning-based network model building method and apparatus
Mirza et al. Weighted online sequential extreme learning machine for class imbalance learning
CN111542843A (en) Active development with collaboration generators
US8626682B2 (en) Automatic data cleaning for machine learning classifiers
CN109543727B (en) Semi-supervised anomaly detection method based on competitive reconstruction learning
WO2017068675A1 (en) Program generation apparatus, program generation method, and generation program
CN113825978B (en) Method and device for defining path and storage device
US11176672B1 (en) Machine learning method, machine learning device, and machine learning program
WO2019102984A1 (en) Learning device and learning method, identification device and identification method, program, and recording medium
JP7268756B2 (en) Deterioration suppression program, degradation suppression method, and information processing device
JP6641195B2 (en) Optimization method, optimization device, program, and image processing device
WO2020012523A1 (en) Information processing device, information processing method, and information processing program
WO2020168796A1 (en) Data augmentation method based on high-dimensional spatial sampling
WO2022227217A1 (en) Text classification model training method and apparatus, and device and readable storage medium
JP2020123270A (en) Arithmetic unit
Kumar et al. Imbalanced classification in diabetics using ensembled machine learning
KR20170081887A (en) Method of determining the final answer by convolution of the artificial neural networks
JP2010072896A (en) Sv reduction method for multi-class svm
JP4997524B2 (en) Multivariable decision tree construction system, multivariable decision tree construction method, and program for constructing multivariable decision tree
CN115879540A (en) System and method for continuous joint learning based on deep learning
JP7211430B2 (en) Machine learning device, machine learning method, and program
Patel et al. Machine learning based structure recognition in analog schematics for constraints generation
KR20210057847A (en) Image Recognition Method through Deep Learning of Multidimensional Neural Network and Decision Neural Network And System Thereof
US20200387792A1 (en) Learning device and learning method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15893309

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017520142

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15893309

Country of ref document: EP

Kind code of ref document: A1