JP2017126260A - Machine learning method and machine learning device

Info

Publication number: JP2017126260A
Application number: JP2016006161A
Authority: JP
Inventors: Yasushi Kaneda; Yasuhiro Akiyama; Taketo Ogata; Yoshitaka Uchida
Original assignee: Clarion Co Ltd
Current assignee: Faurecia Clarion Electronics Co Ltd
Priority date: 2016-01-15
Filing date: 2016-01-15
Publication date: 2017-07-20
Anticipated expiration: 2036-01-15
Also published as: JP6643905B2

Abstract

PROBLEM TO BE SOLVED: To provide a method that enables the learning rate to be optimized autonomously during the learning process of a neural network or the like, and that prevents learning from being trapped in a dead end.
SOLUTION: A storage device holds a plurality of programs for realizing a plurality of systems, a plurality of structure parameters corresponding to the programs, and a learning parameter corresponding to each system. A machine learning method includes: a first procedure for causing each system to learn a prescribed data set by using the learning parameter corresponding to that system; a second procedure for evaluating each system by a prescribed evaluation method; and a third procedure for selecting, out of the plurality of systems, a first system having a lower evaluation and a second system having a higher evaluation than the first system, generating a copy of the second system, and changing at least one of the learning parameter corresponding to the second system and the learning parameter corresponding to the copy of the second system. The machine learning method executes the first to third procedures again for the plurality of systems other than the first system.
SELECTED DRAWING: Figure 5

Description

The present invention relates to machine learning using a neural network or the like.

In recent years, research on the recognition of speech, images, and the like by multilayer neural networks, so-called deep learning, has become active. This is due, first, to the development of a method that uses a mechanism called an auto-encoder to train multilayer (deep) neural networks of four or more layers, which were previously difficult to train, and second, to the large improvement in speech and image recognition rates achieved by convolutional neural networks.

The steepest descent method is used as the basic algorithm for back-propagation learning, which is used to train neural networks in deep learning and elsewhere, and as a technique for various kinds of learning and optimization. This method is a deterministic search. However, because it is almost certain to be trapped in a local optimum that is not the global optimum, a stochastic search such as the stochastic gradient descent method is usually used instead. In neural network learning, the learning rate is a parameter that controls learning. The initial value of the learning rate is determined by the (human) experimenter, and during learning it is either held constant or changed according to a predetermined schedule. In Patent Document 1, the learning rate is increased when the evaluation of the solution obtained during learning is high and decreased when the evaluation is low. Other methods for determining the learning rate adaptively have also been proposed, but each is limited in the problems and networks to which it can be applied.

The genetic algorithm (GA) is a method for stochastic optimization. GA is an optimization method that originally developed independently of neural networks, but it is sometimes used in combination with machine learning methods. In particular, many methods that combine back-propagation learning with GA have been developed for neural networks. The most common are methods that use GA to optimize the structure and weights of a neural network, as in Patent Document 2 and Non-Patent Document 1, but GA is also used to optimize the learning method. In these methods, the GA operations, namely mutation or crossover, are applied after the entire back-propagation learning process has been carried out, and this cycle is repeated. As combinations with other stochastic search methods, methods combined with particle swarm optimization, a stochastic search method developed relatively recently and with success, have also been studied. Algorithms that parallelize stochastic gradient descent have also been developed.

Patent Document 1: US Pat. No. 6,269,351
Patent Document 2: US Pat. No. 6,601,053

Non-Patent Document 1: Marshall, S. J. and Harrison, R. F., “Optimization and Training of Feedforward Neural Networks by Genetic Algorithms”, 2nd International Conference on Artificial Neural Networks, pp. 39-43, November 1991.

Despite recent progress in research on deep learning and on back-propagation learning of neural networks, several difficult problems remain in back-propagation learning. The first problem is how to determine the learning rate. The optimal learning rate differs depending on the structure of the neural network and on the problem. Furthermore, although holding the learning rate constant throughout learning is relatively widely used, it is usually better to decrease it as learning progresses, so a way of changing it during learning must be devised. There is a large body of literature on how to set the learning rate. Methods in which the learning rate is determined automatically as a function of time in the learning process have also been proposed. Various other adaptive methods have been devised recently, but many of them, although ingenious, do not always work well. A simpler and more powerful method is needed.

The second problem in back-propagation learning is escaping from dead ends (local minima). Depending on the initial values of the weights of the neural network, back-propagation learning may yield only a value far from the minimum for the function to be minimized (such as the error rate). In addition, even if a relatively good value is obtained at one point, further learning may move away from the minimum and never return. The problem is to escape from such dead ends and obtain a value close to the optimum.

Back-propagation learning is easily trapped in dead ends because it is a local search method. As described above, stochastic gradient descent is usually used for back propagation. If a stochastic search is to be performed, however, an improvement can be expected from combining it with one of the various established stochastic search methods. The conventional combination of back-propagation learning and GA has this aim, but in the conventional methods GA does not act within a single back-propagation learning run, so the learning rate cannot be optimized within a single run.

In order to solve the above problems, one aspect of the present invention is a machine learning method executed by a computer having a processor and a storage device connected to the processor. The storage device holds a plurality of programs for realizing a plurality of systems that execute predetermined processing, a plurality of structure parameters corresponding to each of the plurality of programs, and a plurality of learning parameters corresponding to the plurality of systems. Each learning parameter is a parameter that specifies how the structure parameters are changed in the learning executed by the corresponding system. The machine learning method includes: a first procedure in which the processor causes each system to learn a predetermined data set using the learning parameter corresponding to that system; a second procedure in which the processor evaluates each system by a predetermined evaluation method; and a third procedure in which the processor selects, from the plurality of systems, a first system and a second system having a higher evaluation than the first system, generates a copy of the program and the plurality of structure parameters corresponding to the second system as a program for realizing a copy of the second system and a plurality of structure parameters corresponding to it, generates a copy of the learning parameter corresponding to the second system as a learning parameter corresponding to the copy of the second system, and changes at least one of the learning parameter corresponding to the second system and the learning parameter corresponding to the copy of the second system so that the two differ from each other. The first to third procedures are executed again for the plurality of systems other than the first system.
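The three procedures can be illustrated with a minimal Python sketch. It is not the implementation described in this document: the `System` class, the one-weight quadratic objective in `loss`, the halving/doubling mutation, and all function names are hypothetical choices made only to keep the example small. Each "system" here has a single structure parameter (`weight`) and a single learning parameter (`learning_rate`).

```python
import copy
import random
from dataclasses import dataclass


@dataclass
class System:
    weight: float         # structure parameter (toy: a single weight)
    learning_rate: float  # learning parameter


def loss(s: System) -> float:
    # Hypothetical evaluation: squared distance of the weight from 3.0.
    d = s.weight - 3.0
    return d * d


def train_one_step(s: System) -> None:
    # First procedure: one gradient step using the system's own learning rate.
    s.weight -= s.learning_rate * 2.0 * (s.weight - 3.0)


def select_and_copy(population: list, rng: random.Random) -> None:
    # Second procedure: rank the systems (lower loss = higher evaluation).
    order = sorted(range(len(population)), key=lambda i: loss(population[i]))
    worst = order[-1]                # the "first system"
    better = rng.choice(order[:-1])  # a "second system" ranked above it
    # Third procedure: copy the better system, then change the copy's
    # learning rate so that it differs from the original's.
    clone = copy.deepcopy(population[better])
    clone.learning_rate *= rng.choice([0.5, 2.0])
    population[worst] = clone        # the first system is dropped


def run(generations: int = 20, size: int = 8, seed: int = 0) -> list:
    rng = random.Random(seed)
    population = [System(rng.uniform(-1.0, 1.0), rng.uniform(0.01, 0.4))
                  for _ in range(size)]
    for _ in range(generations):
        for s in population:
            train_one_step(s)
        select_and_copy(population, rng)
    return population
```

Calling `run()` keeps the population size constant: in every generation the lowest-ranked system is replaced by a copy of a better one whose learning rate has been perturbed, so learning rates that lead to good evaluations tend to spread through the population.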

According to one aspect of the present invention, the learning rate (or a learning parameter, or an optimization/search control parameter) can be optimized by being determined autonomously.

FIG. 1 is an explanatory diagram of the encoding of the parameters of a multilayer neural network into a chromosome in an embodiment of the present invention.
FIG. 2 is an explanatory diagram of a learning method that combines back-propagation learning and GA in the embodiment of the present invention.
FIG. 3 is an explanatory diagram of an example of changes in the learning rate in the embodiment of the present invention.
FIG. 4 is an explanatory diagram of an example of the Euclidean distance between the best individual and each of the other individuals in the embodiment of the present invention.
FIG. 5 is an explanatory diagram outlining the configuration and operation of the data identification neural network design tool of the embodiment of the present invention.
FIG. 6 is a block diagram illustrating the hardware configuration of the data identification neural network design tool of the embodiment of the present invention.

An embodiment of the present invention will be described. In the method of this embodiment, GA selection and mutation are performed at every step (1 epoch) of parallelized back-propagation learning. As with the conventional back-propagation learning method and its combinations with GA, this optimizes the neural network obtained as the learning result. In addition, it optimizes the learning rate during the back-propagation learning process, which the conventional methods could not do.

(Overall configuration)
The overall configuration of this embodiment, that is, an outline of the configuration and operation of the data identification neural network design tool 500, will be described with reference to FIG. 5. The data identification neural network design tool 500 is a tool with which a user designs a neural network for identifying image data and the like. It receives as input original data 504, which is data for learning, and teacher information 505, which describes the original data 504. Video or still images can be input as the original data 504, and the teacher information 505 includes information on a plurality of rectangular regions (bounding boxes) that indicate where in the video or still image the information to be identified, for example a vehicle or a pedestrian, is located. The designed neural network is extracted from the learning neural network group 507 in the form of weights and biases for the identification neural network 508. This design result can be tested by supplying data to be identified 509 as test data, and an identification result output 510 is produced as the result.

The learning control computer 501 is a terminal that passes user input to the learning control program 502 and passes output from the learning control program 502 to the user. The learning control computer 501 can also run the learning control program 502, the learning data generation program 503, the learning neural network group 507 composed of a plurality of neural networks, and the identification neural network 508, and can store the inputs and outputs of these programs, namely the original data 504, the teacher information 505, the learning data with teacher information 506, the data to be identified 509, and the identification result output 510. However, the learning data generation program 503, the learning neural network group 507, the identification neural network 508, the original data 504, the teacher information 505, the learning data with teacher information 506, the data to be identified 509, and the identification result output 510 can also be stored and executed on other computers designated by the learning control program 502 (see FIG. 6).

The user inputs to the learning control computer 501 information specifying which data to use as the original data 504 and the teacher information 505, together with parameters for the learning data generation program 503, the learning neural network group 507, and the identification neural network 508. As described later, the parameters for the learning neural network group 507 include the number of neural networks, that is, the number of individuals; a random seed for randomly determining the weights and biases of the neural networks; the distribution function (such as a normal distribution), mean, and standard deviation that determine the distribution of the weights and biases of the plurality of neural networks; and the mean and standard deviation of the initial values of the learning rate. The user need not input any value for which the default is used. The input can also include, as stopping conditions for the neural networks, an upper limit on the number of steps (number of epochs), a target error value, and a target learning rate value. The user can also input the data to be identified 509 at this time.

In response to a user instruction, the learning control program 502 runs the learning data generation program 503 to generate the learning data with teacher information 506. The original data 504 consists of frame images of relatively large size (for example, 640×480), which the learning neural network group 507 cannot handle as they are. Therefore, a large number of patch images of relatively small size (for example, 24×48) are generated from each frame image and used as the learning data with teacher information 506. Using the teacher information 505, the learning data generation program 503 classifies these images into positive examples, that is, images that should be detected, and negative examples, that is, images that should not be detected, and the learning data with teacher information 506 stores this classification in one-to-one correspondence with the images. The images to be detected may be divided into classes in the teacher information 505, in which case the class is stored in the learning data with teacher information 506. That is, pairs of a patch image and a class are stored.
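The patch generation and positive/negative labeling described above might look like the following sketch. The text does not say how patches are sampled or how a patch is matched against a bounding box, so the sliding-window `stride`, the intersection-over-union test in `iou`, the `threshold` of 0.5, and the reading of 24×48 as width×height are all assumptions made for illustration. `Box` and `label_patches` are hypothetical names.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def iou(a: Box, b: Box) -> float:
    # Intersection-over-union of two axis-aligned rectangles.
    ix = max(0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union else 0.0


def label_patches(frame_w, frame_h, boxes,
                  patch_w=24, patch_h=48, stride=24, threshold=0.5):
    # Slide a fixed-size window over the frame and mark each patch as a
    # positive example if it overlaps some teacher bounding box enough.
    labeled = []
    for y in range(0, frame_h - patch_h + 1, stride):
        for x in range(0, frame_w - patch_w + 1, stride):
            patch = Box(x, y, patch_w, patch_h)
            positive = any(iou(patch, b) >= threshold for b in boxes)
            labeled.append((patch, positive))
    return labeled
```

For a 640×480 frame with one bounding box, `label_patches(640, 480, [Box(48, 48, 24, 48)])` returns one `(patch, is_positive)` pair per window position. Only the patch coordinates are produced here; cropping the pixels and attaching class labels is left out.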

In response to a user instruction, the learning control program 502 runs the learning neural network group 507 to perform back-propagation learning. That is, images contained in the learning data with teacher information 506 are input to a neural network, and the weights and biases of the learning neural network group 507 are updated based on the difference between the output and the teacher information. This is repeated. The method for training a plurality of neural networks is described later.
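For reference, the usual gradient-descent form of this update can be written as follows. The text does not state the formula, so the symbols are assumptions: $E$ is an error measure computed from the difference between the network output and the teacher information, $w$ and $b$ are a weight and a bias, and $\eta$ is the learning rate.

```latex
w \leftarrow w - \eta \, \frac{\partial E}{\partial w},
\qquad
b \leftarrow b - \eta \, \frac{\partial E}{\partial b}
```

In this form the learning rate $\eta$ scales every update step, which is why its value strongly affects whether learning converges, stalls, or diverges.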

FIG. 6 is a block diagram illustrating the hardware configuration of the data identification neural network design tool 500 according to the embodiment of the present invention.

The data identification neural network design tool 500 of this embodiment can be realized by, for example, computers 600, 610, and 620 connected to one another by a network 630.

The computer 600 corresponds to the learning control computer 501 in FIG. 5 and has a CPU (Central Processing Unit) 601, a memory 602, an I/F (Interface) 603, and an HDD (Hard Disk Drive) 604 that are connected to one another. The CPU 601 is a processor that executes programs stored in the memory 602. The memory 602 is a so-called main storage device that stores the programs executed by the CPU 601, the data they process, and the like. The memory 602 of this embodiment stores the learning control program 502. In this embodiment, the processing attributed to the learning control program 502 is actually executed by the CPU 601 in accordance with the learning control program 502. The I/F 603 transmits and receives data to and from the computers 610 and 620 via the network 630. The HDD 604 is a so-called auxiliary storage device that stores the programs executed by the CPU 601, the data they process, and the like. For example, the learning control program 502 may be stored in the HDD 604 and copied to the memory 602 as necessary.

The computer 610 has a CPU 611, a memory 617, an I/F 616, and an HDD 615 that are connected to one another, and further has a GPU (Graphics Processing Unit) 612 connected to the CPU 611. The CPU 611 is a processor that executes programs stored in the memory 617. The memory 617 is a so-called main storage device that stores the programs executed by the CPU 611, the data they process, and the like. The memory 617 of this embodiment stores the learning data generation program 503. In this embodiment, the processing attributed to the learning data generation program 503 is actually executed by the CPU 611 in accordance with the learning data generation program 503. The I/F 616 transmits and receives data to and from the computers 600 and 620 via the network 630. The HDD 615 is a so-called auxiliary storage device that stores the programs executed by the CPU 611 and the like, the data they process, and the like. The HDD 615 of this embodiment stores the original data 504 and the teacher information 505.

The GPU 612 is a processor having a plurality of processor cores 613 and a memory 614. The memory 614 of this embodiment stores the learning neural network group 507 and the learning data with teacher information 506. The operation of the learning neural network group 507 of this embodiment is executed by the GPU 612.

Once the learning data with teacher information 506 has been generated from the original data 504 and the teacher information 505 by the learning data generation program 503, it may be stored in the HDD 615 and then copied to the memory 614 by the CPU 611. Similarly, the learning neural network group 507 may be stored in the HDD 615 or the memory 617 and copied to the memory 614 by the CPU 611.

Each learning neural network included in the learning neural network group 507 corresponds to a set of structure parameters stored in the memory 614, such as the weights between the neurons of that learning neural network and their biases, and to a program that computes an output based on those structure parameters and the input learning data and performs learning by a predetermined learning method (for example, back-propagation learning) based on an evaluation of that output. That is, each learning neural network is a system realized by the GPU 612 executing the corresponding program using the structure parameters.

The computer 620 has a CPU 621, a memory 626, an I/F 625, and an HDD 627 that are connected to one another, and further has a GPU 622 connected to the CPU 621. The CPU 621 is a processor that executes programs stored in the memory 626. The memory 626 is a so-called main storage device that stores the programs executed by the CPU 621, the data they process, and the like. The I/F 625 transmits and receives data to and from the computers 600 and 610 via the network 630. The HDD 627 is a so-called auxiliary storage device that stores the programs executed by the CPU 621 and the like, the data they process, and the like.

The GPU 622 is a processor having a plurality of processor cores 623 and a memory 624. The memory 624 of this embodiment stores the identification neural network 508, the data to be identified 509, and the identification result output 510. The operation of the identification neural network 508 of this embodiment is executed by the GPU 622.

The identification neural network 508 and the data to be identified 509 may be stored in the HDD 627 or the memory 626 and copied to the memory 624 by the CPU 621. The identification result output 510 may be copied by the CPU 621 from the memory 624 to the memory 626 or the HDD 627 and, as necessary, transmitted to the computer 600 or the like via the I/F 625 and the network 630.

FIG. 6 shows one example of the hardware configuration of the data identification neural network design tool 500, and various modifications are possible in practice. For example, the computer 610 may have a plurality of GPUs 612. In that case, the memory 614 of each GPU 612 may store one of the learning neural networks included in the learning neural network group 507 together with the learning data with teacher information 506, and each GPU 612 may train one learning neural network. Because the plurality of neural networks are then trained in parallel, the time required for learning is shortened.

Alternatively, the functions of any two or all of the computers 600, 610, and 620 may be realized by a single computer. Alternatively, the processing executed by the GPU 612 and the like in the above example may be executed by the CPU 611 and the like. Alternatively, the HDD 604 and the like may be replaced by a storage device of a type other than an HDD, such as flash memory.

(Chromosome representation)
In this embodiment, the parameters of a multilayer neural network are encoded into a chromosome of a genetic algorithm (GA), as shown in FIG. 1. Because each individual has exactly one chromosome, "chromosome" and "individual" are synonymous here. FIG. 1(a) shows the example of a three-layer perceptron. Encoding the connection weights 101 on the chromosome is the same as in conventional methods that combine neural networks with GA, but in this embodiment the learning rate 102 is also encoded. Although FIG. 1(a) shows only the weights among the connection parameters between neurons, the constant terms, that is, the biases, can also be encoded in the chromosome. The structure of the neural network is fixed here and is therefore not represented on the chromosome. If the structure is also represented, however, structural optimization in which neurons and inter-neuron connections are changed (for example, deleted) according to predetermined mutation rules during learning can also be realized with GA. That is, the chromosome can be made variable in length and can describe the parameters of each neuron (so that deleting a neuron deletes all of them) or the parameters of each connection between neurons (so that deleting a connection deletes them).
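A fixed-structure encoding of this kind can be sketched as a flat list of weights plus one learning rate. The layout below (row-major flattening, layer by layer, biases omitted) and the names `Chromosome`, `encode`, and `decode` are illustrative assumptions, not the format shown in FIG. 1.

```python
from dataclasses import dataclass


@dataclass
class Chromosome:
    genes: list           # flattened connection weights
    learning_rate: float  # encoded alongside the weights


def encode(layers: list, learning_rate: float) -> Chromosome:
    # Each layer is a matrix given as a list of rows; flatten them in order.
    genes = [w for layer in layers for row in layer for w in row]
    return Chromosome(genes, learning_rate)


def decode(chrom: Chromosome, shapes: list) -> list:
    # shapes lists (rows, cols) for each layer; because the network
    # structure is fixed, it is not stored on the chromosome itself.
    layers, pos = [], 0
    for rows, cols in shapes:
        layer = [chrom.genes[pos + r * cols: pos + (r + 1) * cols]
                 for r in range(rows)]
        pos += rows * cols
        layers.append(layer)
    return layers
```

For a perceptron with 2 inputs, 3 hidden units, and 1 output, the chromosome holds 3×2 + 1×3 = 9 weight genes and one learning rate, and `decode` with shapes `[(3, 2), (1, 3)]` restores the two weight matrices.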

FIG. 1(b) shows the encoding of a convolutional neural network (CNN), which is often used in image recognition and the like. Compared with FIG. 1(a), the number of parameters is larger and the chromosome is longer, but the principle of encoding the parameters is the same.

The chromosome structure need not be the same for all individuals. That is, the computation can use neural networks with different structures (for example, with different inter-neuron connections, or with different numbers of neurons and different connections). Mutation can be performed in the same way in this case as well. In GA, an operation called crossover is used in addition to mutation, and crossover can be performed not only between chromosomes with the same structure but also between chromosomes with different structures. For example, each of two neural networks can be split between two of its layers, or split in two within a particular layer, and the parts can be crossed and recombined. If the numbers of cut connections are made equal, the neural network structure can be maintained simply by reconnecting. When connections are lacking, zero-weight connections can be introduced, and when there are surplus connections, connections can be deleted, so that the neural network structure is reconstructed.
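The layer-boundary case of this crossover can be sketched as follows. The representation (each layer as a weight matrix whose rows are output neurons and whose columns are input connections) and the names `resize_inputs` and `crossover` are assumptions for illustration. Splitting within a layer is not covered.

```python
Matrix = list  # a layer: list of rows, one row of input weights per neuron


def resize_inputs(layer: Matrix, n_inputs: int) -> Matrix:
    # Missing connections are added with weight 0; surplus ones are deleted.
    return [row[:n_inputs] + [0.0] * (n_inputs - len(row)) for row in layer]


def crossover(parent_a: list, parent_b: list, cut: int) -> list:
    # The child takes layers [0, cut) from parent_a and the rest from parent_b.
    head = [[row[:] for row in layer] for layer in parent_a[:cut]]
    tail = [[row[:] for row in layer] for layer in parent_b[cut:]]
    if head and tail:
        # Reconnect at the cut: the first tail layer must accept as many
        # inputs as the last head layer has neurons.
        tail[0] = resize_inputs(tail[0], len(head[-1]))
    return head + tail
```

When both parents have the same number of neurons on each side of the cut, `resize_inputs` changes nothing and the parts are simply reconnected. Otherwise the child gains zero-weight connections or loses surplus ones at the junction.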

Encoding into a chromosome is not limited to multilayer neural networks; it can also be applied to other kinds of systems for learning or for optimization and search. That is, the structural parameters of the system (corresponding to the connection weights of a neural network) and the learning parameters, or the parameters that control the optimization or search process (corresponding to the learning rate), can be encoded, and the mutation and crossover operations can be applied to them.

(Learning method)
A learning method that combines back-propagation learning with a GA is described below with reference to FIG. 2. This method is called the LOG-BP learning method (learning-rate-optimizing genetic back-propagation learning method). In this method, a plurality of the chromosomes described in the previous section are prepared and back-propagation learning is performed on them in parallel, so that the weights on those chromosomes mutate autonomously. The learning rate, in contrast, is mutated stochastically.

First, a computer in which the program of FIG. 2 (that is, the program included in the learning neural network group 507) is installed (the computer 610 in the example of FIG. 6) initializes the chromosomes (201). The number of individuals may be variable, but here it is fixed (for example, 20). The weights and learning rate of each chromosome are determined by random numbers. The weights may be initialized in the same way as in ordinary back-propagation learning; for example, the weights and biases may be drawn from a normal distribution. The learning rates are also spread over a suitable range using random numbers, for example drawn from a normal distribution. It is considered advisable to make the learning rates relatively large, to the extent that divergence does not become too frequent. The parameters that determine these initial values can be supplied from outside via the learning control program 502. That is, the mean, standard deviation, and distribution shape of the learning rates and weights, as well as the random seed, can be specified before learning starts.
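
A minimal sketch of the initialization step (201), assuming normal distributions for both weights and learning rates. The function name, the dictionary layout, and in particular the redrawing of non-positive learning rates are assumptions of this sketch; the source only says that the values are drawn at random from externally supplied distribution parameters and a seed.

```python
import random


def init_population(n_individuals, n_weights, weight_mean, weight_std,
                    lr_mean, lr_std, seed):
    """Create chromosomes with normally distributed weights and learning rates."""
    rng = random.Random(seed)           # externally supplied random seed
    population = []
    for _ in range(n_individuals):
        weights = [rng.gauss(weight_mean, weight_std) for _ in range(n_weights)]
        lr = rng.gauss(lr_mean, lr_std)
        while lr <= 0.0:                # assumption: redraw until the rate is positive
            lr = rng.gauss(lr_mean, lr_std)
        population.append({"weights": weights, "learning_rate": lr})
    return population


# Hypothetical settings: 20 individuals with 10 weights each.
population = init_population(20, 10, 0.0, 0.1, 0.05, 0.02, seed=1)
```

Fixing the seed makes a run reproducible, which is convenient when comparing different initial learning-rate distributions.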

Next, the computer performs one step (1 epoch) of back-propagation learning for each individual (203). This step corresponds to one generation in the GA. For example, when image data is to be learned, as many images as possible are prepared in advance as training data, and part of them is set aside as validation data. Evaluation data consisting of images of the same format is also prepared as needed. The computer then trains on all of the training data once (learning with image data is described later). When mini-batch stochastic gradient descent is used (that is, a method that trains on moderately sized groups of examples, unlike steepest descent, which trains on all of the training data at once, and unlike basic stochastic gradient descent, which trains on one example at a time), the training data stored in an array is divided into mini-batches and each mini-batch is back-propagated once. The value encoded in each chromosome is used as the learning rate.

Next, the computer updates the weights on the chromosome with the weights changed by learning (204). That is, in this method the weight values on the chromosome are not changed externally; they are updated autonomously on the basis of random numbers and each individual's learning. In other words, the method realizes not Darwinian inheritance but Lamarckian inheritance, in which acquired traits are inherited as they are. However, if the weight update is regarded as a mutation, this process is consistent with the basics of a GA.

Next, the computer evaluates each updated individual (an egg produced by asexual reproduction of the original individual) using the validation data (205). If some individual has a sufficient evaluation value, the computation can end here (206). Otherwise, the computer performs selection based on the evaluation values; if the evaluation result is an error rate, lower values are better. That is, the individual (chromosome) with the largest value, which has the worst evaluation, is killed (deleted), and the individual with the smallest value, which has the best evaluation, is copied (producing fraternal twins) (207). The number of individuals therefore stays constant. However, the number of individuals can also be made to increase or decrease gradually by making the generation (copy) probability differ from the death probability. For the individual generated by copying, the computer mutates the learning rate η on the chromosome according to the following formulas.

η' = f·η (probability 0.5)
η' = η/f (probability 0.5)

That is, which formula to apply is decided by a random number so that the two are equally likely. f has a value of about 1.2, for example, which is close to the rule used in adaptive back-propagation learning methods (a rule that originally has nothing to do with GAs). This way of changing the learning rate is only an example: as long as the learning rate of the best-evaluated individual and that of the individual generated by copying it end up different, the learning rates may be determined by other methods, for example by changing both. Likewise, in the above example the worst-evaluated individual is deleted and a copy of the best-evaluated individual is generated, but the targets of deletion and copying need not be limited to the worst and best individuals, as long as the copied individual has a better evaluation than the deleted one.
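
The learning-rate mutation can be written in a few lines. The function name and the `rng` argument are assumptions of this sketch; the factor f = 1.2 and the equal probabilities come from the text.

```python
import random


def mutate_learning_rate(eta, f=1.2, rng=random):
    """Return f * eta or eta / f, each with probability 0.5."""
    return eta * f if rng.random() < 0.5 else eta / f


# Applied to a copy of the best individual; the original keeps its rate.
new_eta = mutate_learning_rate(0.05, rng=random.Random(0))
```

Because the two outcomes are reciprocal factors, repeated mutation performs an unbiased random walk on the logarithm of the learning rate, and selection supplies the direction.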

Deleting a chromosome in process 207 is one example of excluding that chromosome from subsequent machine learning. The chromosome may actually be deleted from the memory 614, or it may be left in the memory 614 and excluded by other means, for example by having the learning control program 502 control learning so that no machine learning is performed on that chromosome in subsequent epochs. The same applies to the deletion of chromosomes in the following description.

The computer repeats the above step (epoch) until a suitable solution is obtained or until almost no further change occurs (209). The stopping condition may be chosen in the same way as in ordinary back-propagation learning. Processes 202 and 208 initialize and update the parameters of this iteration.
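
The overall loop of FIG. 2 can be sketched on a toy problem. This is not the patent's network: each "individual" here is a single weight `w` trained by gradient descent on the loss (w − 3)², standing in for one epoch of back-propagation, and the loss itself stands in for the validation error. All names, the population size, and the initial ranges are assumptions of this sketch; the step numbers in the comments refer to the processes named in the text.

```python
import random


def train_one_epoch(ind):
    """Stand-in for one epoch of back-propagation: one gradient step on (w - 3)^2."""
    grad = 2.0 * (ind["w"] - 3.0)
    ind["w"] -= ind["lr"] * grad      # 204: the learned weight updates the chromosome


def evaluate(ind):
    """Stand-in for the validation error rate (lower is better)."""
    return (ind["w"] - 3.0) ** 2


def log_bp(pop_size=8, max_epochs=200, target=1e-6, f=1.2, seed=0):
    rng = random.Random(seed)
    pop = [{"w": rng.gauss(0.0, 1.0), "lr": rng.uniform(0.01, 0.3)}   # 201
           for _ in range(pop_size)]
    for epoch in range(max_epochs):                                    # 202, 208, 209
        for ind in pop:
            train_one_epoch(ind)                                       # 203
        pop.sort(key=evaluate)                                         # 205
        if evaluate(pop[0]) < target:                                  # 206
            return pop[0], epoch + 1
        pop.pop()                                                      # 207: delete the worst
        child = dict(pop[0])                                           # 207: copy the best
        child["lr"] = child["lr"] * f if rng.random() < 0.5 else child["lr"] / f
        pop.append(child)
    return pop[0], max_epochs


best, epochs = log_bp()
```

The learned weight is carried over into the copy, which is the Lamarckian aspect described above; only the learning rate of the copy is changed externally.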

The above description performs selection and mutation at every step, but in practice it is considered better to choose and control the average number of selections and mutations per step. If exactly one selection and mutation is performed at each step, that number is excessive when there are few individuals and insufficient when there are many. Therefore the average number of selections and mutations per step is fixed in advance, and the actual number at each step is determined stochastically. If selection is too frequent, the search range narrows too quickly; if it is too infrequent, the search range remains too wide. Controlling the number of selections appropriately makes it possible to search a wide area at the start of the computation and narrow the search range gradually, so that a good solution can be found.
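
One simple way to draw a per-step count with a prescribed mean is sketched below. The source does not specify the distribution; taking the integer part plus a Bernoulli draw for the fractional part is an assumption of this sketch, as are the function name and the illustrative mean of 0.6.

```python
import math
import random


def draw_selection_count(mean_count, rng):
    """Integer number of selections for one step, with expected value mean_count."""
    base = math.floor(mean_count)
    extra = 1 if rng.random() < (mean_count - base) else 0
    return base + extra


rng = random.Random(0)
counts = [draw_selection_count(0.6, rng) for _ in range(10000)]
average = sum(counts) / len(counts)   # close to 0.6 over many steps
```

With a mean below 1, most steps perform either no selection or a single one, which slows the loss of diversity in a small population.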

When another learning system or an optimization or search system is used instead of a neural network, evaluation is performed at each step of the iterated learning, optimization, or search, selection is performed on the basis of the result, and the values of the learning parameters that control the learning process, or of the optimization and search parameters that control the optimization or search process, are mutated. For this mutation, when a distance (for scalar values, a difference) can be defined between parameter values, a nearby parameter value can be generated using random numbers. When no distance can be defined, some different value can be chosen at random.

(Supplementary notes on the learning method and its range of application)
Six points about variations of the LOG-BP learning method and the extension of its range of application are described below. First, each individual can refer to an evaluation value based on the validation data and reflect it in its learning. In the above algorithm the same evaluation value is also used for selection, but since selection is regarded as external, the evaluation value for selection can be given separately. For example, evaluation data can be supplied in addition to the validation data and used for selection. That is, different criteria can be used for each individual's own selection criterion (the criterion in back-propagation learning) and for the external selection criterion (the criterion in the GA).

Second, in the method described above, the structure and parameters of the neural network are not changed by selection and mutation. No such extension is made in the evaluation below, but the method can be extended to optimize the structure and parameters, using mutation and crossover. For example, each chromosome contains information indicating the number of neurons in its neural network and the presence or absence of connections between neurons, and in process 207 the computer 610 mutates the chromosome according to a predetermined mutation rule, thereby cutting connections between neurons or eliminating neurons. The same applies to the addition of neurons described later. When a system other than a neural network is used, part of it can be changed or deleted in the same way.

Third, neurons can also be added by mutation. To avoid invalidating the learning already done when a neuron is added, its weights should be made small (with weights of 0, adding it is the same as not adding it). However, the added neuron may then never become active. Adding neurons without increasing the training data causes overfitting, so the evaluation value tends to improve only in appearance. It is therefore considered advisable to define the evaluation function so that the evaluation value is better when the neural network is small. For example, the minimum description length (MDL) can be included as part of the evaluation value (in other words, a description length penalty is imposed). That is, the description length of the neural network model forms part of the evaluation value. When the chromosome describes the structure of the neural network, it can be regarded as a description of the model, so the length of the chromosome can be used as the description length. When a system other than a neural network is used, parts can be added to it as well.
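
A description length penalty of this kind can be sketched as an additive term. The linear form, the function name, and the coefficient `penalty_per_gene` are assumptions made for illustration; the source only says that the chromosome length can serve as the description length and form part of the evaluation value.

```python
def penalized_score(error_rate, chromosome_length, penalty_per_gene=1e-4):
    """Lower is better: validation error plus a description-length penalty."""
    return error_rate + penalty_per_gene * chromosome_length


# Hypothetical comparison: a small network with slightly higher error
# against a network five times larger with slightly lower error.
small = penalized_score(error_rate=0.10, chromosome_length=1000)
large = penalized_score(error_rate=0.09, chromosome_length=5000)
```

With these illustrative numbers the smaller network receives the better (lower) score, so an added neuron survives selection only if it reduces the error by more than its description cost.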

Fourth, as already noted, the LOG-BP learning method can be extended not only within its application to neural networks, as above, but also to other learning methods. Suppose that there is a system that performs classification, detection, or the like (corresponding to the neural network) and a learning method for training it, that the learning proceeds iteratively, and that there are parameters that control the learning. If a method for evaluating the effect of learning during the iteration is available, this learning method (the LOG learning method) can be applied just as in back-propagation learning of a neural network. That is, the parameters that determine the structure of the system are expressed as a chromosome, a plurality of chromosomes are initialized, learning is started, and learning, evaluation, and mutation and selection are repeated. The learning control parameter to be mutated need not be real-valued; it suffices that a method for mutating it to another value is given in place of the two mutation formulas above.

Fifth, in the above embodiment all individuals are neural networks of the same kind. However, even when individuals use different kinds of neural networks or other learning methods, they can be evaluated, selected, and mutated by the above method as long as a common evaluation function is used and the learning parameters and the way to mutate them are specified. That is, neural networks and other learning methods can be mixed. Specifically, for example, the learning neural network group 507 may include a plurality of neural networks that differ in the presence or absence of connections between neurons, or that differ in both the number of neurons and the connections between them. Back-propagation learning may be used for all of them, or a different learning method may be used for each neural network. This makes it possible to identify the best among different kinds of neural networks or learning methods.

Sixth, the above embodiment is based on learning on a single computer (for example, the computer 610 of FIG. 6), but a plurality of computers (for example, a plurality of computers 610) can be prepared and one chromosome assigned to each, so that the processors of these computers (for example, the CPU 611 or GPU 612 of each computer 610) perform the computation in parallel. The evaluation values of the chromosomes can be gathered on one computer, which performs the selection. Before proceeding to the next epoch, the chromosomes on one or more of the computers must be replaced, which can be done by exchanging a small amount of data between computers. Ordinary back-propagation learning with different parameters on multiple computers can be realized with conventional technology, but compared with that, the above method has the advantage that it is faster and more likely to find a value close to the optimum. Alternatively, as described with reference to FIG. 6, the computer 610 may have a plurality of GPUs 612 and use each GPU 612 to execute the same processing.

(Example of learning an image data set)
This section describes how a pedestrian image data set is learned with the learning method described above. One example of a pedestrian image data set is the Caltech pedestrian data set. The same method can be applied when face images, object images, character images, or the like are used instead of pedestrian images. The data set used in this learning contains a plurality of videos. Annotation data is supplied separately from the videos and includes bounding-box data indicating the position and size of each pedestrian. The videos consist of one or more videos for training and one or more videos for testing.

Half of the training data are positive examples, generated as follows. The bounding boxes specified in the data set are cropped and normalized to a size of 24×48, giving 100,000 images. Together with the 100,000 images obtained by flipping them horizontally, this yields 200,000 images, each of which is used twice as a positive example.

The remaining half of the training data, the negative examples, are generated as follows. First, 200,000 images of size 24×48 cropped from regions of the Caltech pedestrian data set outside the bounding boxes are used. The positions of these initial negative examples are chosen by random numbers. Images of a size other than 24×48 can be cropped and resized, but it is also possible to crop only images of exactly 24×48. In addition, a CNN with one convolutional layer, trained on 200,000 positive and 200,000 negative examples, is applied to the original data set, and a further 200,000 negative examples are generated from the regions it misrecognizes. That is, images that the CNN judges to contain a pedestrian but that lie outside the bounding boxes become new negative examples. Together these give 400,000 negative examples, and with the positive examples 800,000 images are prepared.

The values in these data range from 0 to 255 in original data such as the Caltech pedestrian data set. They are converted to floating-point numbers in the range of about −1 to 1 and then corrected so that their mean is 0.
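
The pixel preprocessing can be sketched as follows. The exact scaling constant and whether the mean is taken per image or over the whole data set are not stated in the source; the division by 127.5 and the per-image mean used here are assumptions of this sketch.

```python
def normalize_pixels(pixels):
    """Map 0..255 values to roughly [-1, 1], then shift so the mean is 0."""
    scaled = [p / 127.5 - 1.0 for p in pixels]
    mean = sum(scaled) / len(scaled)
    return [s - mean for s in scaled]


# Hypothetical four-pixel "image".
out = normalize_pixels([0, 64, 128, 255])
```

After the mean shift the values can fall slightly outside [−1, 1], which matches the "about −1 to 1" wording.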

The structure and hyperparameters of the CNN to be used are described below. Assuming two convolutional layers, an example is as follows.

・First convolutional layer: filter size 5×5, 16 filters, nonlinear (activation) function: ReLU
・First pooling layer: max pooling, size 2×2
・Second convolutional layer: filter size 3×3, 26, 28, or 32 filters, nonlinear function: ReLU
・Second pooling layer: max pooling, size 2×2
・Hidden layer (one layer): 50 neurons
・Output layer: logistic regression, 2 neurons (the expected output is [1, 0] or [0, 1])
・Mini-batch size: 250 (smaller than in the Deep learning tutorial on which this is based, but possibly still too large)
Here, ReLU denotes the piecewise-linear function f(x) = if x < 0 then 0 else x.
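
The size of the feature maps implied by this configuration can be worked out with a short calculation. The sketch assumes stride-1 convolutions without padding and non-overlapping pooling, neither of which is stated in the source, and takes the 24×48 input size from the data preparation described earlier; the function names are hypothetical.

```python
def relu(x):
    """f(x) = 0 if x < 0 else x."""
    return 0 if x < 0 else x


def conv_out(size, kernel):
    """Output length of a stride-1 convolution without padding (assumed)."""
    return size - kernel + 1


def pool_out(size, window):
    """Output length of non-overlapping max pooling (assumed)."""
    return size // window


def feature_map_shape(width=24, height=48):
    w, h = conv_out(width, 5), conv_out(height, 5)   # first convolution, 5x5
    w, h = pool_out(w, 2), pool_out(h, 2)            # first max pooling, 2x2
    w, h = conv_out(w, 3), conv_out(h, 3)            # second convolution, 3x3
    w, h = pool_out(w, 2), pool_out(h, 2)            # second max pooling, 2x2
    return w, h


w, h = feature_map_shape()       # (4, 10) under the assumptions above
inputs_to_hidden = w * h * 32    # 1280 values when the second layer has 32 filters
```

Under these assumptions the 50-neuron hidden layer receives 1280 inputs with 32 filters (1040 with 26 filters), which gives a feel for how the chromosome length depends on the filter count.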

(Change in learning rate)
FIG. 3 shows examples of how the learning rate changes during learning in this embodiment. The mean and standard deviation of the learning rate can be measured at every epoch; in the figure they are plotted every 5 epochs. When the initial learning rate is too low (FIG. 3(b)), its mean first increases and then decreases, whereas in FIG. 3(a) it does not increase. The initial value in FIG. 3(b) was somewhat too low and is considered to have been adjusted autonomously. The standard deviation of the learning rate is made relatively large in the initial state and usually decreases as learning proceeds, although it sometimes increases. In either case, the learning rate is adjusted autonomously. When machine learning of another system, or optimization or search, is performed instead of neural network learning, the other learning parameters that control the learning process, or the parameters that control the optimization or search process, are adjusted autonomously in place of the learning rate.

The values of the learning rate (or of the learning parameters or optimization and search parameters) and their changes can be displayed during learning or at the end of learning, by means such as a graph like FIG. 3 or a table.

(Supplementary notes on learning performance and related matters)
First, the number of individuals (number of chromosomes). A larger number makes it more probable that a solution closer to the optimum is found, but unless all individuals can be computed in parallel, more individuals take more computation time. As an example of a value that balances computation time against search range, about 12 individuals can be used. An experiment with 12 individuals run to 100 epochs takes, for example, about 8 hours on a GPU.

Second, a method of maintaining diversity, that is, of controlling how global the search is. If selection and mutation are frequent, all individuals tend to become copies of a single individual. If that individual lies in a region that contains only a local optimum far from the global optimum, a satisfactory solution cannot be reached. The probability of selection and mutation per epoch is considered to need to be 5% or less (0.6 individuals or fewer when there are 12 individuals). Lowering the probability of selection and mutation keeps the search more global even as learning proceeds. Conversely, raising it makes the search local relatively early.

Third, global search performance. About 12 chromosomes is not sufficient for global search. The search starts at 12 locations, but because individuals generated by copying gradually increase, the search soon narrows to the neighborhoods of about 3 points even when the diversity-maintaining method above is applied. How many regions are being searched can be estimated by computing and displaying the Euclidean distance between the best individual found so far and each current individual. FIG. 4 shows an example of such a display. The four values on the right of each row are the Euclidean distances of the weights of each layer. Individual 9 (the row whose leftmost number is 9) is the best individual, and it can be seen that at least two neighborhoods are being searched in parallel.
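
The per-layer distances shown in such a display can be computed as in the following sketch. Representing an individual as a list of layers, each a flat list of weights, is an assumption made for illustration, as is the function name.

```python
import math


def layer_distances(best, other):
    """Euclidean distance between two individuals, computed layer by layer."""
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(layer_b, layer_o)))
            for layer_b, layer_o in zip(best, other)]


# Hypothetical individuals with two layers each.
best = [[0.0, 0.0], [1.0]]
other = [[3.0, 4.0], [1.0]]
distances = layer_distances(best, other)   # [5.0, 0.0]
```

Individuals whose distances to the best individual are all small are searching the same neighborhood; groups with clearly different distances indicate separate regions being explored in parallel.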

Fourth, deletion of diverged individuals. In the simple back-propagation algorithm used in this experiment, the weights often diverge (become "nan") during back-propagation. Such an individual is a zombie, that is, an individual that cannot yield a solution however long the computation continues, and should be deleted. Logic for deleting them can be built in, but it is not strictly necessary, because the evaluation value of such an individual becomes extremely poor and the individual is deleted preferentially by this algorithm. In that case, however, the frequency of selection and mutation must be higher than the frequency at which zombies arise. When many zombies arise, building in logic to remove them improves computational efficiency.

For example, in process 207 the computer 610 may delete all chromosomes whose evaluation satisfies a predetermined condition (for example, is worse than a predetermined value) and generate as many chromosome copies as were deleted. When a plurality of chromosomes are deleted, the computer 610 may, for example, generate that many copies of the best-evaluated chromosome and change the learning rates so that the best-evaluated chromosome and all of its copies have different learning rates. Alternatively, the computer 610 may select as many top-ranked chromosomes as were deleted, generate one copy of each, and change at least one learning rate in each pair so that the source chromosome and its copy differ.
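
The first variant above (delete every diverged chromosome and refill with copies of the best one, all with different learning rates) can be sketched as follows. Detecting divergence with `math.isfinite`, the dictionary layout, and the particular scheme of alternating f, 1/f, f², 1/f², ... are assumptions of this sketch; the source only requires that the learning rates of the best chromosome and its copies all differ.

```python
import math


def replace_diverged(population, scores, f=1.2):
    """Drop individuals with non-finite weights and refill from the best survivor."""
    alive = [(s, ind) for s, ind in zip(scores, population)
             if all(math.isfinite(w) for w in ind["weights"])]
    if not alive:
        raise ValueError("every individual diverged")
    alive.sort(key=lambda pair: pair[0])            # lower score = better
    survivors = [ind for _, ind in alive]
    best = survivors[0]
    n_missing = len(population) - len(survivors)
    for k in range(1, n_missing + 1):
        factor = f ** ((k + 1) // 2)                # f, f, f^2, f^2, ...
        lr = best["lr"] * factor if k % 2 else best["lr"] / factor
        survivors.append({"weights": list(best["weights"]), "lr": lr})
    return survivors


# Hypothetical population in which two individuals have diverged.
pop = [
    {"weights": [0.1], "lr": 0.1},
    {"weights": [float("nan")], "lr": 0.5},
    {"weights": [0.2], "lr": 0.2},
    {"weights": [float("inf")], "lr": 0.9},
]
new_pop = replace_diverged(pop, scores=[0.3, 1.0, 0.1, 1.0])
```

The population size is preserved, and the two replacements start from the best surviving weights with learning rates 0.2·f and 0.2/f.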

(Summary of embodiments of the present invention)
As described above, this embodiment relates to a new learning method that incorporates the GA approach into the back-propagation learning process. In this method, a neural network is represented on a computer (as a program and data) as an individual with one chromosome (data), and the hyperparameters of the neural network, that is, the weights of the connections between neurons and the like, are encoded (represented) in each individual's chromosome. The learning rate of the neural network is encoded in each chromosome as well. A plurality of individuals are prepared and computed in parallel, and GA selection and mutation are performed after each step (1 epoch) of the parallelized back-propagation learning. That is, poorly performing individuals are deleted and replaced by copies of well-performing individuals with mutated learning rates.

The method of the present embodiment is not limited to neural network learning and can also be applied to other kinds of machine learning. That is, when data such as images, speech, or documents are learned iteratively and the result can be evaluated numerically, the learning parameters that control the machine learning are coded on a chromosome, and GA selection and mutation are performed after every step of the parallelized learning.

Furthermore, the method of the present invention can also be applied to optimization and search. That is, when optimization or search is carried out by repeatedly moving within a search space and the current point in the search space can be evaluated numerically, the optimization/search control parameters that control the optimization or search are coded on a chromosome, and GA selection and mutation are performed after every step of the parallelized optimization or search.
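The application to search can be sketched in the same way. In the hypothetical Python example below, the control parameter coded on each chromosome is the step size of a simple local search. The objective function, the greedy acceptance rule, the mutation rule, and the names `objective`, `search_step`, and `evolve_search` are all assumptions made for illustration and do not appear in the embodiment.

```python
import copy
import random


def objective(point):
    """Toy objective to minimise (assumption): squared distance from (3, -2)."""
    x, y = point
    return (x - 3.0) * (x - 3.0) + (y + 2.0) * (y + 2.0)


def search_step(searcher, rng):
    """Try one random move whose scale is the searcher's control parameter;
    keep the move only if it improves the objective (greedy rule, assumed)."""
    x, y = searcher["point"]
    scale = searcher["step"]
    candidate = (x + rng.gauss(0.0, scale), y + rng.gauss(0.0, scale))
    if objective(candidate) < objective(searcher["point"]):
        searcher["point"] = candidate


def evolve_search(n_searchers=10, steps=300, seed=1):
    rng = random.Random(seed)
    # Each chromosome: current point in the search space plus its step size.
    population = [
        {"point": (0.0, 0.0), "step": 10.0 ** rng.uniform(-2.0, 1.0)}
        for _ in range(n_searchers)
    ]
    for _ in range(steps):
        for searcher in population:     # sequential here; parallel in principle
            search_step(searcher, rng)
        population.sort(key=lambda s: objective(s["point"]))  # best first
        child = copy.deepcopy(population[0])            # copy the best searcher
        child["step"] *= 2.0 ** rng.uniform(-1.0, 1.0)  # mutate its step size
        population[-1] = child                          # replace the worst
    return min(population, key=lambda s: objective(s["point"]))
```

Because a copy inherits both the current point and a perturbed step size, step sizes that keep producing improvements tend to spread through the population, so the step size can shrink as the search approaches the optimum without a hand-written schedule.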

The greatest effect of the present embodiment is that the learning rate (or learning parameter, or optimization/search control parameter) is determined autonomously. That is, by performing selection and mutation at every step of back propagation learning (or learning, optimization, or search), the neural network (or system) is optimized as the learning result, just as in conventional back propagation learning methods (or learning, optimization, or search methods) and in methods that combine them with GA. At the same time, the learning rate (or learning parameter, or optimization/search control parameter) used in the back propagation learning process (or learning, optimization, or search process) is itself optimized, which conventional methods could not do. The selection and mutation performed at every step bring the average value of the learning rate (or learning parameter, or optimization/search control parameter) close to the optimum value for that step, so that it follows the optimum value as it changes with the progress of learning (or optimization or search). The learning rate (or learning parameter, or optimization/search control parameter) usually takes a relatively large value early in learning (or optimization or search) and can be decreased on an optimal schedule as learning (or optimization or search) proceeds; when it is better not to decrease it, it is not decreased.

At the same time, the present embodiment has the effect that the search range in learning (or optimization or search) can be controlled appropriately. A global search can be performed early in learning (or optimization or search), and the search range can be narrowed as learning (or optimization or search) progresses. Performing a global search early lowers the probability of falling into a local optimum, and a narrow range can be searched efficiently in parallel in the later stage. For this control to be appropriate, however, the frequency of selection and mutation must itself be controlled appropriately.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments above have been described in detail for better understanding of the present invention, and the invention is not necessarily limited to those having all the configurations described. For example, other configurations may be added to, deleted from, or substituted for part of the configuration of the embodiments.

Each of the above configurations, functions, processing units, processing means, and the like may be realized in hardware, in part or in whole, for example by designing them as an integrated circuit. Each of the above configurations, functions, and the like may also be realized in software by a processor interpreting and executing a program that realizes the respective function. Information such as the programs, tables, and files that realize each function can be stored in a storage device such as a nonvolatile semiconductor memory, a hard disk drive, or an SSD (Solid State Drive), or on a computer-readable non-transitory data storage medium such as an IC card, an SD card, or a DVD.

The control lines and information lines shown are those considered necessary for the explanation, and not all the control lines and information lines of a product are necessarily shown. In practice, almost all the configurations may be considered to be connected to one another.

501 Learning control computer
502 Learning control program
503 Learning data generation program
504 Original data
505 Teacher information
506 Learning data with teacher information
507 Learning neural network group
508 Identification neural network
509 Data to be identified
510 Identification result output

Claims

A machine learning method executed by a computer having a processor and a storage device connected to the processor,
The storage device holds a plurality of programs for realizing a plurality of systems that execute predetermined processing, a plurality of structural parameters corresponding to each of the plurality of programs, and a plurality of learning parameters corresponding to the plurality of systems,
Each learning parameter is a parameter that specifies a change in the structural parameter in learning performed by each system,
The machine learning method includes:
A first procedure in which the processor causes each system to learn a predetermined data set using a learning parameter corresponding to each system;
A second procedure in which the processor evaluates the systems by a predetermined evaluation method;
A third procedure in which the processor selects, from the plurality of systems, a first system and a second system having a higher evaluation than the first system, generates a program for realizing a copy of the second system and a plurality of structural parameters corresponding thereto by copying the program and the plurality of structural parameters corresponding to the second system, generates a copy of the learning parameter corresponding to the second system as a learning parameter corresponding to the copy of the second system, and changes at least one of the learning parameter corresponding to the second system and the learning parameter corresponding to the copy of the second system so that the two differ from each other,
The machine learning method, wherein the first procedure through the third procedure are executed again for the plurality of systems other than the first system.

The machine learning method according to claim 1,
A combination of the plurality of structural parameters corresponding to each system and the learning parameter corresponding to that system is held as one chromosome of a genetic algorithm,
The machine learning method further includes a step in which the processor determines initial values of the plurality of structural parameters corresponding to the systems and initial values of learning parameters corresponding to the systems using random numbers,
In the third procedure, the processor generates a copy of the chromosome corresponding to the second system and uses a random number to determine the value of at least one of the learning parameter corresponding to the second system and the learning parameter corresponding to the copy of the second system.

The machine learning method according to claim 1,
Each of the systems is a neural network,
The plurality of structural parameters corresponding to each of the systems includes connection weights between neurons in the neural network,
In the first procedure, the processor causes each system to learn the predetermined data set by back propagation learning,
The machine learning method, wherein the learning parameter is a learning rate indicating a magnitude of a change amount of the weight in the back propagation learning.

The machine learning method according to claim 3,
A combination of the plurality of structural parameters corresponding to each system and the learning parameter corresponding to that system is held as one chromosome of a genetic algorithm,
Each chromosome further includes information indicating the presence or absence of connections between the neurons,
In the third procedure, the processor changes the presence or absence of the connection between the neurons based on a predetermined mutation rule.

The machine learning method according to claim 4,
Each chromosome further includes information indicating the number of neurons included in each system,
In the third procedure, the processor changes the number of neurons based on a predetermined mutation rule.

The machine learning method according to claim 4,
Each chromosome further includes information indicating the number of stages of the neural network included in each system,
In the third procedure, the processor changes the number of stages of the neural network based on a predetermined mutation rule.

The machine learning method according to claim 3,
The machine learning method, wherein the plurality of systems includes a plurality of neural networks that differ in the connections between the neurons, or a plurality of neural networks that differ in the number of neurons and in the connections between the neurons.

The machine learning method according to claim 1,
In the third procedure, when the evaluations of two or more of the systems are lower than a predetermined value, the processor generates, for as many systems other than the two or more systems as there are systems among the two or more systems, copies of the programs for realizing those systems, of the plurality of structural parameters corresponding to those systems, and of the learning parameters corresponding to those systems,
The machine learning method, wherein the first procedure through the third procedure are executed again for the plurality of systems other than the two or more systems.

The machine learning method according to claim 1,
The computer has a plurality of the processors,
Each of the plurality of systems is assigned to each of the plurality of processors;
In the first procedure, each processor causes the one system assigned to each processor to learn a predetermined data set using the learning parameter.

A machine learning device having a processor and a storage device connected to the processor,
The storage device holds a plurality of systems each including a program executed by the processor and a plurality of structural parameters for the program, and learning parameters corresponding to each system,
Each learning parameter is a parameter that specifies a change in the structural parameter in learning performed by each system,
The processor executes:
A first procedure of causing each system to learn a predetermined data set using the learning parameter corresponding to that system;
A second procedure of evaluating each of the systems by a predetermined evaluation method; and
A third procedure of selecting, from the plurality of systems, a first system and a second system having a higher evaluation than the first system, generating a copy of the second system, generating a copy of the learning parameter corresponding to the second system as a learning parameter corresponding to the copy of the second system, and changing at least one of the learning parameter corresponding to the second system and the learning parameter corresponding to the copy of the second system so that the two differ from each other,
The machine learning device, wherein the first procedure through the third procedure are executed again for the plurality of systems other than the first system.