JP6101650B2

JP6101650B2 - System parameter learning apparatus, information processing apparatus, method, and program

Info

Publication number: JP6101650B2
Application number: JP2014037245A
Authority: JP
Inventors: Jun Suzuki (鈴木 潤)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-02-27
Filing date: 2014-02-27
Publication date: 2017-03-22
Anticipated expiration: 2034-02-27
Also published as: JP2015162113A

Description

The present invention relates to a system parameter learning apparatus, an information processing apparatus, a method, and a program, and more particularly to a system parameter learning apparatus, information processing apparatus, method, and program for learning system parameters.

Identification problems in information processing such as speech recognition, machine translation, character recognition, object recognition, and DNA structure prediction, illustrated in FIG. 13, can be regarded, as shown in FIG. 14, as systems that predict an output when an input is given.

These systems can generally be divided into an execution phase and a construction phase. The construction phase is the work of designing the system in advance by hand and determining its system parameters and the like. In the execution phase, the input is processed according to the design defined in the construction phase, and the output is determined by the system parameters.

In the construction phase, the system can be built in various ways. For example, conversion rules can be written by hand, and the input is then converted into an output according to those rules. However, preparing conversion rules by hand is very costly, because the rules must be kept complete and consistent. For this reason, as shown in FIG. 15, the mainstream approach in recent years is to build the system automatically from data by using a machine learning method.

In the construction phase, pairs of an input to the target system and the corresponding output are prepared first. These pairs are generally called correct answer data or teacher data (training data). Teacher data specifies what output should be produced when an input contained in the teacher data is given to the system. The system is then built by using this teacher data. The requirement is that the system produces the correct output for each input in the teacher data. The construction phase based on machine learning therefore reduces to using the teacher data to learn a set of system parameters that discriminates the teacher data correctly.

The above processing is expressed mathematically as follows. The execution phase is described first. Let x^ denote one input, and let Χ be the set of all inputs x^ that the system can accept. The "^" attached to a symbol indicates that the symbol is a matrix, a multidimensional array, or a vector. Similarly, let y^ denote one output, and let Y be the set of all outputs y^ that the system allows. Further, let Y(x^) be the set of all outputs y^ that are possible when x^ is given. The relations x^ ∈ Χ and y^ ∈ Y(x^) ⊆ Y therefore hold.

Next, let w^ denote the set of system parameters written as a vector. Let w_d be the d-th element of the vector w^, which is also the d-th system parameter. That is, w^ = (w_1, ..., w_N) with d ∈ {1, ..., N}, where N is the number of system parameters and w^ is an N-dimensional vector.

In this setting, a system that returns an output for a given input x^ can be expressed by the following equation (1):

    ^*y^ = argmax_{y^ ∈ Y(x^)} Φ(x^, y^: w^)    (1)

Here, Φ(x^, y^: w^) is a function that determines the score of converting x^ into y^, and is simply called the score function. That is, among all outputs y^ that can be obtained when x^ is given, the y^ with the highest conversion score is adopted as the output. The vector w^ is therefore the set of system parameters that controls which output is selected, and it determines the performance of the whole system. Obtaining the system parameters w^ accurately is thus the most important requirement of the construction phase. Here, "accurately" means obtaining a w^ that produces the correct output for as many inputs as possible. The "^*" placed before a symbol indicates that the symbol is an estimated value.
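The execution phase of equation (1) can be sketched as follows. This is a minimal illustration rather than the implementation described in this document: the linear form of the score function, the helper names (`score`, `predict`, `toy_candidates`, `toy_extract`), and the toy weights are all assumptions introduced for the example.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical sparse feature representation: feature name -> value.
Features = Dict[str, float]


def score(features: Features, w: Dict[str, float]) -> float:
    """Assumed linear score function: sum of parameter times feature value."""
    return sum(w.get(name, 0.0) * value for name, value in features.items())


def predict(x: str,
            candidates: Callable[[str], Iterable[str]],
            extract: Callable[[str, str], Features],
            w: Dict[str, float]) -> str:
    """Return the candidate output with the highest score, as in equation (1)."""
    return max(candidates(x), key=lambda y: score(extract(x, y), w))


def toy_candidates(x: str) -> List[str]:
    """Toy output set Y(x): every token is either PERSON or OTHER."""
    return ["PERSON", "OTHER"]


def toy_extract(x: str, y: str) -> Features:
    """Toy feature: capitalization of the token combined with the label."""
    return {f"capitalized={x[:1].isupper()}&label={y}": 1.0}


# Hypothetical system parameters.
toy_w = {"capitalized=True&label=PERSON": 0.3,
         "capitalized=False&label=OTHER": 0.3}

best = predict("Suzuki", toy_candidates, toy_extract, toy_w)
print(best)
```

Changing a single entry of `toy_w` changes which candidate wins, which is why the quality of the system parameters determines the quality of the system.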

Next, the construction phase is described. In practice, it is very difficult to find the best parameters w^ for every possible input, because enumerating every possible input is practically infeasible. In the field of pattern recognition, w^ is therefore determined from actual data. First, the teacher data is denoted by D. The teacher data consists of a set of pairs of an input x^ and an output y^, that is,

    D = {(x^_1, y^_1), (x^_2, y^_2), ...}

Here, x^_i is the i-th input in the teacher data and y^_i is the output corresponding to the i-th input. The system parameters are learned by solving the optimization problem of the following equation (2), in which the first term is the risk (loss) on the teacher data D and the second term is a regularization term:

    ^*w^ = argmin_{w^} { L(w^: D) + Ω(w^) }    (2)

Here, the term L(w^: D), evaluated on the teacher data D, is called the risk function or loss function. It returns a value that indicates how correctly the inputs in the teacher data are mapped to outputs. A function is chosen whose value becomes larger the more mistakes are made when the current parameters w^ are used for discrimination with equation (1). Ω(w^) is generally called the regularization term. When only a finite amount of teacher data is available, this term imposes a penalty that keeps the system parameters from over-fitting the teacher data, so that data not appearing in the teacher data can also be discriminated correctly. A common choice is to require the L_2-norm of the parameters to be as small as possible, which prevents the parameters from taking extremely large values. The ^*w^ finally obtained from equation (2) is the set of parameters that best discriminates the teacher data.
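The structure of the objective in equation (2), a risk term plus a regularization term, can be sketched as follows. The concrete choices here are assumptions made for illustration only: the hinge loss of a binary linear classifier stands in for the risk function, a squared L_2-norm stands in for the regularization term, and the toy data and the `strength` value are hypothetical.

```python
from typing import List, Sequence, Tuple

# One teacher example: (feature vector, label in {-1, +1}).
Example = Tuple[Sequence[float], int]


def risk(w: Sequence[float], data: List[Example]) -> float:
    """Assumed risk term: average hinge loss of a binary linear classifier.

    The value grows as w misclassifies more of the teacher data.
    """
    total = 0.0
    for x, y in data:
        margin = y * sum(wd * xd for wd, xd in zip(w, x))
        total += max(0.0, 1.0 - margin)
    return total / len(data)


def regularizer(w: Sequence[float], strength: float = 0.1) -> float:
    """Squared L2-norm penalty that discourages extremely large parameters."""
    return strength * sum(wd * wd for wd in w)


def objective(w: Sequence[float], data: List[Example],
              strength: float = 0.1) -> float:
    """Value minimized in the learning problem: risk plus regularization."""
    return risk(w, data) + regularizer(w, strength)


data = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 1.0], 1), ([-1.0, 0.0], -1)]
good_w = [2.0, -1.0]   # classifies every toy example with margin >= 1
bad_w = [-2.0, 1.0]    # misclassifies every toy example

print(objective(good_w, data), objective(bad_w, data))
```

Both parameter vectors have the same norm, so the regularization term is identical and the difference in the objective comes entirely from the risk term.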

The above is the mathematical definition of the execution phase and the construction phase of the information processing system targeted by the present invention.

Acquiring system parameters on the basis of equation (2) is called supervised learning in pattern recognition. The learned system parameters ^*w^ are real values. At the end of the construction phase, the values of w^ are therefore written to a file or the like, and in the execution phase the file is read, the score function is computed, and the output is obtained. Consequently, the larger the number of parameters, the larger the file needed to hold them.

The file size translates directly into the memory footprint at execution time. The memory footprint can be a serious problem in computing environments with limited resources, such as mobile terminals. Even on general-purpose computers, a small memory footprint is desirable when several instances run simultaneously on a recent multi-core machine, and to avoid affecting other programs. In essence, the fewer resources (file size, memory footprint, and so on) a program needs at execution time, the better, whatever the computing environment. Based on this idea, compressing the size of the learned model during the learning process has been actively studied in recent years.

The simplest method compresses the learned model by exploiting the effect of an L_1 regularization term (Non-Patent Document 1). The L_1 regularization term drives as many parameters as possible to zero during learning. When a parameter becomes zero, the items related to that parameter can be deleted from the model, which reduces the model size.
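The effect described above can be illustrated with soft-thresholding, the standard shrinkage operation associated with an L_1 penalty. This is only a sketch of why zeros appear and how they shrink the stored model; it is not the specific algorithm of Non-Patent Document 1, and the weights, feature names, and threshold are hypothetical.

```python
from typing import Dict


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def prune_zeros(weights: Dict[str, float]) -> Dict[str, float]:
    """Drop zero-valued parameters, since they never contribute to a score."""
    return {name: v for name, v in weights.items() if v != 0.0}


dense = {"f1": 0.82, "f2": -0.05, "f3": 0.03, "f4": -0.61, "f5": 0.09}
shrunk = {name: soft_threshold(v, 0.1) for name, v in dense.items()}
compact = prune_zeros(shrunk)

print(len(dense), "->", len(compact), "stored parameters")
```

A larger threshold removes more parameters but also distorts the surviving ones more, which mirrors the trade-off between compression and accuracy.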

Another method of model compression uses the principle that, if several parameters take the same value after learning, this duplication can be exploited to reduce the amount of information that actually has to be stored (Non-Patent Document 2). Compressing the learned model while sacrificing as little accuracy of the information processing system as possible is thus a very important practical problem, and various techniques have been devised to address it.

Non-Patent Document 1: Lin Xiao. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization. Journal of Machine Learning Research, 11:2543-2596, 2010.
Non-Patent Document 2: Jun Suzuki and Masaaki Nagata. Supervised Model Learning with Feature Grouping based on a Discrete Constraint. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 18-23, 2013.

However, with the technique of Non-Patent Document 1, setting a value to zero may lower the accuracy of the system unless the item related to that system parameter is truly meaningless. The zeroing effect and the system accuracy are therefore in a trade-off relationship. That is, one wants as many zeros as possible in order to compress the model, but if learning is performed with a setting that produces too many zeros, the system accuracy becomes insufficient.

The technique of Non-Patent Document 2 has been shown to be highly effective. When a model is actually created, however, it is not obvious how far the model can be compressed without lowering the accuracy of the system. In practice, settings are tried by trial and error and the one that gives the best result on development data is finally selected. The cost of this work grows with the amount of learning data and causes operational problems. When the cost of trial and error cannot be afforded, a model that is both highly accurate and highly compressed may exist but is not found. As a result, a highly compressed model often cannot be used, or accuracy is sacrificed.

The present invention has been made to solve the above problems, and an object thereof is to provide a system parameter learning apparatus, method, and program capable of automatically acquiring an appropriate highly compressed model.

Another object is to provide an information processing apparatus and a program capable of reducing the resources required at execution time.

To achieve the above object, a system parameter learning apparatus according to a first invention learns a plurality of system parameters that are set in an information processing system, the information processing system performing predetermined information processing on input data and outputting output data by using a score function for determining the score of output data with respect to input data. The apparatus includes: a teacher data input unit that receives teacher data, which are pairs each consisting of one of a plurality of input data and the corresponding one of a plurality of correct output data; and a learning unit that, based on the teacher data received by the teacher data input unit and the score function, learns the plurality of system parameters such that they are optimized while satisfying the constraint that the value of each system parameter is contained in a set of discrete values consisting of a predetermined number of real values v_i (i = 1, ..., ζ), the real values -v_i (i = 1, ..., ζ), and 0.

A system parameter learning method according to a second invention is a method performed in a system parameter learning apparatus that includes a teacher data input unit and a learning unit and that learns a plurality of system parameters set in an information processing system, the information processing system performing predetermined information processing on input data and outputting output data by using a score function for determining the score of output data with respect to input data. In the method, the teacher data input unit receives teacher data, which are pairs each consisting of one of a plurality of input data and the corresponding one of a plurality of correct output data, and the learning unit, based on the teacher data received by the teacher data input unit and the score function, learns the plurality of system parameters such that they are optimized while satisfying the constraint that the value of each system parameter is contained in a set of discrete values consisting of a predetermined number of real values v_i (i = 1, ..., ζ), the real values -v_i (i = 1, ..., ζ), and 0.

According to the first and second inventions, the teacher data input unit receives teacher data, which are pairs each consisting of one of a plurality of input data and the corresponding one of a plurality of correct output data, and the learning unit, based on the received teacher data and the score function, learns a plurality of system parameters that are optimized while satisfying the constraint that the value of each system parameter is contained in a set of discrete values consisting of a predetermined number of real values v_i (i = 1, ..., ζ), the real values -v_i (i = 1, ..., ζ), and 0.

Because the system parameters are learned under the constraint that each value belongs to the set of discrete values consisting of the predetermined number of real values v_i, the real values -v_i, and 0, and are optimized on the basis of the received teacher data and the score function, an appropriate highly compressed model can be acquired automatically.

An information processing apparatus according to a third invention includes: an input unit that receives input data; and an information processing unit that performs the predetermined information processing on the input data received by the input unit and outputs output data, based on the score function and the system parameters of the index numbers of the respective groups stored by the system parameter learning apparatus of the first invention.

A program according to the present invention causes a computer to function as each unit of the above system parameter learning apparatus or information processing apparatus.

As described above, according to the system parameter learning apparatus, method, and program of the present invention, teacher data consisting of pairs of input data and correct output data are received, and, based on the received teacher data and the score function, a plurality of system parameters are learned that are optimized while satisfying the constraint that each value is contained in a set of discrete values consisting of a predetermined number of real values v_i, the real values -v_i, and 0. An appropriate highly compressed model can thereby be acquired automatically.

According to the information processing apparatus of the present invention, the predetermined information processing is performed on the input data and output data is output on the basis of the system parameters of the index numbers of the respective groups stored by the system parameter learning apparatus, so that the resources required at execution time can be reduced.

FIG. 1 is a diagram showing an example of problems to which the embodiment of the present invention is applied.
FIG. 2 is a conceptual diagram for explaining an example of how features and the system are defined.
FIG. 3 is a block diagram showing the configuration of the system parameter learning apparatus according to the embodiment of the present invention.
FIG. 4 is a first conceptual diagram for explaining an example of a feature extraction function.
FIG. 5 is a second conceptual diagram for explaining an example of a feature extraction function.
FIG. 6 is a conceptual diagram for explaining the process of learning system parameter values from teacher data.
FIG. 7 is a diagram showing an example of the problem of mapping onto a hyperplane.
FIG. 8 is a diagram showing an example of processing equivalent to one-dimensional k-means.
FIG. 9 is a block diagram showing the configuration of the information processing apparatus according to the embodiment of the present invention.
FIG. 10 is a flowchart showing the system parameter learning processing routine in the system parameter learning apparatus according to the embodiment of the present invention.
FIG. 11 is a flowchart showing the information processing routine in the information processing apparatus according to the embodiment of the present invention.
FIG. 12 is a diagram showing experimental results obtained with the present embodiment.
FIG. 13 is a first explanatory diagram for explaining an outline of the prior art.
FIG. 14 is a second explanatory diagram for explaining an outline of the prior art.
FIG. 15 is a third explanatory diagram for explaining an outline of the prior art.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

<Principle of the present invention>

The embodiment according to the present invention adds, to the model compression framework of Non-Patent Document 2, a processing apparatus that automatically builds a highly accurate and highly compressed model from learning data.

First, the mechanism of the machine learning apparatus with the model compression processing of Non-Patent Document 2 is described. The basic idea is that, if several parameters take the same value after the system parameters have been learned, this duplication can be exploited to reduce the amount of information that actually has to be stored. For example, suppose that the number of parameters is 5 and the parameter set finally obtained from equation (2) is ^*w^ = (0.3, 0.3, -0.6, -0.6, 1.0). In ^*w^, the values 0.3 and -0.6 each appear twice. By merging the first and second parameters and merging the third and fourth parameters, the same result can be obtained with just three values, (0.3, -0.6, 1.0). The more duplicated values there are, the more the amount of stored information can be reduced while keeping equivalent information.
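The numerical example above can be checked with a few lines of code. This is just an illustration of counting the distinct values that have to be stored; the variable names are hypothetical.

```python
weights = [0.3, 0.3, -0.6, -0.6, 1.0]

# dict.fromkeys keeps the first occurrence of each value, in order.
distinct = list(dict.fromkeys(weights))

# Number of real values that no longer need to be stored.
saved = len(weights) - len(distinct)

print(distinct, saved)
```

Note that the mapping from each original parameter to its shared value must also be kept for the compressed model to be usable.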

That is, when the system parameters of equation (2) are learned, the constraint that "as many system parameters as possible take the same value" is added, so that the resulting system parameter values contain more duplicates.

Next, parameters with the same value among the obtained parameters are merged into one, which reduces the amount of information to be stored. Finally, the system is actually executed with the reduced parameters. In summary, the conventional method consists of the following three processes.

1. First process: system parameter acquisition, in which the parameter values finally obtained by ordinary system parameter learning are made to coincide as much as possible.
2. Second process: system parameter compression, in which duplicated values of the obtained system parameters are merged into one to minimize the amount of information to be stored.
3. Third process: operating the system in the execution phase with the system parameters compressed to the minimum.

More specifically, the first process replaces the system parameter learning problem of equation (2) with the problem of the following equation (3), which keeps the risk term L(w^: D) and the regularization term Ω(w^) of equation (2):

    ^*w^ = argmin_{w^} { L(w^: D) + Ω(w^) }  subject to  w^ ∈ S^N    (3)

The only difference between equation (2) and equation (3) is the added constraint term w^ ∈ S^N. This constraint means that w^ is accepted as a solution only when it is an element of a certain discrete set S^N. In other words, the same optimization problem as equation (2) is solved, but the solution must satisfy the constraint. So that parameter values coincide as often as possible, the constraint set is constructed as the Cartesian product S^N of a finite set S of discrete values.

In the second process, parameters that have the same value are merged into one. This is positioned as post-processing of the construction phase. For example, suppose that w_i = w_j = w_k, that is, the i-th, j-th, and k-th system parameters (elements of the parameter vector) have the same value. The second process then corresponds to deleting w_i, w_j, and w_k and introducing a new parameter w_l. The newly added w_l has the same value as w_i, and the indexes i, j, and k are regarded as having been reassigned to the index l. Holding the same value several times is redundant, so the duplicates are merged under a newly assigned index, and the original indexes point to that new index so that the same information can still be obtained. In this way, a set of system parameters is obtained that has the same format as before but no longer contains the duplicated values.
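The index reassignment of the second process can be sketched as follows. The sketch assumes a linear score (a weighted sum of feature values), which this passage does not state; the function names `build_index`, `plain_score`, and `grouped_score` and the feature values are hypothetical.

```python
from typing import Dict, List, Tuple


def build_index(weights: List[float]) -> Tuple[List[float], List[int]]:
    """Merge equal parameters: return (group_values, group_of).

    group_of[d] is the new index l that the original index d points to.
    """
    group_values: List[float] = []
    index_of_value: Dict[float, int] = {}
    group_of: List[int] = []
    for w in weights:
        if w not in index_of_value:
            index_of_value[w] = len(group_values)
            group_values.append(w)
        group_of.append(index_of_value[w])
    return group_values, group_of


def plain_score(features: List[float], weights: List[float]) -> float:
    """Score computed with the original, uncompressed parameters."""
    return sum(w * f for w, f in zip(weights, features))


def grouped_score(features: List[float], group_values: List[float],
                  group_of: List[int]) -> float:
    """Same score computed from the merged parameters and the index map."""
    sums = [0.0] * len(group_values)
    for d, f in enumerate(features):
        sums[group_of[d]] += f
    return sum(v * s for v, s in zip(group_values, sums))


weights = [0.3, 0.3, -0.6, -0.6, 1.0]
group_values, group_of = build_index(weights)
features = [1.0, 2.0, 0.0, 1.0, 1.0]

print(group_values, group_of)
print(plain_score(features, weights),
      grouped_score(features, group_values, group_of))
```

Only three real values are stored, yet both functions return the same score up to floating-point rounding, which is the sense in which the compressed parameters keep the same format as before.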

In the third process, the system is actually operated with the obtained system parameters. Because the second process yields system parameters in the same format as before, equation (1) can be used unchanged for this process.

The embodiment according to the present invention mainly changes the first process. In the conventional method, the set S of discrete values that appears in the constraint w^ ∈ S^N must be determined manually, and the definition of this set strongly affects the accuracy of the system. The appropriate definition also depends on the data and the target task, so it is difficult to give an optimal definition in advance. In some cases, prior human knowledge may lead to a definition from which a highly accurate model can be built, but in general this is a very difficult and delicate task and a major problem in terms of manual cost. In the embodiment according to the present invention, the part that determines this definition is included in the optimization problem, so that it is determined automatically from the data.

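Specifically, the constraint w^ ∈ S^N, in which the set S of discrete values is fixed by hand, is changed to a constraint that restricts every parameter w_d to the values 0 and ±v_i (i = 1, ..., ζ), and equation (3) is redefined as the following equation (4), in which v^ = (v_1, ..., v_ζ) is optimized together with w^:

    (^*w^, ^*v^) = argmin_{w^, v^} { L(w^: D) + Ω(w^) }  subject to  w_d ∈ {0, ±v_1, ..., ±v_ζ}  (d = 1, ..., N)    (4)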

Equation (4) means that the only values a parameter may take under the constraint are 0 and ±v_i, where the v_i are ζ distinct real values. For example, when ζ = 4 is specified in advance, each parameter in the model w^ can take nine possible values, including zero. The values v_i that define this constraint are also determined automatically from the data.
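The value set of this constraint can be illustrated as follows. The sketch only shows the shape of the constraint: the values in `v` are hypothetical and fixed by hand, whereas in the embodiment they are determined from the data, and mapping each parameter to its nearest allowed value is an assumption made for the example.

```python
from typing import List


def allowed_values(v: List[float]) -> List[float]:
    """Return the sorted set {0, +v_i, -v_i} for the given positive values v_i."""
    values = {0.0}
    for vi in v:
        values.add(vi)
        values.add(-vi)
    return sorted(values)


def project(w: List[float], candidates: List[float]) -> List[float]:
    """Map each parameter to the nearest allowed value (assumed projection)."""
    return [min(candidates, key=lambda c: abs(c - wd)) for wd in w]


v = [0.1, 0.3, 0.6, 1.0]            # zeta = 4, hypothetical values
candidates = allowed_values(v)      # 2 * zeta + 1 allowed values
w = [0.28, -0.07, 0.02, -0.64, 0.95]
projected = project(w, candidates)

print(len(candidates), projected)
```

With ζ = 4 the candidate set has nine elements, matching the count given in the text.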

The following description assumes that the embodiment of the present invention is applied to the natural language processing problems of named entity extraction and dependency parsing shown in FIG. 1. These problems are called structured prediction problems. They can be regarded as problems that take as input something converted into a graph structure or the like, and produce as output something that is likewise represented by a graph structure.

In the construction and execution phases described below, the features that characterize each input and output are first defined by hand, as shown in FIG. 2. Because each feature is assumed to be defined as a function, this is equivalent to defining a set of feature extraction functions. The behavior of the system, that is, what kind of output is produced for a given input, is also defined by hand. This definition follows automatically from the definition of the problem actually being solved, here named entity extraction or dependency parsing.

<System configuration of the system parameter learning apparatus>
Next, the configuration of the system parameter learning apparatus according to the embodiment of the present invention is described. As shown in FIG. 3, the system parameter learning apparatus 100 can be implemented as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing the system parameter learning processing routine described later. Functionally, as shown in FIG. 3, the system parameter learning apparatus 100 includes a teacher data input unit 10, a calculation unit 20, and a system parameter storage unit 90.

The teacher data input unit 10 accepts teacher data. As shown in FIG. 1, the teacher data consists of multiple pairs, each made up of predetermined input data and the correct output data for that input. The teacher data input unit 10 also accepts the tuning parameter ρ used by the learning unit 40 described later. The parameter ρ is specified by hand in advance.

The calculation unit 20 includes a teacher database 30, a learning unit 40, and a duplicate parameter compression unit 60.

The teacher database 30 stores the teacher data accepted by the teacher data input unit 10.

The learning unit 40 learns a plurality of system parameters based on the teacher data stored in the teacher database 30 and a predetermined score function. Specifically, based on the teacher data and the score function, the learning unit 40 learns a plurality of optimized system parameters under the constraint that each parameter value belongs to the set of discrete values consisting of the real values v_i (i = 1, …, ζ), the real values −v_i (i = 1, …, ζ), and 0, where the number ζ is predetermined. The learning unit 40 includes an initialization unit 50, a system parameter update unit 52, an auxiliary parameter update unit 54, an undetermined multiplier update unit 56, and a convergence determination unit 58.

The learning algorithm of the learning unit 40 learns the system parameters w^ = (w_1, w_2, …, w_N) that minimize the objective function. Specifically, the correct-answer data and the supervised learning algorithm are chosen by hand and supplied as settings for the construction phase. The objective function used to learn the system parameters w^ is described first.

Here, the feature extraction function f for extracting features is a function defined on the combination of input data x^ and output data y^. Each feature extraction function has the form f_d(x^, y^) and returns an arbitrary real value. As with w^, the set of feature extraction functions is written in vector notation as f^(x^, y^), so that f_d(x^, y^) is the d-th element of f^(x^, y^). Examples of features extracted by the feature extraction functions are shown in FIG. 4 and FIG. 5.

If the number of feature extraction functions is designed to equal the number N of system parameters, the score function Φ can be defined as a linear function, as in Equation (5) below.

$$\Phi(\hat{x}, \hat{y}; \hat{w}) = \sum_{d=1}^{N} w_d f_d(\hat{x}, \hat{y}) \tag{5}$$

In other words, the system parameter w_d is the weight of the feature extraction function f_d. The larger its value, the more strongly f_d influences the system's choice of output data; a large negative value corresponds to weighting that makes the system less likely to choose the output data y^. The system parameters here are thus the values that determine the feature weights. As shown in FIG. 6, the learning unit 40 learns the system parameters with a supervised learning algorithm.
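The role of the weights can be illustrated with a minimal sketch of a linear score, computed as the sum over d of w_d times f_d. The function name `score` and the feature values are hypothetical; the feature vectors stand for f^(x^, y^) already evaluated for two candidate outputs.

```python
def score(w, f):
    """Linear score: sum over d of w[d] * f[d]."""
    if len(w) != len(f):
        raise ValueError("w and f must have the same length")
    return sum(w_d * f_d for w_d, f_d in zip(w, f))


w = [2.0, -1.0, 0.0]    # hypothetical feature weights
f_a = [1.0, 0.0, 1.0]   # features fired by candidate output A
f_b = [0.0, 1.0, 1.0]   # features fired by candidate output B

# The positive weight favors A, the negative weight penalizes B,
# and the zero weight has no effect on either score.
print(score(w, f_a), score(w, f_b))
```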

The objective function is given by Equation (4) above. Because the solution is subject to discrete constraints, Equation (4) is essentially a discrete optimization problem, a class of problems for which an exact solution is very difficult to obtain. However, an efficient solution method can be obtained by using an approach based on dual decomposition (for example, Non-Patent Document 3: Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein, "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers", Foundations and Trends in Machine Learning, 2011). First, the constraint of Equation (4) is decomposed based on dual decomposition.

Here, u^ is an auxiliary parameter. Using the equality constraint w^ = u^, the conventional optimization problem and the constraint have been separated. Next, the constraint is incorporated into the objective function using the augmented Lagrangian relaxation of Non-Patent Document 3.

α^ is the Lagrange multiplier. Following Non-Patent Document 3, the objective function in Equation (7) above is optimized through the processing of the initialization unit 50, the system parameter update unit 52, the auxiliary parameter update unit 54, the undetermined multiplier update unit 56, and the convergence determination unit 58. In this embodiment, this yields system parameters with duplicated values.

In principle any function may be used as the risk function, but here it is restricted to convex functions. In this embodiment, the hinge loss function shown in Equation (8) below is used as the risk function.

Here, E(y^, y^*) is a function indicating how much y^ differs from y^*. The larger the difference between y^ and y^*, the larger the non-negative value of E(y^, y^*); when y^ and y^* are identical, E(y^, y^*) is 0.
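As a rough illustration of how such a difference function enters a hinge loss, the sketch below uses the Hamming distance for E and a common margin-rescaled structured hinge loss. Both choices are assumptions: the exact form of Equation (8) is not shown in this text, and the names `hamming`, `hinge_loss`, and the toy scores are hypothetical.

```python
def hamming(y, y_star):
    """E(y, y*): number of positions where two equal-length label sequences differ."""
    return sum(1 for a, b in zip(y, y_star) if a != b)


def hinge_loss(score_fn, candidates, y_star):
    """Margin-rescaled hinge loss over a finite candidate set (assumed form)."""
    worst = max(score_fn(y) + hamming(y, y_star) for y in candidates)
    return max(0.0, worst - score_fn(y_star))


# Toy scores for three candidate label sequences; y_star is the correct one.
scores = {("A", "B"): 3.0, ("A", "A"): 2.5, ("B", "A"): 0.0}
y_star = ("A", "B")
loss = hinge_loss(scores.get, list(scores), y_star)
print(loss)
```

In this toy case the candidate ("A", "A") scores within its distance of 1 from the correct output, so the loss is positive; raising the score of the correct output far enough would drive the loss to zero.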

As with the risk function, the regularization term is also assumed to be convex. For example, the L1-norm regularization term shown in Equation (9) below is used as the regularization term.

Because the risk function and the objective function are convex, the optimization problem of Equation (2) above is essentially a convex optimization problem, and a globally optimal solution always exists.

The initialization unit 50 sets all three parameters used in the optimization, w^, u^, and α^, to 0. Because the subsequent computation is iterative, a variable t that tracks the iteration count is introduced and set to t = 0.

The system parameter update unit 52 updates the system parameters from w^(t) to w^(t+1), as described below, using either the three parameters w^, u^(t), and α^(t) initialized by the initialization unit 50 or the three parameters w^, u^(t), and α^(t) most recently updated by the processes described later.

At iteration t, the optimal w^ with u^(t) and α^(t) held fixed is the solution to the problem of finding the w^ that minimizes the objective function.

Following the definition and extracting from Equation (10) above only the terms that involve w^ yields Equation (11) below.

This optimization problem can be regarded as the optimization problem of Equation (2) above with an L2-norm regularization term added. If Equation (2) is a convex optimization problem, this problem is always convex as well, so it can be solved easily with the same method used to solve Equation (2).

Accordingly, the system parameter update unit 52 updates the system parameters from w^(t) to w^(t+1) according to Equation (11) above, using either the three parameters w^, u^(t), and α^(t) initialized by the initialization unit 50 or the three parameters w^, u^(t), and α^(t) most recently updated by the processes described later.

The auxiliary parameter update unit 54 updates the auxiliary parameters from u^(t) to u^(t+1), as described below, using w^(t+1) updated by the system parameter update unit 52 together with the parameter α^(t), either as initialized by the initialization unit 50 or as most recently updated.

With w^(t+1) and α^(t) held fixed, the optimal u^ is the solution of the optimization problem obtained by adding the constraint to the objective function. First, collecting only the terms that involve u^ gives the objective function expressed by Equation (12) below.

First, consider the case without the constraint. The optimum is then the point at which the gradient of the objective function with respect to u^ becomes the zero vector. From this condition, the relations in Equations (13) and (14) below are obtained.

Next, taking the constraint into account, Equation (12) above shows that the problem can be viewed as mapping the unconstrained optimum u^ = w^(t+1) + α^(t) onto a hyperplane with ζ degrees of freedom. Therefore, as shown in FIG. 7, it suffices to find the point on the hyperplane closest to u^ = w^(t+1) + α^(t). In practice, as shown in FIG. 8, this processing is equivalent to one-dimensional k-means (Non-Patent Document 4: Haizhou Wang and Mingzhou Song, "Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming", The R Journal, 3(2):29-33, 2011). Consequently, each parameter can be processed independently, and the solution can be obtained very efficiently.
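The projection step can be sketched as one-dimensional clustering of the magnitudes |w_d + α_d| with one cluster center pinned at zero. This is a simplified illustration under stated assumptions: it uses plain Lloyd-style iterations rather than the optimal dynamic programming of Non-Patent Document 4, the evenly spaced initial centers are arbitrary, and the function name `project_to_shared_values` is hypothetical.

```python
def project_to_shared_values(z, zeta, iters=50):
    """Approximately project z onto vectors with entries in {0, +v_i, -v_i}.

    Returns (u, v): the projected vector and the zeta learned magnitudes.
    """
    mags = [abs(x) for x in z]
    top = max(mags) if mags else 0.0
    # Center 0 is pinned at zero; the other centers start evenly spaced.
    centers = [0.0] + [top * (i + 1) / zeta for i in range(zeta)]

    def assign_all():
        # Nearest center for every magnitude.
        return [min(range(len(centers)), key=lambda k: (m - centers[k]) ** 2)
                for m in mags]

    for _ in range(iters):
        assign = assign_all()
        new_centers = centers[:]
        for k in range(1, len(centers)):  # never move the zero center
            members = [m for m, a in zip(mags, assign) if a == k]
            if members:
                new_centers[k] = sum(members) / len(members)
        if new_centers == centers:
            break
        centers = new_centers

    assign = assign_all()
    # Restore the original signs on the clustered magnitudes.
    u = [(-1.0 if x < 0 else 1.0) * centers[a] for x, a in zip(z, assign)]
    return u, centers[1:]


u, v = project_to_shared_values([0.02, -0.03, 1.0, 1.1, -0.9, 2.0, -2.1], zeta=2)
print(u, v)
```

Each entry is snapped independently once the centers are fixed, which reflects the per-parameter independence noted above.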

The undetermined multiplier update unit 56 updates the Lagrange undetermined multiplier α^, as described below, using w^(t+1) updated by the system parameter update unit 52, u^(t+1) updated by the auxiliary parameter update unit 54, and the parameter α^(t), either as initialized by the initialization unit 50 or as most recently updated.

With w^(t+1) and u^(t+1) held fixed, the direction toward the optimum is the gradient direction of the objective function with respect to α^.

The update rule of Equation (16) below is obtained from Equation (15) above.

Accordingly, the undetermined multiplier update unit 56 updates the Lagrange undetermined multiplier from α^(t) to α^(t+1) according to Equation (16) above, using w^(t+1) updated by the system parameter update unit 52, u^(t+1) updated by the auxiliary parameter update unit 54, and the parameter α^(t), either as initialized by the initialization unit 50 or as most recently updated.

The convergence determination unit 58 determines whether a predetermined condition is satisfied, and the update processing by the system parameter update unit 52, the auxiliary parameter update unit 54, and the undetermined multiplier update unit 56 is repeated until it is. Specifically, the convergence determination unit 58 determines whether the system parameters w^ obtained by the above processing have reached their optimal values. More precisely, two small positive real numbers ε_1 and ε_2 are given, and convergence is declared when Equations (17) and (18) below are both satisfied (see Non-Patent Document 3).

If the convergence determination unit 58 determines that convergence has not been reached, t is set to t + 1 and processing returns to the system parameter update unit 52. If convergence is determined, processing proceeds to the duplicate parameter compression unit 60 described later.
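The overall loop of w-update, u-update, multiplier update, and convergence check can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the risk is replaced by the quadratic 0.5·||w − a||², which gives a closed-form w-update; the projection snaps to a fixed set of values instead of learning the v_i; the multiplier update uses the scaled form α ← α + w − u from Non-Patent Document 3; and the two stopping tests stand in for Equations (17) and (18), which are not shown in this text.

```python
import math


def admm_sketch(a, project, rho=1.0, eps1=1e-6, eps2=1e-6, max_iters=200):
    """Toy ADMM loop for min 0.5*||w - a||^2 subject to w = u, u in a discrete set."""
    n = len(a)
    w, u, alpha = [0.0] * n, [0.0] * n, [0.0] * n  # initialize everything to zero
    iters = 0
    for t in range(max_iters):
        iters = t + 1
        # w-update with u and alpha fixed (closed form for the toy risk).
        w = [(a_d + rho * (u_d - al_d)) / (1.0 + rho)
             for a_d, u_d, al_d in zip(a, u, alpha)]
        # u-update: project the unconstrained optimum w + alpha.
        u_new = project([w_d + al_d for w_d, al_d in zip(w, alpha)])
        # Multiplier update (scaled form).
        alpha = [al_d + w_d - u_d for al_d, w_d, u_d in zip(alpha, w, u_new)]
        # Convergence test on two residuals (assumed form).
        primal = math.sqrt(sum((w_d - u_d) ** 2 for w_d, u_d in zip(w, u_new)))
        change = math.sqrt(sum((p - q) ** 2 for p, q in zip(u_new, u)))
        u = u_new
        if primal <= eps1 and change <= eps2:
            break
    return w, u, alpha, iters


def nearest_in(values):
    """Build a projection that snaps each entry to the nearest allowed value."""
    def project(z):
        return [min(values, key=lambda v: (z_d - v) ** 2) for z_d in z]
    return project


w, u, alpha, n_iters = admm_sketch([0.9, 1.2, -1.4, 0.05],
                                   nearest_in([-1.0, 0.0, 1.0]))
print(u, n_iters)
```

Because the constraint set is discrete, convergence is not guaranteed in general, which is why the sketch also caps the number of iterations.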

Based on the system parameters learned by the learning unit 40, the duplicate parameter compression unit 60 forms groups of system parameters that share the same value and assigns an index number to each group. For each group, it converts every system parameter in the group into the system parameter of the index number assigned to that group, and stores the system parameter of that index number in the system parameter storage unit 90 described later. Specifically, because the score function is assumed to be linear as defined in Equation (5) above, groups are built from system parameters that take the same value, such as w_i = w_j = ….

Suppose that K groups are formed. Each group is assigned an index number from 1 to K, and a set of K groups is constructed. When w_i = w_j = w_l, the group is defined as the set of the original system parameter numbers, here {i, j, l}. In other words, the conversion from each original system parameter number to the index number of the group that shares its value is temporarily stored. Let v_k be the value of the k-th group; that is, when w_i = w_j = w_l and these parameters form the k-th group, w_i = w_j = w_l = v_k.

The duplicate parameter compression unit 60 applies the same index conversion to the feature functions f. A new function g_k is defined as a simple linear combination of the original feature extraction functions, as in Equation (19) below.

Then, for the score function of Equation (5) above, the following relation holds.

$$\sum_{d=1}^{N} w_d f_d(\hat{x}, \hat{y}) = \sum_{k=1}^{K} v_k g_k(\hat{x}, \hat{y}) \tag{20}$$

Here, the system parameters w^ are the weights of the feature extraction functions f_d(x^, y^) for the input data x^ and the output data y^. In other words, the original linear function over N terms can be converted into an equivalent new linear function over K terms. Because K is the number of groups, it can be expected to be far smaller than N. In addition, because g_k is the sum of the feature functions belonging to the group, it can easily be computed in advance.

As described above, for each group the duplicate parameter compression unit 60 converts every system parameter in the group into the system parameter of the index number assigned to that group, saves the system parameter of that index number, and defines the score function as shown in Equation (20) above.

In this way, the overlapping portions of the heavily duplicated set of system parameters obtained by the system parameter update unit 52 are merged and compressed into an equivalent form with no redundancy, which greatly reduces the model size. As additional processing, when w_d = 0 the function f_d contributes nothing to the choice of output, so the definition of that feature function can itself be deleted. This deletion is also performed here.
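The grouping and the equivalence of the two scores can be sketched as follows. The function names and the toy weights are hypothetical, group indices are zero-based here (the text numbers groups from 1 to K), and the features are given as already-evaluated numbers rather than functions.

```python
def compress(w):
    """Group equal nonzero weights.

    Returns (values, groups): values[k] is the shared weight v_k and
    groups[k] lists the original parameter indices in group k.
    """
    index_of_value = {}
    values, groups = [], []
    for d, w_d in enumerate(w):
        if w_d == 0.0:
            continue  # zero-weight features never affect the score
        if w_d not in index_of_value:
            index_of_value[w_d] = len(values)
            values.append(w_d)
            groups.append([])
        groups[index_of_value[w_d]].append(d)
    return values, groups


def score_original(w, f):
    """Sum over d of w_d * f_d."""
    return sum(w_d * f_d for w_d, f_d in zip(w, f))


def score_compressed(values, groups, f):
    """Sum over k of v_k * g_k, where g_k sums the features in group k."""
    return sum(v_k * sum(f[d] for d in group)
               for v_k, group in zip(values, groups))


w = [0.5, -0.5, 0.0, 0.5, 2.0, 0.0, -0.5]
f = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
values, groups = compress(w)
print(values, groups)
print(score_original(w, f), score_compressed(values, groups, f))
```

Seven weights collapse to three stored values here, and the two scores agree because each group contributes its shared weight times the sum of its features.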

The system parameter storage unit 90 stores the system parameters v_k compressed by the duplicate parameter compression unit 60 and the newly defined score function Φ(x^, y^; w^).

<System configuration of the information processing apparatus>
Using the compressed system parameters obtained by the system parameter learning apparatus 100 described above, the information processing apparatus 200 performs predetermined information processing on unknown input data. The processing results with compressed system parameters are exactly the same as they would be without compression. Compression therefore brings only benefits: because the model size (the size of the system parameters themselves) is greatly reduced, the resources required at execution time are reduced accordingly.

FIG. 9 is a block diagram showing the information processing apparatus 200 according to the embodiment of the present invention. The information processing apparatus 200 is implemented as a computer including a CPU, a RAM, and a ROM that stores a program for executing the information processing routine described later, and is functionally configured as follows.

As shown in FIG. 9, the information processing apparatus 200 according to this embodiment includes an input unit 210, a system parameter storage unit 220, an information processing unit 230, and an output unit 240.

The input unit 210 accepts input data x^.

The system parameter storage unit 220 stores the system parameters v_k compressed by the system parameter learning apparatus 100 and the newly defined score function g_k.

Based on the system parameters v_k of index number k and the newly defined score function Φ(x^, y^; w^) stored in the system parameter storage unit 220, the information processing unit 230 performs predetermined information processing on the input data x^ accepted by the input unit 210. Specifically, from the input data x^, the system parameters v_k, and the score function, the information processing unit 230 uses a predetermined optimization method to compute the output data y^ that maximizes the score function Φ(x^, y^; w^).
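The score-maximizing step can be sketched for a case where the candidate outputs are few enough to enumerate. That is an assumption made for brevity: structured outputs such as label sequences or trees would normally need a dedicated search procedure, which the text leaves as "a predetermined optimization method". The function `predict`, the grouped features `g_capitalized` and `g_suffix`, and the weights are all hypothetical.

```python
def predict(x, candidates, v, g):
    """Return the candidate y that maximizes sum over k of v[k] * g[k](x, y)."""
    return max(candidates,
               key=lambda y: sum(v_k * g_k(x, y) for v_k, g_k in zip(v, g)))


def g_capitalized(x, y):
    """Hypothetical grouped feature: capitalized token labeled as an entity."""
    return 1.0 if x[:1].isupper() and y != "O" else 0.0


def g_suffix(x, y):
    """Hypothetical grouped feature: token ending in 'kyo' labeled as a location."""
    return 1.0 if x.endswith("kyo") and y == "LOC" else 0.0


v = [1.0, 0.5]                      # shared weights v_k
g = [g_capitalized, g_suffix]       # grouped feature functions g_k
label = predict("Tokyo", ["LOC", "PER", "O"], v, g)
print(label)
```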

The output unit 240 outputs the output data y^ computed by the information processing unit 230 as the result.

<Operation of the system parameter learning apparatus>
Next, the operation of the system parameter learning apparatus 100 according to this embodiment is described. First, when the teacher data, the parameter ρ, and the parameter ζ representing the number of real values are input to the system parameter learning apparatus 100, the apparatus stores the input teacher data in the teacher database 30.

The system parameter learning apparatus 100 then executes the system parameter learning processing routine shown in FIG. 10.

First, in step S100, the initialization unit 50 initializes the three optimization variables w^, u^, and α^ by setting them all to 0.

Next, in step S102, the initialization unit 50 initializes the variable t that tracks the iteration count by setting t = 0.

In step S104, the system parameter update unit 52 updates the system parameters from w^(t) to w^(t+1) according to Equation (11) above, using either the three parameters w^, u^(t), and α^(t) initialized in step S100 or the three parameters w^, u^(t), and α^(t) most recently updated in the steps described later.

In step S106, the auxiliary parameter update unit 54 computes u^ according to Equation (14) above, using w^(t+1) updated in step S104 and the parameter α^(t), either as initialized in step S100 or as most recently updated in step S108 described later. It then finds the point on the hyperplane closest to the computed u^ and updates the auxiliary parameters from u^(t) to u^(t+1).

In step S108, the undetermined multiplier update unit 56 updates the Lagrange undetermined multiplier from α^(t) to α^(t+1) according to Equation (16) above, using w^(t+1) updated in step S104, u^(t+1) updated in step S106, and the parameter α^(t), either as initialized in step S100 or as most recently updated in this step S108.

In step S110, the convergence determination unit 58 determines whether the convergence conditions of Equations (17) and (18) above are satisfied, based on w^(t+1) updated in step S104, the previously updated w^(t), u^(t+1) updated in step S106, and the previously updated u^(t). If convergence has not been reached, the iteration variable t is incremented in step S112, processing returns to step S104, and steps S104 to S108 are repeated. If convergence has been reached, processing proceeds to step S114.

In step S114, based on the values of the system parameters w^ finally obtained by the update processing of step S104, the duplicate parameter compression unit 60 forms groups of system parameters that share the same value. For each group, it converts every system parameter in the group into the system parameter of the index number assigned to that group and saves the system parameter of that index number in the system parameter storage unit 90. It also defines a new score function as shown in Equation (20) above, saves it in the system parameter storage unit 90, and ends the system parameter learning processing routine.

<Operation of the information processing apparatus>
Next, the operation of the information processing apparatus 200 according to this embodiment is described. First, when each of the system parameters v_k stored in the system parameter storage unit 90 of the system parameter learning apparatus 100 and the newly defined score function Φ(x^, y^; w^) are input to the information processing apparatus 200, they are stored in the system parameter storage unit 90. Then, when the target input data x^ is input to the information processing apparatus 200, the information processing apparatus 200 executes the execution processing routine shown in FIG. 11.

First, in step S200, the input unit 210 accepts the input data x^.

Next, in step S202, the information processing unit 230 reads each of the system parameters v_k stored in the system parameter storage unit 220 and the newly defined score function Φ(x^, y^; w^).

In step S204, based on the input data x^ accepted in step S200 and on each of the system parameters v_k and the score function Φ(x^, y^; w^) read in step S202, the information processing unit 230 uses a predetermined optimization method to compute the output data y^ that maximizes the score function Φ(x^, y^; w^).

In step S206, the output unit 240 outputs the output data y^ computed in step S204 as the result, and the information processing routine ends.

<Experimental results>
Next, experimental results for this embodiment are presented. In practical text processing problems, discrete symbols such as words are handled as the features that characterize inputs and outputs for classification, so the total number of features used often ranges from several million to several billion. FIG. 12 shows the effect of the method described in this embodiment on the natural language processing tasks of syntactic parsing and named entity recognition.

The horizontal axis in FIG. 12 represents the number of groups, and the vertical axis represents the analysis accuracy of the system. As the figure shows, the method of this embodiment achieves analysis accuracy equivalent to the conventional method with a very small number of groups, from 2 to 8.

A single trial of model training on this experimental data takes roughly 6 hours. The manually tuned definitions used to obtain the results here required about 20 trials. In contrast, with the method of this embodiment the same result can be obtained in a single run without any particular trial and error. By simple calculation, this is equivalent to reducing the model construction cost to about one twentieth.

As described above, the system parameter learning apparatus according to the embodiment of the present invention accepts teacher data consisting of pairs of input data and the corresponding correct output data. Based on the accepted teacher data and the score function, it learns a plurality of optimized system parameters under the constraint that each parameter value belongs to the set of discrete values consisting of a predetermined number of real values v_i, the real values −v_i, and 0. An appropriate, highly compressed model can thereby be obtained automatically.

Further, the information processing apparatus according to the embodiment of the present invention performs predetermined information processing on input data based on the compressed system parameters learned by the system parameter learning device, and outputs output data. Because the size of the model (the size of the system parameters themselves) is greatly reduced, the resources required at execution time are reduced accordingly.
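A minimal sketch of how compressed parameters might be used at execution time follows. It follows the grouping recited in the claims (parameters with the same value form one group with one index number) and assumes, for illustration, that the score is a weighted sum of feature values. The function names, the choice to drop zero-valued weights, and the linear score are assumptions; expression (3) itself is not reproduced in this text.

```python
from collections import defaultdict


def compress(weights):
    """Group feature indices by shared weight value.

    Returns (values, groups): values[k] is the single stored value for
    group k, and groups[k] lists the feature indices in that group.
    Zero weights are dropped here because they do not change a weighted sum.
    """
    by_value = defaultdict(list)
    for d, w in enumerate(weights):
        if w != 0.0:
            by_value[w].append(d)
    values = sorted(by_value)
    groups = [by_value[v] for v in values]
    return values, groups


def score_dense(weights, features):
    """Weighted sum using one stored weight per feature."""
    return sum(w * f for w, f in zip(weights, features))


def score_grouped(values, groups, features):
    """Same sum, using one stored value per group: sum_k v_k * sum_{d in group k} f_d."""
    return sum(v * sum(features[d] for d in idx) for v, idx in zip(values, groups))


weights = [0.5, 0.0, -0.5, 0.5, 0.0, -0.5]
features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
values, groups = compress(weights)
```

In this sketch the grouped form stores only two real values instead of six, and one multiplication per group replaces one multiplication per feature.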

In addition, the size of the system parameters themselves can be greatly reduced while maintaining almost the same accuracy as before. Specifically, a model with 100 million system parameters requires 100,000,000 × 8 bytes = 800,000,000 bytes, that is, 800 MB of storage. With this embodiment, however, the number of distinct system parameter values can be reduced to a very small number, for example about 8. In that case, 8 × 8 bytes = 64 bytes, which by this calculation corresponds to compression to about 1/12,500,000 of the original size.
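The arithmetic above can be reproduced as follows. The sketch assumes 8 bytes per stored real value, as in the text, and the function names are hypothetical. Like the text, it counts only the table of distinct values; the storage needed to record which group each parameter belongs to is not included.

```python
BYTES_PER_VALUE = 8  # size of one stored real value, as assumed in the text


def dense_model_bytes(num_parameters):
    """Storage when every system parameter keeps its own real value."""
    return num_parameters * BYTES_PER_VALUE


def value_table_bytes(num_distinct_values):
    """Storage for the table of distinct parameter values only."""
    return num_distinct_values * BYTES_PER_VALUE


dense = dense_model_bytes(100_000_000)   # 800,000,000 bytes (800 MB)
table = value_table_bytes(8)             # 64 bytes
ratio = dense // table                   # 12,500,000
print(f"{dense} bytes -> {table} bytes (1/{ratio})")
```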

The present embodiment also differs from the method described as the prior art in the following respect. In the prior art, the definition of the set that appears in the prior-art formulation is obtained by manual tuning. Because a good definition has to be found by trial and error, the usual practice is to run several tens of trials, sequentially or in parallel, and to select the one that gives the best result on the development set. In the present embodiment, by contrast, no manual tuning is needed, and a single trial almost certainly yields a model with the same level of accuracy and compression rate as the prior art. The time and cost of model construction and selection are thereby reduced to between one over several and one over several tens of the conventional figures, which is a highly valuable effect in practice.

Note that the present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, the system parameter storage units 90 and 220 and the teacher database 30 may be provided externally and connected to the system parameter learning device 100 and the information processing device 200 via a network.

Further, although the above embodiment describes the case where the system parameter learning device 100 and the information processing device 200 are configured as separate devices, the two may instead be configured as a single device.

Each of the system parameter learning device 100 and the information processing device 200 described above contains a computer system. When a WWW system is used, the term "computer system" also includes the homepage providing environment (or display environment).

For example, although the present specification describes an embodiment in which the program is installed in advance, the program can also be provided stored on a computer-readable recording medium.

DESCRIPTION OF SYMBOLS
10 Teacher data input unit
20 Calculation unit
30 Teacher database
40 Learning unit
50 Initialization unit
52 System parameter update unit
54 Auxiliary parameter update unit
56 Undetermined multiplier update unit
58 Convergence determination unit
60 Duplicate parameter compression unit
90 System parameter storage unit
100 System parameter learning device
200 Information processing device
210 Input unit
220 System parameter storage unit
230 Information processing unit
240 Output unit

Claims

1. A system parameter learning device that learns a plurality of system parameters set in an information processing system, the information processing system performing predetermined information processing on input data and outputting output data using a score function for determining a score of the output data with respect to the input data, the device comprising:
a teacher data input unit that receives teacher data, which are pairs of each of a plurality of input data and each of a plurality of correct output data for the plurality of input data; and
a learning unit that, based on the teacher data received by the teacher data input unit and the score function, learns a plurality of optimized system parameters that satisfy a constraint that the value of each of the plurality of system parameters is included in a set of discrete values consisting of a predetermined number of real values v_i (i = 1, ..., ζ), real values −v_i (i = 1, ..., ζ), and 0.

2. The system parameter learning device according to claim 1, wherein the learning unit learns the optimized N system parameters w^ according to expression (1),
where the terms of expression (1) are: the teacher data; a risk function that returns a value indicating how far the output data, obtained when the predetermined information processing is performed on the input data x_i of the teacher data using the system parameters w^, deviates from the correct output data y_i of the teacher data; a regularization term Ω(w^); and the Cartesian product set of the discrete values for each of the N system parameters.

3. The system parameter learning device according to claim 2, wherein the learning unit learns the optimized N system parameters w^ according to expression (2),
where α^ is a Lagrange undetermined multiplier, ρ is a predetermined tuning parameter, and u^ is an auxiliary parameter.

4. The system parameter learning device according to claim 1, further comprising a duplicate parameter compression unit that, based on the plurality of system parameters learned by the learning unit, forms groups of system parameters having the same value, assigns an index number to each group, converts, for each group, each of the system parameters constituting the group into the system parameter of the index number assigned to the group, and stores the system parameter of the index number.

5. The system parameter learning device according to claim 4, wherein the plurality of system parameters are weights of a plurality of feature extraction functions f_d(x^, y^) for input data x^ and output data y^, and
the duplicate parameter compression unit converts, for each group, each of the system parameters constituting the group into the system parameter of the index number assigned to the group, stores the system parameter of the index number, and defines and stores the score function as shown in expression (3),
where Φ(x^, y^; w^) is the score function and v_k is the system parameter of the index number k assigned to the group.

6. An information processing apparatus comprising:
an input unit that receives input data; and
an information processing unit that, based on the score function and the system parameter of the index number of each group stored by the system parameter learning device according to claim 4 or 5, performs the predetermined information processing on the input data received by the input unit and outputs output data.

7. A system parameter learning method in a system parameter learning device that includes a teacher data input unit and a learning unit and that learns a plurality of system parameters set in an information processing system, the information processing system performing predetermined information processing on input data and outputting output data using a score function for determining a score of the output data with respect to the input data, the method comprising:
a step in which the teacher data input unit accepts teacher data, which are pairs of each of a plurality of input data and each of a plurality of correct output data for the plurality of input data; and
a step in which the learning unit, based on the teacher data received by the teacher data input unit and the score function, learns a plurality of optimized system parameters that satisfy a constraint that the value of each of the plurality of system parameters is included in a set of discrete values consisting of a predetermined number of real values v_i (i = 1, ..., ζ), real values −v_i (i = 1, ..., ζ), and 0.

8. A program for causing a computer to function as each unit constituting the system parameter learning device according to any one of claims 1 to 5 or the information processing apparatus according to claim 6.