JP7021010B2

JP7021010B2 - Machine learning system

Info

Publication number: JP7021010B2
Application number: JP2018108903A
Authority: JP
Inventors: 保静松岡
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2018-06-06
Filing date: 2018-06-06
Publication date: 2022-02-16
Anticipated expiration: 2038-06-06
Also published as: JP2019212121A

Description

One aspect of the present invention relates to a machine learning system.

Attempts have long been made to speed up machine learning that uses neural networks. For example, Patent Document 1 describes a technique that reduces the dimensionality of a feature space by learning a quadratic function with a polynomial neural network and selecting a subspace that preserves the principal components of that quadratic function. The technique includes a step of selecting one or more principal-component vectors from among the eigenvectors and the coefficient vector, and generating the subspace spanned by the selected vectors as a new feature space.

Patent Document 1: Japanese Unexamined Patent Publication No. 2010-39778

As the dimensionality of a neural network's output layer grows, the amount of computation needed to obtain the output-layer vector becomes enormous, and the matrix operation at the output layer can strongly affect the speed of machine learning. It is therefore desirable to execute machine learning at high speed even when the output layer has a large number of dimensions.

A machine learning system according to one aspect of the present invention includes a prediction unit that calculates an output vector, which is the vector of the output layer of a neural network, using an intermediate vector obtained in an intermediate layer of the neural network and a transformation matrix A, and that predicts an event based on the output vector. The prediction unit acquires a matrix UΣ and a matrix V obtained by singular value decomposition of the transformation matrix A, where the matrices U and V are orthogonal matrices and the matrix Σ is a diagonal matrix; calculates a temporary vector based on the intermediate vector and the matrix V; using a value k that indicates the split position of both the matrix UΣ and the temporary vector, acquires a front matrix defined using the first through k-th columns of the matrix UΣ and a front vector defined using the first through k-th elements of the temporary vector; calculates an approximation vector based on the front matrix and the front vector; and sets the approximation vector as the output vector.

In this aspect, the singular value decomposition A = UΣV is applied to the transformation matrix A that is used to obtain the output vector from the intermediate vector. An approximation vector is then obtained using only a part of the matrix UΣ (the front matrix) rather than all of it. This approximation vector approximates the output-layer vector. Treating it as the output vector yields the output vector with less computation than using the whole of UΣ (that is, the transformation matrix A itself), so machine learning can be executed at high speed.

According to one aspect of the present invention, machine learning can be executed at high speed even when the output layer of the neural network has a large number of dimensions.

FIG. 1 is a diagram showing an example of the functional configuration of the machine learning system according to the embodiment.
FIG. 2 is a diagram showing an example of the neural network used in the machine learning system according to the embodiment.
FIG. 3 is a diagram showing the conventional calculation method for obtaining an output vector.
FIG. 4 is a diagram showing the calculation method of the present embodiment for obtaining an output vector.
FIG. 5 is a diagram showing the calculation method of the present embodiment for obtaining an output vector.
FIG. 6 is a flowchart showing an example of the operation of the machine learning system according to the embodiment.
FIG. 7 is a flowchart showing an example of the operation of the machine learning system according to the embodiment.
FIG. 8 is a diagram showing an application example of the machine learning system according to the embodiment.
FIG. 9 is a diagram showing an example of the hardware configuration of a computer used for the machine learning system according to the embodiment.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the description of the drawings, identical or equivalent elements are given the same reference numerals, and duplicate descriptions are omitted.

The machine learning system 10 according to the embodiment is a computer system that predicts an arbitrary event. Machine learning is a technique for autonomously discovering laws or rules by learning iteratively from given information. The machine learning system 10 predicts an event by executing machine learning with a neural network. A neural network is an information-processing model that imitates the workings of the human brain and nervous system.

An event is something that appears in an observable form; in other words, something that can be expressed as data in some format. The events predicted by the machine learning system 10 are not limited in any way, so the system may be used for any purpose. It may predict events in the real world or in a virtual world. It may be used for a classification problem (identification problem), which decides the class to which the data being processed belongs, or for a regression problem, which predicts unknown (new) data from the data being processed. The machine learning system 10 can be used for various kinds of information processing, such as natural language processing, image processing (image recognition), speech processing (speech recognition), and data prediction. For example, it can be applied to technical fields such as machine translation, automated dialogue, optical character recognition (OCR), autonomous driving, medical diagnosis, fraud detection, face detection, product recommendation, customer analysis, and financial trading. The prediction result (predicted event) output by the machine learning therefore varies, and may be, for example, a translated sentence, text recognized from speech or an image, a driving-control instruction, a diagnosis, a detected fraud, a recommended product, or an instruction concerning a financial transaction.

The machine learning system 10 may consist of a single computer or of multiple computers. When multiple computers are used, they are connected through a communication network such as the Internet or an intranet, so that a single machine learning system 10 is logically constructed.

FIG. 1 is a diagram showing an example of the functional configuration of the machine learning system 10. As shown in FIG. 1, the machine learning system 10 includes a prediction unit 11 as a functional element.

The prediction unit 11 is a functional element that predicts an event using a neural network. It acquires the input data to be processed, feeds that input data into the neural network, executes machine learning, and thereby obtains output data (the processing result).

The method of acquiring the input data is not limited. For example, the prediction unit 11 may read data stored in an arbitrary database as the input data, or may receive data sent from another computer system as the input data. Alternatively, the prediction unit 11 may acquire, as the input data, data processed by another functional element (not shown) in the machine learning system 10.

The handling of the output data (processing result) is not limited either. For example, the prediction unit 11 may display the output data on a monitor, store it in an arbitrary database, or send it to another computer system. Alternatively, another functional element (not shown) in the machine learning system 10 may process the output data further.

The present embodiment assumes that the prediction unit 11 uses a trained neural network (a so-called trained model). A trained model can be regarded as the best neural network in the sense that it is estimated to have the highest prediction accuracy. Note, however, that a trained model is not necessarily the best one in reality. In general, a training data set (corpus) containing one or more training samples is prepared in order to generate a trained model. The trained model is obtained by executing machine learning while feeding the individual training samples one after another into the neural network being trained. The trained model can be generated using any conventional technique, and may be generated by the machine learning system 10 or by another computer system.

A trained model can be described as a combination of a computer program and parameters. Alternatively, it can be described as a combination of the structure of a neural network and parameters (weighting coefficients) that represent the strength of the connections between the individual neurons of that network. It can also be described as a computer program configured to obtain a single result (to execute a predetermined process).

FIG. 2 is a diagram schematically showing an example of a neural network (trained model) used in the machine learning system 10. This neural network 12 can be regarded as a part of the prediction unit 11. The neural network 12 consists of a first layer, which is the input layer; second, third, and fourth layers, which are intermediate (hidden) layers; and a fifth layer, which is the output layer. The first layer passes the input vector x = (x_0, x_1, x_2, …, x_p), whose elements are p parameters, to the second layer unchanged. Each of the second, third, and fourth layers converts its total input into an output with an activation function and passes that output to the next layer. The fifth layer also converts its total input into an output with an activation function; this output is the output vector of the neural network, y = (y_0, y_1, …, y_q), whose elements are q parameters. The number of nodes (elements) in each layer is not limited and may be set according to, for example, the characteristics of the data to be processed and of the data to be obtained.

The neural network 12 has five layers (four if the input layer is excluded), but the number of layers in the neural network that constitutes the machine learning system 10 (prediction unit 11) is not limited in any way. For example, the machine learning system 10 may use a neural network with any number of layers that is three or more, which means that it may use a neural network with any number of intermediate layers that is one or more.

One feature of the prediction unit 11 is the calculation method it uses to obtain the output-layer vector (output vector) from the vector that represents the result of the last intermediate layer. In the neural network 12, the fourth layer is the last intermediate layer. Below, the vector representing the result of the last intermediate layer is called the "intermediate vector". Rather than computing the exact output vector from the start, the prediction unit 11 first computes an approximation of the output vector. It then determines whether to use that approximation as the final result. If it determines that the approximation is to be used, it sets the approximation as the output vector. If it determines that the approximation is not to be adopted, it computes the exact output vector.

FIGS. 3 to 5 are diagrams for explaining how the prediction unit 11 calculates the output vector. FIG. 3 shows the conventional calculation method. FIGS. 4 and 5 show the method of obtaining an approximation of the output vector in the present embodiment.

To obtain an n-dimensional output vector from an m-dimensional intermediate vector, the prediction unit 11 uses a transformation matrix with n rows and m columns (called an "n × m transformation matrix"). With the intermediate vector denoted x, the output vector y, and the transformation matrix A, the exact output vector is obtained as y = Ax, as shown in FIG. 3. Because an n × m matrix operation is performed to obtain the output vector y, the amount of computation becomes enormous when the output layer has many dimensions. For example, even if the intermediate vector has only 500 dimensions, an output vector with 50,000 dimensions requires a 500 × 50,000 matrix operation. When the output layer becomes large, for instance because a classification problem (identification problem) has many candidate classes, the matrix operation for obtaining the output layer tends to dominate the computation of the neural network.
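As a rough illustration of the conventional computation y = Ax, the following sketch uses plain Python lists. The helper names `matvec` and `multiply_count` and the toy matrix values are illustrative assumptions and do not appear in the patent; only the sizes 500 and 50,000 come from the text above.

```python
def matvec(A, x):
    """Return y = A x for an n-by-m matrix A (list of rows) and an m-vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def multiply_count(n, m):
    """Scalar multiplications in a dense n-by-m matrix-vector product."""
    return n * m


# Toy 3x2 transformation matrix (hypothetical values).
A = [[1.0, 2.0],
     [0.0, 1.0],
     [3.0, -1.0]]
x = [2.0, 1.0]
y = matvec(A, x)

# Sizes used in the text: m = 500 intermediate dimensions, n = 50000 outputs.
full_cost = multiply_count(50000, 500)
```

For the sizes in the text, a single output vector takes 25 million multiplications, which is why the output layer can dominate the cost of the whole network.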

In the present embodiment, the prediction unit 11 uses singular value decomposition (SVD), a matrix factorization technique, to execute that matrix operation at high speed. As shown in FIG. 4, the prediction unit 11 applies the singular value decomposition A = UΣV to the transformation matrix A, thereby decomposing A into the matrix UΣ and the matrix V. The matrices U and V are both orthogonal matrices. The matrix Σ is a diagonal matrix; more specifically, its off-diagonal components are 0 and its diagonal components (the (i, i) elements) are the singular values of the transformation matrix A. The matrix UΣ is the product of the matrices U and Σ. The n × m transformation matrix A is decomposed into an n × n matrix U, an n × m matrix Σ, and an m × m matrix V. Because the transformation matrix A, which forms part of the neural network (trained model), is given in advance, the prediction unit 11 can execute the singular value decomposition beforehand and hold the matrices UΣ and V ready.
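The relationship A = UΣV can be checked on a small example. The sketch below does not compute an SVD; it starts from hand-picked orthogonal factors (assumed values chosen for this illustration) and confirms that multiplying by V first and then by UΣ gives the same result as multiplying by A directly. In practice the factors would come from a numerical SVD routine.

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]


def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


# Assumed factors: U is 3x3 orthogonal, Sigma is 3x2 with the singular
# values 5 and 2 on its diagonal, V is 2x2 orthogonal (A = U Sigma V).
U = [[0.6, -0.8, 0.0],
     [0.8, 0.6, 0.0],
     [0.0, 0.0, 1.0]]
Sigma = [[5.0, 0.0],
         [0.0, 2.0],
         [0.0, 0.0]]
V = [[0.6, 0.8],
     [-0.8, 0.6]]

U_Sigma = matmul(U, Sigma)   # n x m, can be computed once in advance
A = matmul(U_Sigma, V)       # the transformation matrix these factors represent

x = [1.0, 2.0]
y_direct = matvec(A, x)                       # y = A x
y_factored = matvec(U_Sigma, matvec(V, x))    # y = (U Sigma)(V x)
```

Both routes give the same vector up to floating-point rounding, which is what allows the later steps to work with UΣ and V instead of A.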

The prediction unit 11 arranges the diagonal components of the matrix Σ so that the important elements (those that affect the computation) lie in the front columns of Σ and the elements that have little effect on the computation lie in the rear columns. Specifically, it arranges the diagonal components so that each diagonal component in the front columns is equal to or greater than the maximum of the diagonal components in the rear columns. In short, it arranges the diagonal components of Σ so that large values gather in the front columns. The "front columns" are the first through k-th columns of Σ, and the "rear columns" are the (k + 1)-th through last columns. The value k is at least 1 and less than the number of columns of Σ. For example, the prediction unit 11 may generate Σ so that its diagonal components are in descending order from the first column to the last.

The prediction unit 11 obtains an approximation of the output vector using the matrices UΣ and V. As shown in FIG. 5, it splits the matrix UΣ into a front matrix L and a back matrix R. The front matrix L is defined using the first through k-th columns of UΣ (that is, the front columns of UΣ) and is therefore an n × k matrix. The back matrix R consists of the remaining columns of UΣ (that is, its rear columns). More specifically, R is defined using the (k + 1)-th through last columns of UΣ and is therefore an n × (m − k) matrix. The value k can be regarded as indicating the split position of the matrix UΣ.

The prediction unit 11 also obtains an m-dimensional temporary vector x′ based on the matrix V and the intermediate vector x. Specifically, it computes the product of V and x as the temporary vector; that is, x′ = Vx. It then splits this temporary vector x′ into a front vector x_L and a rear vector x_R. The front vector x_L is defined using the first through k-th elements of x′ and is therefore a k-dimensional vector. The rear vector x_R consists of the remaining elements of x′. More specifically, x_R is defined using the (k + 1)-th through last (m-th) elements of x′ and is therefore an (m − k)-dimensional vector. The value k can thus be regarded as also indicating the split position of the temporary vector x′.

Next, the prediction unit 11 obtains an n-dimensional approximation vector y_a based on the front matrix L and the front vector x_L. Specifically, it computes the product of L and x_L as the approximation vector; that is, y_a = L x_L. The approximation vector y_a approximates the exact output vector y obtained by y = Ax.
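The three operations described above (forming x′ = Vx, taking the first k columns and elements, and multiplying L by x_L) can be sketched as follows. The function name `approximate_output` and the numeric values are assumptions made for this illustration.

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def approximate_output(U_Sigma, V, x, k):
    """Return (y_a, x_tmp), where y_a = L x_L uses only the first k columns."""
    x_tmp = matvec(V, x)                  # temporary vector x' = V x
    x_L = x_tmp[:k]                       # front vector: first k elements
    L = [row[:k] for row in U_Sigma]      # front matrix: first k columns
    return matvec(L, x_L), x_tmp


# Assumed 3x2 matrix U*Sigma whose second column is small (singular
# values 5 and 0.1), and an assumed 2x2 orthogonal matrix V.
U_Sigma = [[3.0, -0.08],
           [4.0, 0.06],
           [0.0, 0.0]]
V = [[0.6, 0.8],
     [-0.8, 0.6]]
x = [1.0, 2.0]

y_a, x_tmp = approximate_output(U_Sigma, V, x, k=1)
y_exact = matvec(U_Sigma, x_tmp)          # uses every column, for comparison
```

With these values the approximation is about (6.6, 8.8, 0) and the exact vector about (6.568, 8.824, 0); the largest element sits at the same position in both. In a real system L would be sliced once in advance rather than on every call.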

Because the approximation vector y_a is obtained using only the important part of the transformation matrix A (the front matrix L), it can be expected to approximate the exact output vector y closely. Specifically, the index of the largest element of y_a is very likely to be the same as the index of the largest element of the exact output vector y. Here, the largest element is the element with the greatest value, and the index is the element number that indicates an element's position. In a classification problem (identification problem), for example, knowing the index of the largest element is sufficient. Therefore, if the approximate calculation y_a = L x_L leaves the index of the largest element unchanged, the classification (identification) result is the same as when y = Ax is calculated. The approximate calculation amounts to computing only part of the matrix operation y = Ax, so treating the approximation vector y_a as the output vector y shortens the execution time of machine learning.
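To make the saving concrete, the sketch below counts scalar multiplications for the exact and approximate calculations. The cost model (m × m for x′ = Vx plus n × k for L x_L) is an assumption of this illustration that follows the steps as described; the sizes are the ones used earlier in the text, with k set to half of m.

```python
def exact_cost(n, m):
    """Multiplications for y = A x with an n-by-m matrix A."""
    return n * m


def approx_cost(n, m, k):
    """Multiplications for x' = V x (m*m) plus y_a = L x_L (n*k)."""
    return m * m + n * k


n, m = 50000, 500
k = m // 2

exact = exact_cost(n, m)
approx = approx_cost(n, m, k)
ratio = approx / exact
```

Under this model the approximate path needs 12.75 million multiplications against 25 million for the exact product, roughly half, and the extra m × m term is small whenever n is much larger than m.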

On the other hand, when the approximation vector y_a does not approximate the output vector y, the exact output vector y (the exact classification result) can be obtained by additionally using the omitted data (the back matrix R and the rear vector x_R).
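The reason the omitted data suffices follows from the block structure of the split. Using only the symbols already defined (A = UΣV, x′ = Vx, UΣ = [L R], and x′ partitioned into x_L and x_R), the exact output can be written as the approximation plus a correction term:

```latex
\begin{aligned}
y &= A x = (U\Sigma)(V x) = (U\Sigma)\,x' \\
  &= \begin{pmatrix} L & R \end{pmatrix}
     \begin{pmatrix} x_L \\ x_R \end{pmatrix}
   = L x_L + R x_R
   = y_a + R x_R .
\end{aligned}
```

So when the approximation is rejected, the work already spent on y_a is not wasted: adding the n × (m − k) product R x_R to y_a gives the exact vector.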

The operation of the machine learning system 10 is described with reference to FIGS. 6 and 7. FIG. 6 is a flowchart showing an example of the processing executed when a trained model is acquired. FIG. 7 is a flowchart showing an example of the processing that obtains an output vector from an intermediate vector.

The processing executed when a trained model is acquired is described with reference to FIG. 6. In step S11, the prediction unit 11 acquires the trained model. As described above, this trained model is a neural network that includes the transformation matrix A.

In step S12, the prediction unit 11 decomposes the transformation matrix A into the matrix UΣ and the matrix V by singular value decomposition. That is, it computes the factors in A = UΣ × V. If the transformation matrix A is an n × m matrix, the matrix UΣ is an n × m matrix and the matrix V is an m × m matrix.

In step S13, the prediction unit 11 splits the matrix UΣ into the front matrix L and the back matrix R. It arranges the diagonal components of the matrix Σ so that each diagonal component in the front columns (the first through k-th columns) is equal to or greater than the maximum of the diagonal components in the remaining columns (the (k + 1)-th through last columns). For example, it may arrange the diagonal components in descending order. The method of determining the value k, which indicates the split position of the matrix UΣ, is not limited. The value k may be set in advance, or the prediction unit 11 may determine it dynamically (that is, automatically).

For example, the value k may be half of m, the number of dimensions of the intermediate vector (the number of columns of the matrix Σ). If m is even, k = m / 2. If m is odd, k may be (m − 1) / 2 or (m + 1) / 2. In the present embodiment, the case where m is odd is also counted as an example in which k is half the number of dimensions of the intermediate vector.

Alternatively, the prediction unit 11 may arrange the diagonal components of the matrix Σ in descending order from the first column to the last, and then set k to the column number of the last column whose diagonal component is equal to or greater than a predetermined threshold Ta. For example, suppose the diagonal components are in descending order, m = 100, and Ta = 1. If the diagonal component in the 60th column is 1 or more and the diagonal component in the 61st column is less than 1, the prediction unit 11 sets k to 60. The specific value of the threshold Ta is not limited and may be set in view of various factors, such as the characteristics of the trained model and of the event to be predicted.
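The half-of-m rule and the threshold rule described above might be sketched as follows. The function names are hypothetical, and clamping k into the range 1 to m − 1 is an assumption of this sketch, based on the stated condition that k is at least 1 and less than the number of columns of Σ.

```python
def choose_k_half(m):
    """k as half of m; for odd m this picks (m - 1) / 2, one of the two options."""
    return m // 2


def choose_k_by_threshold(diagonal, t_a):
    """Column number of the last diagonal component >= t_a.

    `diagonal` must already be in descending order. The result is clamped
    to 1 <= k <= m - 1 (an assumption of this sketch).
    """
    m = len(diagonal)
    k = 0
    for value in diagonal:
        if value < t_a:
            break
        k += 1
    return max(1, min(k, m - 1))


# Hypothetical singular values in descending order.
diagonal = [9.0, 4.0, 1.0, 0.5, 0.1]
k_threshold = choose_k_by_threshold(diagonal, t_a=1.0)
k_half_even = choose_k_half(100)
k_half_odd = choose_k_half(7)
```

Here the third value is the last one that is at least 1, so the threshold rule gives k = 3.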

Alternatively, the prediction unit 11 may arrange the diagonal components of the matrix Σ in descending order from the first column to the last, and then set k to the column number of the last column whose diagonal component has a deviation value equal to or greater than a predetermined threshold Tb. For example, suppose the diagonal components are in descending order, m = 100, and Tb = 50. If the deviation value of the diagonal component in the 40th column is 50 or more and that of the 41st column is less than 50, the prediction unit 11 sets k to 40. The deviation value of each diagonal component can be obtained from the mean and variance of all the diagonal components. The specific value of the threshold Tb is not limited and may be set in view of various factors, such as the characteristics of the trained model and of the event to be predicted.
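The text does not spell out the deviation-value formula. The sketch below assumes the common standard-score form 50 + 10 × (value − mean) / standard deviation, computed with the population standard deviation; both choices, the function names, and the clamp on k are assumptions of this illustration.

```python
from statistics import mean, pstdev


def deviation_values(values):
    """Standard scores scaled to mean 50 and standard deviation 10 (assumed form)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [50.0 for _ in values]   # all values equal: every score is 50
    return [50.0 + 10.0 * (v - mu) / sd for v in values]


def choose_k_by_deviation(diagonal, t_b):
    """Column number of the last component whose deviation value is >= t_b.

    `diagonal` must be in descending order, so the qualifying components
    form a prefix and counting them gives the column number.
    """
    scores = deviation_values(diagonal)
    k = sum(1 for s in scores if s >= t_b)
    return max(1, min(k, len(diagonal) - 1))


# Hypothetical singular values in descending order (mean 4.0).
diagonal = [8.0, 6.0, 4.0, 2.0, 0.0]
scores = deviation_values(diagonal)
k = choose_k_by_deviation(diagonal, t_b=50.0)
```

With Tb = 50 this keeps exactly the components that are at or above the mean, so k = 3 for the values shown.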

The method of determining k is thus not limited. In every case, however, the prediction unit 11 first arranges the diagonal components of the matrix Σ so that large values gather in the front columns, and then splits the matrix UΣ into the front matrix L and the back matrix R. If the matrix UΣ is an n × m matrix, the front matrix L is an n × k matrix and the back matrix R is an n × (m − k) matrix.

The calculation of the output vector is described with reference to FIG. 7. FIG. 7 shows the processing that obtains a single output vector. Solving a single problem with the neural network 12 may require the output vector to be obtained multiple times. In that case, the series of steps shown in FIG. 7 is executed multiple times to process that one problem.

ステップＳ２１では、予測部１１が行列Ｖと中間ベクトルｘとに基づいて一時ベクトルｘ´を求める。具体的には、予測部１１は行列Ｖと中間ベクトルｘとの積を一時ベクトルｘ´として求める。 In step S21, the prediction unit 11 obtains a temporary vector x' based on the matrix V and the intermediate vector x. Specifically, the prediction unit 11 obtains the product of the matrix V and the intermediate vector x as the temporary vector x'.

ステップＳ２２では、予測部１１がその一時ベクトルｘ´を前ベクトルｘ_Ｌと後ベクトルｘ_Ｒとに分割する。予測部１１は、一時ベクトルｘ´の１個目からｋ個目までの要素を用いて前ベクトルｘ_Ｌを生成し、一時ベクトルｘ´の（ｋ＋１）個目の要素から最後の要素（ｍ個目の要素）を用いて後ベクトルｘ_Ｒを生成する。一時ベクトルｘ´をこのように分割するための値ｋは、行列ＵΣを前行列Ｌと後行列Ｒとに分割する際に用いる値ｋと同じである。したがって、予測部１１は上記のステップＳ１３で設定した値ｋを一時ベクトルｘ´の分割でも用いる。 In step S22, the prediction unit 11 divides the temporary vector x' into a front vector x_L and a back vector x_R. The prediction unit 11 generates the front vector x_L from the first to k-th elements of the temporary vector x', and generates the back vector x_R from the (k + 1)-th element to the last element (the m-th element) of the temporary vector x'. The value k used to divide the temporary vector x' in this way is the same as the value k used to divide the matrix UΣ into the front matrix L and the back matrix R. Therefore, the prediction unit 11 also uses the value k set in step S13 above to divide the temporary vector x'.

ステップＳ２３では、予測部１１が前行列Ｌと前ベクトルｘ_Ｌとに基づいて近似ベクトルｙ_ａを求める。具体的には、予測部１１は前行列Ｌと前ベクトルｘ_Ｌとの積を近似ベクトルｙ_ａとして求める。 In step S23, the prediction unit 11 obtains an approximation vector y_a based on the front matrix L and the front vector x_L. Specifically, the prediction unit 11 obtains the product of the front matrix L and the front vector x_L as the approximation vector y_a.

ステップＳ２４では、予測部１１がその近似ベクトルｙ_ａにおける最大要素（近似ベクトルｙ_ａの要素の最大値）と少なくとも一つの他の要素（最大要素以外の近似ベクトルｙ_ａの要素のうちの少なくとも一つ）との乖離度を算出する。ステップＳ２５では、予測部１１はその乖離度を予め定められた閾値と比較する。乖離度とは、近似ベクトルｙ_ａの最大要素が近似ベクトルｙ_ａの他の要素の値からどれだけ離れているかを示す指標である。乖離度が大きいほど、近似ベクトルｙ_ａの要素の最大値は他の要素の値から大きく離れている、ということができる。乖離度が一定の水準以上に大きければ、最大要素と他の要素との間に有意な差があるといえる。 In step S24, the prediction unit 11 calculates the degree of divergence between the maximum element of the approximation vector y_a (the largest value among the elements of y_a) and at least one other element (at least one of the elements of y_a other than the maximum element). In step S25, the prediction unit 11 compares the degree of divergence with a predetermined threshold. The degree of divergence is an index showing how far the maximum element of the approximation vector y_a is from the values of the other elements of y_a. The larger the degree of divergence, the farther the maximum value among the elements of y_a is from the values of the other elements. If the degree of divergence is at or above a certain level, it can be said that there is a significant difference between the maximum element and the other elements.

ステップＳ２４，Ｓ２５で用いる乖離度の種類は限定されない。例えば、予測部１１は近似ベクトルｙ_ａの最大要素と、近似ベクトルｙ_ａの中で２番目に大きい要素との差を乖離度として求め、この乖離度が閾値Ｔｃより大きいか否かを判定してもよい。あるいは、予測部１１は近似ベクトルｙ_ａの最大要素の偏差値を乖離度として求め、この乖離度が閾値Ｔｄより大きいか否かを判定してもよい。この偏差値は、近似ベクトルｙ_ａの全要素の平均および分散を用いて求めることができる。閾値Ｔｃ、Ｔｄのいずれについても、その具体的な値は限定されず、例えば、学習済みモデルの特性、予測しようとする事象の特性などの様々な要因を考慮して設定されてよい。 The type of the degree of divergence used in steps S24 and S25 is not limited. For example, the prediction unit 11 may obtain the difference between the maximum element of the approximation vector y_a and the second largest element of y_a as the degree of divergence, and determine whether or not this degree of divergence is larger than a threshold Tc. Alternatively, the prediction unit 11 may obtain the deviation value of the maximum element of the approximation vector y_a as the degree of divergence, and determine whether or not this degree of divergence is larger than a threshold Td. This deviation value can be obtained using the mean and variance of all the elements of the approximation vector y_a. The specific values of the thresholds Tc and Td are not limited, and may be set in consideration of various factors such as the characteristics of the trained model and the characteristics of the event to be predicted.
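The two example measures of the degree of divergence might look like the sketch below. The function names are hypothetical, and the deviation value is again assumed to follow the conventional formula 50 + 10 × (value − mean) / (standard deviation), which is not stated explicitly in this description.

```python
import statistics


def top_two_difference(y_a):
    """Degree of divergence as (largest element) - (second largest element)."""
    largest, second = sorted(y_a, reverse=True)[:2]
    return largest - second


def max_deviation_value(y_a):
    """Degree of divergence as the deviation value of the largest element."""
    mean = statistics.fmean(y_a)
    std = statistics.pstdev(y_a)  # population standard deviation of all elements
    if std == 0:
        return 50.0  # every element equals the mean
    return 50.0 + 10.0 * (max(y_a) - mean) / std


y_a = [1.0, 1.0, 1.0, 5.0]
print(top_two_difference(y_a))   # compared against the threshold Tc
print(max_deviation_value(y_a))  # compared against the threshold Td
```

The first measure needs only the two largest elements, whereas the second uses the statistics of the whole vector.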

乖離度が閾値より大きい場合には（ステップＳ２５においてＹＥＳ）、処理はステップＳ２６に移り、予測部１１が近似ベクトルｙ_ａを出力ベクトルｙとして設定する。乖離度が閾値より大きければ、近似ベクトルｙ_ａの最大要素のインデックスが、正確な出力ベクトルｙの最大要素のインデックスと同じである蓋然性が高い。例えば、分類問題（識別問題）では最大要素のインデックスがわかれば十分なので、乖離度が閾値より大きければ、近似ベクトルｙ_ａによる分類結果（識別結果）は、正確な出力ベクトルｙによる分類結果と変わらないと期待できる。 If the degree of divergence is larger than the threshold (YES in step S25), the process proceeds to step S26, and the prediction unit 11 sets the approximation vector y_a as the output vector y. If the degree of divergence is larger than the threshold, it is highly probable that the index of the maximum element of the approximation vector y_a is the same as the index of the maximum element of the exact output vector y. For example, in a classification problem (discrimination problem), it is sufficient to know the index of the maximum element. Therefore, if the degree of divergence is larger than the threshold, the classification result (discrimination result) obtained from the approximation vector y_a can be expected to be the same as the classification result obtained from the exact output vector y.

乖離度が閾値以下である場合には（ステップＳ２５においてＮＯ）、処理はステップＳ２７に移る。ステップＳ２７では、予測部１１が、近似ベクトルｙ_ａに加えて、後行列Ｒおよび後ベクトルｘ_Ｒをさらに用いて出力ベクトルｙを算出する。具体的には、予測部１１は後行列Ｒと後ベクトルｘ_Ｒとの積を近似ベクトルｙ_ａに加えることで正確な出力ベクトルｙを求める。すなわち、予測部１１はｙ＝ｙ_ａ＋Ｒｘ_Ｒを計算する。乖離度が閾値以下であれば、近似ベクトルｙ_ａの最大要素のインデックスが、正確な出力ベクトルｙの最大要素のインデックスと異なる蓋然性が高い。この場合には、近似ベクトルｙ_ａを最終結果として採用するのではなく、出力ベクトルｙを正確に計算した方が、予測の精度がより高くなる。予測部１１は、省略した後行列Ｒおよび後ベクトルｘ_Ｒをさらに用いて計算することで、正確な出力ベクトルｙを得る。 If the degree of divergence is equal to or less than the threshold (NO in step S25), the process proceeds to step S27. In step S27, the prediction unit 11 calculates the output vector y by using the back matrix R and the back vector x_R in addition to the approximation vector y_a. Specifically, the prediction unit 11 obtains the exact output vector y by adding the product of the back matrix R and the back vector x_R to the approximation vector y_a. That is, the prediction unit 11 calculates y = y_a + R x_R. If the degree of divergence is equal to or less than the threshold, it is highly probable that the index of the maximum element of the approximation vector y_a differs from the index of the maximum element of the exact output vector y. In this case, the prediction is more accurate when the output vector y is calculated exactly than when the approximation vector y_a is adopted as the final result. The prediction unit 11 obtains the exact output vector y by additionally using the back matrix R and the back vector x_R that were omitted from the approximation.
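Steps S21 to S27 can be summarized by the following dependency-free sketch. It assumes that the front matrix L, the back matrix R, and the matrix V have already been prepared as lists of rows, and it uses the difference between the two largest elements as the degree of divergence. The names `matvec` and `output_vector` are hypothetical, and a real implementation would normally use a numerical library rather than nested lists.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def output_vector(front, back, v, x, k, threshold):
    """Return (y, used_back): the output vector and whether step S27 was needed."""
    x_tmp = matvec(v, x)                       # S21: temporary vector x' = V x
    x_front, x_back = x_tmp[:k], x_tmp[k:]     # S22: split x' at position k
    y_a = matvec(front, x_front)               # S23: approximation y_a = L x_L
    largest, second = sorted(y_a, reverse=True)[:2]
    divergence = largest - second              # S24: degree of divergence
    if divergence > threshold:                 # S25: compare with the threshold
        return y_a, False                      # S26: adopt the approximation
    correction = matvec(back, x_back)          # S27: y = y_a + R x_R
    return [a + c for a, c in zip(y_a, correction)], True


# Toy example with n = 3 outputs, m = 3, and k = 2.
v = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
front = [[1, 0], [0, 1], [0, 0]]
back = [[0], [0], [5]]
print(output_vector(front, back, v, [4, 1, 1], 2, 2.0))
print(output_vector(front, back, v, [4, 1, 1], 2, 3.5))
```

In the toy example the approximation is [4, 1, 0] with a divergence of 3. With a threshold of 2.0 it is returned as is, and with a threshold of 3.5 the back matrix is applied, giving [4, 1, 5], whose largest element is at a different index.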

予測部１１は、これら一連の処理により得られた出力ベクトルｙに基づいて、ニューラルネットワークの最終的な結果を計算または生成する。最終結果を計算または生成する方法は限定されない。 The prediction unit 11 calculates or generates the final result of the neural network based on the output vector y obtained by these series of processes. There are no restrictions on how the final result is calculated or generated.

例えば、予測部１１は下記の式（１）で示されるソフトマックス（Ｓｏｆｔｍａｘ）関数を用いて最終結果を求めてもよい。 For example, the prediction unit 11 may obtain the final result by using the softmax function represented by the following equation (1).

$$\mathrm{softmax}(y_i) = \frac{\exp(y_i)}{\sum_{j=1}^{n} \exp(y_j)} \qquad (1)$$

式（１）において、ｙ_ｉは出力ベクトルｙのｉ番目の要素を表し、ｎは出力ベクトルｙの要素数を表す。 In equation (1), y_i represents the i-th element of the output vector y, and n represents the number of elements of the output vector y.

このソフトマックス関数は、出力ベクトルの要素を確率分布に変換する。ソフトマックス関数により、出力ベクトルの各要素は０から１の間の値をとり、出力ベクトルの全要素の和は１になる。一般には、このソフトマックス関数は分類問題（識別問題）を解く場合によく用いられる。 This softmax function transforms the elements of the output vector into a probability distribution. By the softmax function, each element of the output vector takes a value between 0 and 1, and the sum of all the elements of the output vector is 1. Generally, this softmax function is often used to solve a classification problem (discrimination problem).
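A minimal sketch of the softmax function is shown below. Subtracting the maximum element before exponentiation is a common numerical-stability measure and is an addition here. It does not change the result and is not described in this document.

```python
import math


def softmax(y):
    """Convert the elements of an output vector into a probability distribution."""
    shift = max(y)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(value - shift) for value in y]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, 0.1])
print(probabilities)
print(sum(probabilities))
```

Because the exponential is monotonically increasing, the index of the largest element is the same before and after the softmax. This is why the index of the maximum element of the output vector is sufficient for classification.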

あるいは、予測部１１は出力ベクトルｙをそのまま最終結果として設定してもよい。例えば、予測部１１は回帰問題を解く場合に出力ベクトルｙをそのまま最終結果として出力してもよい。 Alternatively, the prediction unit 11 may set the output vector y as it is as the final result. For example, the prediction unit 11 may output the output vector y as it is as the final result when solving the regression problem.

学習済みモデルでは変換行列Ａが確定しているので、予測部１１は、その学習済みモデルを読み込んだ時に一度だけ特異値分解を実行して前行列Ｌおよび後行列を求めればよい。したがって、個々の出力ベクトルを求めようとする度に前行列Ｌおよび後行列Ｒを求める必要はない。 Since the transformation matrix A is fixed in a trained model, the prediction unit 11 only needs to perform the singular value decomposition once, when the trained model is loaded, to obtain the front matrix L and the back matrix R. Therefore, it is not necessary to obtain the front matrix L and the back matrix R each time an individual output vector is calculated.

本実施形態では機械学習システム１０（予測部１１）が変換行列Ａを特異値分解することで行列ＵΣと行列Ｖとを取得するが、機械学習システム１０（予測部１１）は別のコンピュータシステムで算出された行列ＵΣおよび行列Ｖを取得してもよい。すなわち、該別のコンピュータシステムが変換行列Ａを特異値分解してもよい。 In the present embodiment, the machine learning system 10 (prediction unit 11) acquires the matrix UΣ and the matrix V by performing singular value decomposition on the transformation matrix A, but the machine learning system 10 (prediction unit 11) may instead acquire a matrix UΣ and a matrix V calculated by another computer system. That is, the other computer system may perform the singular value decomposition of the transformation matrix A.

本実施形態では、機械学習システム１０（予測部１１）が、近似ベクトルｙ_ａについての乖離度に基づいて、近似ベクトルｙ_ａを出力ベクトルｙとして設定するか、または正確な出力ベクトルｙ＝ｙ_ａ＋Ｒｘ_Ｒを求める。しかし、乖離度に基づくこの分岐処理は必須ではない。したがって、機械学習システム１０（予測部１１）は乖離度を求めることなく、近似ベクトルｙ_ａを出力ベクトルｙとして設定してもよい。 In the present embodiment, the machine learning system 10 (prediction unit 11) either sets the approximation vector y_a as the output vector y or obtains the exact output vector y = y_a + R x_R, depending on the degree of divergence of the approximation vector y_a. However, this branching based on the degree of divergence is not essential. Therefore, the machine learning system 10 (prediction unit 11) may set the approximation vector y_a as the output vector y without obtaining the degree of divergence.

機械学習システム１０内で二つの数値の大小関係を比較する際には、「以上」および「よりも大きい」という二つの基準のどちらを用いてもよく、「以下」および「未満」の二つの基準のうちのどちらを用いてもよい。このような基準の選択は、二つの数値の大小関係を比較する処理についての技術的意義を変更するものではない。 When the magnitudes of two numerical values are compared in the machine learning system 10, either of the two criteria "greater than or equal to" and "greater than" may be used, and either of the two criteria "less than or equal to" and "less than" may be used. The choice between these criteria does not change the technical significance of the process of comparing the magnitudes of two numerical values.

本実施形態のように出力ベクトルの近似値を用いることで、出力層の次元が膨大な場合にも機械学習を高速に実行することが可能になる。図８を参照しながら、この技術的効果について説明する。図８は、ＬＳＴＭ（ＬｏｎｇＳｈｏｒｔ－ＴｅｒｍＭｅｍｏｒｙ（長・短期記憶））というニューラルネットワーク２０を用いた機械翻訳に本実施形態を応用（適用）した実際の例を模式的に示す図である。 By using an approximation of the output vector as in the present embodiment, machine learning can be executed at high speed even when the dimension of the output layer is enormous. This technical effect will be described with reference to FIG. 8. FIG. 8 is a diagram schematically showing an actual example in which the present embodiment is applied to machine translation using a neural network 20 called LSTM (Long Short-Term Memory).

翻訳などの自然言語処理では語彙数が出力の候補の数になり得るので、出力層のベクトルの次元数はその語彙数に対応して数万以上（例えば約５００００）になる。その結果、出力ベクトルの計算量が膨大になる。例えば、中間ベクトルの次元数が５００程度であっても、出力ベクトルを計算するために５００×５００００の行列演算が必要になり、この行列演算がニューラルネットワークの計算において支配的になり得る。 In natural language processing such as translation, the number of vocabularies can be the number of output candidates, so the number of dimensions of the vector of the output layer is tens of thousands or more (for example, about 50,000) corresponding to the number of vocabularies. As a result, the amount of calculation of the output vector becomes enormous. For example, even if the number of dimensions of the intermediate vector is about 500, a matrix operation of 500 × 50,000 is required to calculate the output vector, and this matrix operation can be dominant in the calculation of the neural network.

図８の例では、ニューラルネットワーク（ＬＳＴＭ）２０は日本語の文を英語に翻訳する。このニューラルネットワーク２０において、中間層および出力層の次元数はそれぞれ５００、５００００であるとする。図８の例では、「私は日本人です。」という日本語の文が「ＩａｍＪａｐａｎｅｓｅ.」と翻訳されている。中間ベクトルから出力ベクトルを得るための行列演算は、英文を構成する個々の単語（文末記号である＜ＥＯＳ＞も含む）について実行されるので、図８の例ではその行列演算は４回実行される。その４回の行列演算において近似ベクトルｙ_ａが出力ベクトルｙとして設定される回数は０～４の間である。 In the example of FIG. 8, the neural network (LSTM) 20 translates a Japanese sentence into English. In this neural network 20, the numbers of dimensions of the intermediate layer and the output layer are assumed to be 500 and 50,000, respectively. In the example of FIG. 8, the Japanese sentence "私は日本人です。" is translated as "I am Japanese." The matrix operation for obtaining the output vector from the intermediate vector is executed for each word constituting the English sentence (including the end-of-sentence symbol <EOS>), so the matrix operation is executed four times in the example of FIG. 8. Among those four matrix operations, the number of times the approximation vector y_a is set as the output vector y is between 0 and 4.

５００次元の中間層および５００００次元の出力層を有するニューラルネットワーク２０において、５００列の行列ＵΣを前行列Ｌと後行列Ｒとに分割するための値ｋを固定値３００に設定した。したがって、前行列Ｌおよび後行列Ｒの列数はそれぞれ３００、２００であった。一例として、本実施形態を適用したこのニューラルネットワーク２０で「自転車で通勤すると運動になります。」という日本語の文を英訳したところ、「If you go to work by bicycle, you will get exercise.」という正しい英訳を２６８ｍｓ（ミリ秒）で得ることができた。 In the neural network 20 having a 500-dimensional intermediate layer and a 50,000-dimensional output layer, the value k for dividing the 500-column matrix UΣ into the front matrix L and the back matrix R was set to a fixed value of 300. Accordingly, the front matrix L and the back matrix R had 300 and 200 columns, respectively. As an example, when this neural network 20, to which the present embodiment was applied, translated the Japanese sentence "自転車で通勤すると運動になります。" into English, the correct translation "If you go to work by bicycle, you will get exercise." was obtained in 268 ms (milliseconds).

比較のために、行列ＵΣを分割することなく常にすべての列を用いてその和文を翻訳したところ、正しい英訳が３０２ｍｓで得られた。意図的に後行列Ｒを用いずに常に前行列Ｌのみを用いてその和文を英訳したところ、前行列Ｌの列数ｋに応じて結果が変わった。具体的には、ｋ＝４００では、正しい英訳が２７７ｍｓで得られた。ｋ＝３００では、「If you go to work by bicycle, you will exercise.」という、正解に近い結果が２２９ｍｓで得られた。ｋ＝２００の場合には、「When you go to work by bicycle, you can exercise.」という不完全な結果が１９２ｍｓで得られた。ｋ＝１００の場合には、「To go to work on a bike is a sport.」という誤訳が１７３ｍｓで得られた。 For comparison, when the same Japanese sentence was translated always using all the columns, without dividing the matrix UΣ, the correct English translation was obtained in 302 ms. When the Japanese sentence was translated intentionally using only the front matrix L, never the back matrix R, the result changed according to the number of columns k of the front matrix L. Specifically, with k = 400, the correct English translation was obtained in 277 ms. With k = 300, a result close to the correct answer, "If you go to work by bicycle, you will exercise.", was obtained in 229 ms. With k = 200, an incomplete result, "When you go to work by bicycle, you can exercise.", was obtained in 192 ms. With k = 100, a mistranslation, "To go to work on a bike is a sport.", was obtained in 173 ms.

ニューラルネットワーク２０を用いた実験からわかるように、本実施形態に係る機械学習システム１０を採用することで、確度の高い結果を高速に得ることが可能になる。上記の翻訳の例では、３００列を有する前行列Ｌのみを用いた計算では識別結果が曖昧な場合に限って、残りの２００列を有する後行列Ｒをさらに用いて計算が行われる。したがって、２６８ｍｓという短時間で正解を得ることができた。 As can be seen from the experiment using the neural network 20, by adopting the machine learning system 10 according to the present embodiment, it is possible to obtain highly accurate results at high speed. In the above translation example, only when the discrimination result is ambiguous in the calculation using only the front matrix L having 300 columns, the calculation is further performed using the back matrix R having the remaining 200 columns. Therefore, the correct answer could be obtained in a short time of 268 ms.
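The source of the speed-up can be illustrated by counting multiply-add operations for the dimensions used in this example (n = 50,000, m = 500, k = 300). The counts below are a rough estimate made for illustration and are not figures from this description. They assume that V is an m × m matrix, and the function names are hypothetical.

```python
def full_cost(n, m):
    """Multiply-adds for the ordinary product of an n x m matrix and a vector."""
    return n * m


def approx_cost(n, m, k):
    """Multiply-adds for x' = V x (m x m) plus y_a = L x_L (n x k)."""
    return m * m + n * k


def fallback_cost(n, m, k):
    """Multiply-adds when R x_R (n x (m - k)) must also be computed."""
    return approx_cost(n, m, k) + n * (m - k)


n, m, k = 50_000, 500, 300
print(full_cost(n, m))         # 25,000,000
print(approx_cost(n, m, k))    # 15,250,000
print(fallback_cost(n, m, k))  # 25,250,000
```

Under these assumptions, an output vector that passes the divergence check costs about 61% of the ordinary product, while one that falls back to the back matrix costs about 1% more. The overall benefit therefore depends on how often the approximation is accepted. The measured times of 268 ms and 302 ms also include the rest of the network, so they are not expected to follow these ratios exactly.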

上記実施の形態の説明に用いたブロック図は、機能単位のブロックを示している。これらの機能ブロック（構成要素）は、ハードウェアおよび／またはソフトウェアの任意の組み合わせによって実現される。また、各機能ブロックの実現手段は特に限定されない。すなわち、各機能ブロックは、物理的および／または論理的に結合した一つの装置により実現されてもよいし、物理的および／または論理的に分離した２つ以上の装置を直接的および／または間接的に（例えば、有線および／または無線）で接続し、これら複数の装置により実現されてもよい。 The block diagrams used in the description of the above embodiment show blocks in functional units. These functional blocks (components) are realized by any combination of hardware and/or software. The means for realizing each functional block is not particularly limited. That is, each functional block may be realized by one device that is physically and/or logically coupled, or by two or more physically and/or logically separate devices that are connected directly and/or indirectly (for example, by wire and/or wirelessly).

例えば、本発明の一実施の形態における機械学習システム１０は、本実施形態の処理を行うコンピュータとして機能してもよい。図９は、機械学習システム１０として機能するコンピュータ１００のハードウェア構成の一例を示す図である。コンピュータ１００は、物理的には、プロセッサ１００１、メモリ１００２、ストレージ１００３、通信装置１００４、入力装置１００５、出力装置１００６、バス１００７などを含んでもよい。 For example, the machine learning system 10 in one embodiment of the present invention may function as a computer for processing the present embodiment. FIG. 9 is a diagram showing an example of a hardware configuration of a computer 100 that functions as a machine learning system 10. The computer 100 may physically include a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like.

なお、以下の説明では、「装置」という文言は、回路、デバイス、ユニットなどに読み替えることができる。機械学習システム１０のハードウェア構成は、図に示した各装置を一つまたは複数含むように構成されてもよいし、一部の装置を含まずに構成されてもよい。 In the following description, the word "device" can be read as a circuit, a device, a unit, or the like. The hardware configuration of the machine learning system 10 may be configured to include one or more of the devices shown in the figure, or may be configured not to include some of the devices.

機械学習システム１０における各機能は、プロセッサ１００１、メモリ１００２などのハードウェア上に所定のソフトウェア（プログラム）を読み込ませることで、プロセッサ１００１が演算を行い、通信装置１００４による通信や、メモリ１００２およびストレージ１００３におけるデータの読み出しおよび／または書き込みを制御することで実現される。 Each function of the machine learning system 10 is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, so that the processor 1001 performs calculations and controls communication by the communication device 1004 and the reading and/or writing of data in the memory 1002 and the storage 1003.

プロセッサ１００１は、例えば、オペレーティングシステムを動作させてコンピュータ全体を制御する。プロセッサ１００１は、周辺装置とのインターフェース、制御装置、演算装置、レジスタなどを含む中央処理装置（ＣＰＵ：Central Processing Unit）で構成されてもよい。例えば、機械学習システム１０の少なくとも一部の機能要素はプロセッサ１００１で実現されてもよい。 Processor 1001 operates, for example, an operating system to control the entire computer. The processor 1001 may be composed of a central processing unit (CPU) including an interface with a peripheral device, a control device, an arithmetic unit, a register, and the like. For example, at least a part of the functional elements of the machine learning system 10 may be realized by the processor 1001.

また、プロセッサ１００１は、プログラム（プログラムコード）、ソフトウェアモジュールやデータを、ストレージ１００３および／または通信装置１００４からメモリ１００２に読み出し、これらに従って各種の処理を実行する。プログラムとしては、上述の実施の形態で説明した動作の少なくとも一部をコンピュータに実行させるプログラムが用いられる。例えば、機械学習システム１０の少なくとも一部の機能要素は、メモリ１００２に格納され、プロセッサ１００１で動作する制御プログラムによって実現されてもよく、他の機能ブロックについても同様に実現されてもよい。上述の各種処理は、一つのプロセッサ１００１で実行される旨を説明してきたが、２以上のプロセッサ１００１により同時または逐次に実行されてもよい。プロセッサ１００１は、１以上のチップで実装されてもよい。なお、プログラムは、電気通信回線を介してネットワークから送信されてもよい。 Further, the processor 1001 reads a program (program code), a software module and data from the storage 1003 and / or the communication device 1004 into the memory 1002, and executes various processes according to these. As the program, a program that causes a computer to execute at least a part of the operations described in the above-described embodiment is used. For example, at least a part of the functional elements of the machine learning system 10 may be realized by a control program stored in the memory 1002 and operated by the processor 1001, and may be realized for other functional blocks as well. Although it has been described that the above-mentioned various processes are executed by one processor 1001, they may be executed simultaneously or sequentially by two or more processors 1001. Processor 1001 may be mounted on one or more chips. The program may be transmitted from the network via a telecommunication line.

メモリ１００２は、コンピュータ読み取り可能な記録媒体であり、例えば、ＲＯＭ（Read Only Memory）、ＥＰＲＯＭ（Erasable Programmable ROM）、ＥＥＰＲＯＭ（Electrically Erasable Programmable ROM）、ＲＡＭ（Random Access Memory）などの少なくとも一つで構成されてもよい。メモリ１００２は、レジスタ、キャッシュ、メインメモリ（主記憶装置）などと呼ばれてもよい。メモリ１００２は、本発明の一実施の形態に係る無線通信方法を実施するために実行可能なプログラム（プログラムコード）、ソフトウェアモジュールなどを保存することができる。 The memory 1002 is a computer-readable recording medium, and is composed of at least one such as a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), and a RAM (Random Access Memory). May be done. The memory 1002 may be referred to as a register, a cache, a main memory (main storage device), or the like. The memory 1002 can store a program (program code), a software module, and the like that can be executed to implement the wireless communication method according to the embodiment of the present invention.

ストレージ１００３は、コンピュータ読み取り可能な記録媒体であり、例えば、ＣＤＲＯＭ（Compact Disc ROM）などの光ディスク、ハードディスクドライブ、フレキシブルディスク、光磁気ディスク（例えば、コンパクトディスク、デジタル多用途ディスク、Ｂｌｕ－ｒａｙ（登録商標）ディスク）、スマートカード、フラッシュメモリ（例えば、カード、スティック、キードライブ）、フロッピー（登録商標）ディスク、磁気ストリップなどの少なくとも一つで構成されてもよい。ストレージ１００３は、補助記憶装置と呼ばれてもよい。上述の記憶媒体は、例えば、メモリ１００２および／またはストレージ１００３を含むテーブル、サーバその他の適切な媒体であってもよい。 The storage 1003 is a computer-readable recording medium, and may be composed of at least one of, for example, an optical disk such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (for example, a compact disc, a digital versatile disc, or a Blu-ray (registered trademark) disc), a smart card, a flash memory (for example, a card, a stick, or a key drive), a floppy (registered trademark) disk, and a magnetic strip. The storage 1003 may be referred to as an auxiliary storage device. The storage medium described above may be, for example, a table, a server, or another suitable medium that includes the memory 1002 and/or the storage 1003.

通信装置１００４は、有線および／または無線ネットワークを介してコンピュータ間の通信を行うためのハードウェア（送受信デバイス）であり、例えばネットワークデバイス、ネットワークコントローラ、ネットワークカード、通信モジュールなどともいう。例えば、機械学習システム１０の少なくとも一部の機能要素は通信装置１００４で実現されてもよい。 The communication device 1004 is hardware (transmission / reception device) for communicating between computers via a wired and / or wireless network, and is also referred to as, for example, a network device, a network controller, a network card, a communication module, or the like. For example, at least a part of the functional elements of the machine learning system 10 may be realized by the communication device 1004.

入力装置１００５は、外部からの入力を受け付ける入力デバイス（例えば、キーボード、マウス、マイクロフォン、スイッチ、ボタン、センサなど）である。出力装置１００６は、外部への出力を実施する出力デバイス（例えば、ディスプレイ、スピーカー、ＬＥＤランプなど）である。なお、入力装置１００５および出力装置１００６は、一体となった構成（例えば、タッチパネル）であってもよい。 The input device 1005 is an input device (for example, a keyboard, a mouse, a microphone, a switch, a button, a sensor, etc.) that receives an input from the outside. The output device 1006 is an output device (for example, a display, a speaker, an LED lamp, etc.) that performs output to the outside. The input device 1005 and the output device 1006 may have an integrated configuration (for example, a touch panel).

また、プロセッサ１００１やメモリ１００２などの各装置は、情報を通信するためのバス１００７で接続される。バス１００７は、単一のバスで構成されてもよいし、装置間で異なるバスで構成されてもよい。 Further, each device such as the processor 1001 and the memory 1002 is connected by a bus 1007 for communicating information. The bus 1007 may be composed of a single bus or may be composed of different buses between the devices.

また、コンピュータ１００は、マイクロプロセッサ、デジタル信号プロセッサ（ＤＳＰ：Digital Signal Processor）、ＡＳＩＣ（Application Specific Integrated Circuit）、ＰＬＤ（Programmable Logic Device）、ＦＰＧＡ（Field Programmable Gate Array）などのハードウェアを含んで構成されてもよく、当該ハードウェアにより、各機能ブロックの一部または全てが実現されてもよい。例えば、プロセッサ１００１は、これらのハードウェアの少なくとも一つで実装されてもよい。 Further, the computer 100 may be configured to include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array), and some or all of the functional blocks may be realized by that hardware. For example, the processor 1001 may be implemented with at least one of these pieces of hardware.

以上説明したように、本発明の一側面に係る機械学習システムは、ニューラルネットワークの中間層で得られた中間ベクトルと変換行列Ａとを用いて、ニューラルネットワークの出力層のベクトルである出力ベクトルを算出し、該出力ベクトルに基づいて事象を予測する予測部を備え、予測部が、変換行列Ａを特異値分解することで得られる行列ＵΣと行列Ｖとを取得し、ここで、行列Ｕおよび行列Ｖは直交行列であり、行列Σは対角行列であり、中間ベクトルと行列Ｖとに基づいて一時ベクトルを算出し、行列ＵΣおよび一時ベクトルのそれぞれの分割位置を示す値ｋを用いて、行列ＵΣの１列目からｋ列目を用いて定義される前行列と、一時ベクトルの１個目からｋ個目までの要素を用いて定義される前ベクトルとを取得し、前行列および前ベクトルに基づいて近似ベクトルを算出し、近似ベクトルを出力ベクトルとして設定する。 As described above, the machine learning system according to one aspect of the present invention includes a prediction unit that calculates an output vector, which is the vector of the output layer of a neural network, by using an intermediate vector obtained in the intermediate layer of the neural network and a transformation matrix A, and that predicts an event based on the output vector. The prediction unit acquires a matrix UΣ and a matrix V obtained by singular value decomposition of the transformation matrix A, where the matrix U and the matrix V are orthogonal matrices and the matrix Σ is a diagonal matrix. The prediction unit calculates a temporary vector based on the intermediate vector and the matrix V. Using a value k indicating the division position of each of the matrix UΣ and the temporary vector, the prediction unit acquires a front matrix defined using the first to k-th columns of the matrix UΣ and a front vector defined using the first to k-th elements of the temporary vector. The prediction unit then calculates an approximation vector based on the front matrix and the front vector, and sets the approximation vector as the output vector.

他の側面に係る機械学習システムでは、予測部が、近似ベクトルにおける最大要素と少なくとも一つの他の要素との乖離度を算出し、乖離度が閾値より大きい場合に、近似ベクトルを出力ベクトルとして設定してもよい。乖離度が大きければ、最大要素と他の要素との差が大きいといえ、したがって、近似ベクトルと正確な出力ベクトルとの間で最大要素のインデックスが同じである蓋然性が高いといえる。乖離度が大きい場合に近似ベクトルを出力ベクトルとして設定することで、精度の高い機械学習を高速に実行することができる。 In the machine learning system related to the other aspect, the prediction unit calculates the degree of deviation between the maximum element in the approximation vector and at least one other element, and sets the approximation vector as an output vector when the degree of deviation is larger than the threshold value. You may. If the degree of divergence is large, it can be said that the difference between the maximum element and other elements is large, and therefore it is highly probable that the index of the maximum element is the same between the approximate vector and the accurate output vector. By setting the approximation vector as the output vector when the degree of deviation is large, highly accurate machine learning can be executed at high speed.

他の側面に係る機械学習システムでは、予測部が、乖離度が閾値以下である場合に、行列ＵΣの残りの列で構成される後行列Ｒと、一時ベクトルの残りの要素で構成される後ベクトルと、近似ベクトルとに基づいて出力ベクトルを算出してもよい。乖離度が小さい場合には最大要素と他の要素との差があまり大きくないので、近似ベクトルと正確な出力ベクトルとの間で最大要素のインデックスが異なる蓋然性が高い。乖離度が小さい場合に限って出力ベクトルを正確に求めることで、精度の高い機械学習を高速に実行することができる。 In the machine learning system according to another aspect, when the degree of divergence is equal to or less than the threshold, the prediction unit may calculate the output vector based on the back matrix R composed of the remaining columns of the matrix UΣ, the back vector composed of the remaining elements of the temporary vector, and the approximation vector. When the degree of divergence is small, the difference between the maximum element and the other elements is not very large, so it is highly probable that the index of the maximum element differs between the approximation vector and the exact output vector. By calculating the output vector exactly only when the degree of divergence is small, highly accurate machine learning can be executed at high speed.

他の側面に係る機械学習システムでは、予測部が、近似ベクトルの最大要素の偏差値を乖離度として算出してもよい。統計値の一種である偏差値を乖離度として用いることで、最大要素が他の要素からどのくらい離れているかを正しく推定することが可能になる。 In the machine learning system according to the other aspect, the prediction unit may calculate the deviation value of the maximum element of the approximation vector as the degree of deviation. By using the deviation value, which is a kind of statistical value, as the degree of deviation, it is possible to correctly estimate how far the maximum element is from other elements.

他の側面に係る機械学習システムでは、予測部が、近似ベクトルの最大要素と、近似ベクトルの中で２番目に大きい要素との差を乖離度として算出してもよい。最大要素と２番目に大きい要素との差を乖離度として用いることで、乖離度を簡単に求めることができる。 In the machine learning system according to the other aspect, the prediction unit may calculate the difference between the maximum element of the approximation vector and the second largest element in the approximation vector as the degree of deviation. By using the difference between the maximum element and the second largest element as the degree of divergence, the degree of divergence can be easily obtained.

他の側面に係る機械学習システムでは、予測部が、一時ベクトルの次元数の半分の値を値ｋとして設定してもよい。このように分割位置を設定することで行列ＵΣおよび一時ベクトルから簡単に前行列および前ベクトルを得ることができる。 In the machine learning system according to another aspect, the prediction unit may set the value k to half the number of dimensions of the temporary vector. By setting the division position in this way, the front matrix and the front vector can be obtained easily from the matrix UΣ and the temporary vector.

他の側面に係る機械学習システムでは、予測部が、行列Σの対角成分が閾値以上であることを満たす最後の列の列番号を値ｋとして設定してもよい。行列ＵΣおよび一時ベクトルの分割位置をこのように設定することで、計算に影響する重要な要素が前行列に集まるので、精度の高い近似ベクトルを求めることができる。 In the machine learning system according to another aspect, the prediction unit may set the value k to the column number of the last column whose diagonal component of the matrix Σ is equal to or greater than a threshold. By setting the division position of the matrix UΣ and the temporary vector in this way, the important elements that affect the calculation are gathered in the front matrix, so that a highly accurate approximation vector can be obtained.

他の側面に係る機械学習システムでは、予測部が、行列Σの対角成分の偏差値が閾値以上であることを満たす最後の列の列番号を値ｋとして設定してもよい。行列ＵΣおよび一時ベクトルの分割位置をこのように設定することで、計算に影響する重要な要素が前行列に集まるので、精度の高い近似ベクトルを求めることができる。 In the machine learning system according to another aspect, the prediction unit may set the value k to the column number of the last column whose diagonal component of the matrix Σ has a deviation value equal to or greater than a threshold. By setting the division position of the matrix UΣ and the temporary vector in this way, the important elements that affect the calculation are gathered in the front matrix, so that a highly accurate approximation vector can be obtained.

以上、本実施形態について詳細に説明したが、当業者にとっては、本実施形態が本明細書中に説明した実施形態に限定されるものではないということは明らかである。本実施形態は、特許請求の範囲の記載により定まる本発明の趣旨および範囲を逸脱することなく修正および変更態様として実施することができる。したがって、本明細書の記載は、例示説明を目的とするものであり、本実施形態に対して何ら制限的な意味を有するものではない。 Although the present embodiment has been described in detail above, it is clear to those skilled in the art that the present embodiment is not limited to the embodiments described in the present specification. This embodiment can be implemented as an amendment or modification without departing from the spirit and scope of the present invention as determined by the description of the scope of claims. Therefore, the description herein is for purposes of illustration only and has no limiting implications for this embodiment.

情報の通知は、本明細書で説明した態様および実施形態に限られず、他の方法で行われてもよい。例えば、情報の通知は、物理レイヤシグナリング（例えば、ＤＣＩ（Downlink Control Information）、ＵＣＩ（Uplink Control Information））、上位レイヤシグナリング（例えば、ＲＲＣ（Radio Resource Control）シグナリング、ＭＡＣ（Medium Access Control）シグナリング、報知情報（ＭＩＢ（Master Information Block）、ＳＩＢ（System Information Block）））、その他の信号またはこれらの組み合わせによって実施されてもよい。また、ＲＲＣシグナリングは、ＲＲＣメッセージと呼ばれてもよく、例えば、ＲＲＣ接続セットアップ（RRC Connection Setup）メッセージ、ＲＲＣ接続再構成（RRC Connection Reconfiguration）メッセージなどであってもよい。 Notification of information is not limited to the aspects and embodiments described herein, and may be performed by other methods. For example, notification of information may be carried out by physical layer signaling (for example, DCI (Downlink Control Information) or UCI (Uplink Control Information)), higher layer signaling (for example, RRC (Radio Resource Control) signaling, MAC (Medium Access Control) signaling, or broadcast information (MIB (Master Information Block) or SIB (System Information Block))), other signals, or a combination thereof. RRC signaling may also be referred to as an RRC message, and may be, for example, an RRC Connection Setup message or an RRC Connection Reconfiguration message.

Each aspect/embodiment described in this specification may be applied to systems that use LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G, 5G, FRA (Future Radio Access), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), IEEE 802.20, UWB (Ultra-Wideband), Bluetooth (registered trademark), or other suitable systems, and/or to next-generation systems extended on the basis of these.

The order of the processing procedures, sequences, flowcharts, and the like of each aspect/embodiment described in this specification may be changed as long as no contradiction arises. For example, the methods described in this specification present the elements of the various steps in an exemplary order and are not limited to the specific order presented.

Information and the like can be output from a higher layer (or lower layer) to a lower layer (or higher layer), and may be input or output via a plurality of network nodes.

Input or output information and the like may be stored in a specific location (for example, a memory) or may be managed in a management table. Information and the like that is input or output may be overwritten, updated, or appended. Output information and the like may be deleted. Input information and the like may be transmitted to another device.

A determination may be made by a value represented by one bit (0 or 1), by a Boolean value (true or false), or by a numerical comparison (for example, comparison with a predetermined value).

Each aspect/embodiment described in this specification may be used alone, may be used in combination, or may be switched during execution. In addition, notification of predetermined information (for example, notification of "being X") is not limited to explicit notification and may be performed implicitly (for example, by not performing notification of the predetermined information).

Software, whether called software, firmware, middleware, microcode, hardware description language, or any other name, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like.

Software, instructions, and the like may also be transmitted and received via a transmission medium. For example, when software is transmitted from a website, server, or other remote source using wired technology such as coaxial cable, optical fiber cable, twisted pair, or digital subscriber line (DSL) and/or wireless technology such as infrared, radio, or microwave, these wired and/or wireless technologies are included in the definition of transmission medium.

The information, signals, and the like described in this specification may be represented using any of a variety of different technologies. For example, data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be mentioned throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination of these.

The terms described in this specification and/or the terms necessary for understanding this specification may be replaced with terms having the same or similar meanings.

The terms "system" and "network" are used interchangeably in this specification.

The information, parameters, and the like described in this specification may be represented by absolute values, by values relative to a predetermined value, or by other corresponding information. For example, a radio resource may be indicated by an index.

The names used for the parameters described above are not limiting in any respect. Furthermore, the mathematical formulas and the like that use these parameters may differ from those explicitly disclosed in this specification. Because the various channels (for example, PUCCH and PDCCH) and information elements (for example, TPC) can be identified by any suitable name, the various names assigned to these channels and information elements are not limiting in any respect.

A user terminal and a mobile communication terminal may also be referred to by those skilled in the art as a subscriber station, mobile unit, subscriber unit, wireless unit, remote unit, mobile device, wireless device, wireless communication device, remote device, mobile subscriber station, access terminal, mobile terminal, wireless terminal, remote terminal, handset, user agent, mobile client, client, or some other suitable term.

The terms "judging (determining)" and "deciding (determining)" as used in this specification may encompass a wide variety of operations. "Judging" and "deciding" may include, for example, regarding the act of judging, calculating, computing, processing, deriving, investigating, looking up (for example, looking up in a table or another data structure), or ascertaining as having "judged" or "decided". "Judging" and "deciding" may also include regarding the act of receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, or accessing (for example, accessing data in a memory) as having "judged" or "decided". "Judging" and "deciding" may further include regarding the act of resolving, selecting, choosing, establishing, comparing, or the like as having "judged" or "decided". In other words, "judging" and "deciding" may include regarding some operation as having "judged" or "decided".

The terms "connected" and "coupled", and any variations of them, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between elements may be physical, logical, or a combination of these. As used in this specification, two elements can be considered to be "connected" or "coupled" to each other by using one or more electrical wires, cables, and/or printed electrical connections and, as some non-limiting and non-exhaustive examples, by using electromagnetic energy such as electromagnetic energy having wavelengths in the radio frequency region, the microwave region, and the optical (both visible and invisible) region.

The phrase "based on" as used in this specification does not mean "based only on" unless explicitly stated otherwise. In other words, the phrase "based on" means both "based only on" and "based at least on".

Where designations such as "first" and "second" are used in this specification, any reference to those elements does not generally limit the quantity or order of the elements. These designations may be used in this specification as a convenient way of distinguishing between two or more elements. Accordingly, a reference to first and second elements does not mean that only two elements can be employed or that the first element must precede the second element in some way.

To the extent that "include", "including", and variations of them are used in this specification or the claims, these terms are intended to be inclusive in the same way as the term "comprising". Furthermore, the term "or" as used in this specification or the claims is intended not to be an exclusive OR.

In this specification, a device is intended to include a plurality of devices unless it is clear from the context or technically that only one such device exists.

10 ... machine learning system, 11 ... prediction unit, 12 ... neural network.

Claims

A machine learning system comprising a prediction unit that calculates an output vector, which is a vector of an output layer of a neural network, using an intermediate vector obtained in an intermediate layer of the neural network and a transformation matrix A, and that predicts an event based on the output vector,
wherein the prediction unit:
acquires a matrix UΣ and a matrix V obtained by singular value decomposition of the transformation matrix A, where the matrix U and the matrix V are orthogonal matrices and the matrix Σ is a diagonal matrix;
calculates a temporary vector based on the intermediate vector and the matrix V;
using a value k indicating the division position of each of the matrix UΣ and the temporary vector, acquires a prematrix defined using the first to k-th columns of the matrix UΣ and a pre-vector defined using the first to k-th elements of the temporary vector;
calculates an approximation vector based on the prematrix and the pre-vector; and
sets the approximation vector as the output vector.

The machine learning system according to claim 1, wherein the prediction unit:
calculates a degree of divergence between the maximum element in the approximation vector and at least one other element; and
sets the approximation vector as the output vector when the degree of divergence is larger than a threshold value.

The machine learning system according to claim 2, wherein, when the degree of divergence is equal to or less than the threshold value, the prediction unit calculates the output vector based on a posterior matrix R composed of the remaining columns of the matrix UΣ, a posterior vector composed of the remaining elements of the temporary vector, and the approximation vector.

The machine learning system according to claim 2 or 3, wherein the prediction unit calculates the deviation value of the maximum element of the approximation vector as the degree of divergence.

The machine learning system according to claim 2 or 3, wherein the prediction unit calculates the difference between the maximum element of the approximation vector and the second largest element of the approximation vector as the degree of divergence.

The machine learning system according to any one of claims 1 to 5, wherein the prediction unit sets, as the value k, a value that is half the number of dimensions of the temporary vector.

The machine learning system according to any one of claims 1 to 5, wherein the prediction unit sets, as the value k, the column number of the last column for which the diagonal component of the matrix Σ is equal to or greater than a threshold value.

The machine learning system according to any one of claims 1 to 5, wherein the prediction unit sets, as the value k, the column number of the last column for which the deviation value of the diagonal component of the matrix Σ is equal to or greater than a threshold value.
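
For illustration only, the prediction procedure recited in the claims above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: it assumes the convention A = UΣVᵀ, so that the temporary vector is Vᵀ times the intermediate vector, and it uses the difference between the largest and second largest elements as the degree of divergence. The function names `precompute` and `predict` and all variable names are hypothetical.

```python
import numpy as np

def precompute(A):
    """Decompose A once; return the matrix UΣ and V transposed."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U * s, Vt  # U * s scales column j of U by the j-th singular value

def predict(US, Vt, h, k, threshold):
    """Return (output_vector, used_full_product) for intermediate vector h."""
    t = Vt @ h                      # temporary vector (assumes A = U Σ V^T)
    approx = US[:, :k] @ t[:k]      # prematrix times pre-vector
    # Degree of divergence: largest minus second largest element.
    second, largest = np.partition(approx, -2)[-2:]
    if largest - second > threshold:
        return approx, False        # the approximation is taken as the output
    # Otherwise add the contribution of the remaining columns and elements,
    # which recovers the exact product A @ h.
    return approx + US[:, k:] @ t[k:], True

# Usage with random data; k is half the dimension of the temporary vector.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))         # 6 output dimensions, 4 intermediate
h = rng.normal(size=4)
US, Vt = precompute(A)
k = Vt.shape[0] // 2
y, used_full = predict(US, Vt, h, k, threshold=1.0)
```

Because UΣ and V depend only on A, they can be computed once and reused for every intermediate vector. Only the first k columns are multiplied when the largest element of the approximation already stands out clearly.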