JP2019087042A - Spectrum data analyzer and program

Info

Publication number: JP2019087042A
Application number: JP2017214901A
Authority: JP
Inventors: 信明吉川; Nobuaki Yoshikawa; 俊輔山川; Shunsuke Yamakawa
Original assignee: Toyota Central R&D Labs Inc
Current assignee: Toyota Central R&D Labs Inc
Priority date: 2017-11-07
Filing date: 2017-11-07
Publication date: 2019-06-06
Anticipated expiration: 2037-11-07
Also published as: JP6702289B2

Abstract

To enable factorization results close to the true basis spectra to be returned when factorizing spectrum data expected to have high linear independence. SOLUTION: A non-negative matrix factorization unit 34 determines a plurality of basis spectrum data and activation data by searching for a local minimum of an objective function that includes a divergence between a set of observed spectrum data and a set of estimated spectrum data calculated from the plurality of basis spectrum data and the activation data, and a regularization term that evaluates the linear independence of the plurality of basis spectrum data or the activation data. SELECTED DRAWING: Figure 2

Description

The present invention relates to a spectrum data analysis apparatus and program, and more particularly to a spectrum data analysis apparatus and program that perform non-negative matrix factorization on a set of observed spectrum data obtained for a signal to be analyzed.

A method of performing non-negative matrix factorization on spectral data using the L2-norm as a regularization term is conventionally known (Non-Patent Document 1).

A method of performing non-negative matrix factorization on spectral data using the L1-norm as a regularization term is also known (Non-Patent Documents 2 to 4).

A method of performing non-negative matrix factorization on spectral data using a term that evaluates orthogonality as a regularization term is also known (Non-Patent Document 5, Patent Document 1).

Patent Document 1: JP 2011-76068 A

Non-Patent Document 1: Pauca, V. P., et al. "Nonnegative matrix factorization for spectral data analysis." Linear Algebra and its Applications 416(1): 29-47 (2006).
Non-Patent Document 2: Hoyer, P. O. "Non-negative sparse coding." Neural Networks for Signal Processing, 2002. Proceedings of the 2002 12th IEEE Workshop on, IEEE (2002).
Non-Patent Document 3: Hoyer, P. O. "Non-negative matrix factorization with sparseness constraints." Journal of Machine Learning Research 5(Nov): 1457-1469 (2004).
Non-Patent Document 4: Kim, H. and H. Park. "Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis." Bioinformatics 23(12): 1495-1502 (2007).
Non-Patent Document 5: Ding, C., et al. "Orthogonal nonnegative matrix t-factorizations for clustering." Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM.

However, in spectral decomposition by the conventional methods described above, a single peak is split across multiple decomposition results. This suggests that the regularization terms do not perform adequately.

The present invention has been made to solve the above problem, and its object is to provide a spectrum data analysis apparatus and program capable of returning decomposition results close to the true basis spectra when decomposing spectrum data expected to have high linear independence.

To achieve the above object, the spectrum data analysis apparatus of the present invention obtains a plurality of basis spectrum data and activation data representing the magnitude of each basis spectrum by performing non-negative matrix factorization on a set of observed spectrum data obtained for a signal to be analyzed. The apparatus obtains the plurality of basis spectrum data and the activation data by searching for a local minimum of an objective function that includes a divergence between the set of observed spectrum data and a set of estimated spectrum data calculated from the plurality of basis spectrum data and the activation data, and a regularization term that evaluates the linear independence of the plurality of basis spectrum data or the activation data.

The program of the present invention is a program for obtaining a plurality of basis spectrum data and activation data representing the magnitude of each basis spectrum by performing non-negative matrix factorization on a set of observed spectrum data obtained for a signal to be analyzed. The program causes a computer to execute processing that obtains the plurality of basis spectrum data and the activation data by searching for a local minimum of an objective function that includes a divergence between the set of observed spectrum data and a set of estimated spectrum data calculated from the plurality of basis spectrum data and the activation data, and a regularization term that evaluates the linear independence of the plurality of basis spectrum data or the activation data.

According to the spectrum data analysis apparatus of the present invention, the plurality of basis spectrum data and the activation data are obtained by searching for a local minimum of an objective function that includes the divergence between the set of observed spectrum data and the set of estimated spectrum data calculated from the plurality of basis spectrum data and the activation data, and a regularization term that evaluates the linear independence of the plurality of basis spectrum data or the activation data.

Because the basis spectrum data and the activation data are obtained by searching for a local minimum of an objective function that contains both the divergence and a regularization term evaluating linear independence, decomposition results close to the true basis spectra can be returned when decomposing spectrum data expected to have high linear independence.

In the present invention, with the objective function denoted f, the divergence denoted d, and the regularization term denoted g, the objective function f is expressed by the following equation, and the regularization term can include at least one of det(AA^T) and det((AA^T)^-1).

Here, Y is the matrix representing the set of observed spectrum data, S is the matrix representing the plurality of basis spectrum data, and C is the matrix representing the activation data.

The regularization term is predetermined so that it becomes [equation] when [equation] holds, and so that A is prevented from becoming infinite.

Further, in the present invention, the regularization term is expressed by the following equation.

Here, k₁ and k₂ are hyperparameters.

Further, in the present invention, the divergence is expressed by the following equation, and, in searching for the local minimum of the objective function, each element C_nl of the matrix representing the activation data and each element S_lm of the matrix representing the plurality of basis spectrum data are repeatedly updated according to the following equations.

Here, || ||_F denotes the Frobenius norm.

In the present invention, the matrix Y representing the set of observed spectrum data can be normalized according to the following equation, and the values given by the following equation can be used as the hyperparameters.

The observed spectrum data is an X-ray diffraction spectrum, a neutron diffraction spectrum, a mass spectrometry spectrum, or a vibrational spectroscopy spectrum.

As described above, the spectrum data analysis apparatus and program of the present invention have the effect that decomposition results close to the true basis spectra can be returned when decomposing spectrum data expected to have high linear independence.

FIG. 1 is a block diagram showing a spectrum data analysis apparatus according to an embodiment of the present invention.
FIG. 2 is a functional block diagram showing the spectrum data analysis apparatus according to the embodiment of the present invention.
FIG. 3 is a flowchart showing the spectrum data analysis processing routine in the spectrum data analysis apparatus according to the embodiment of the present invention.
FIG. 4 shows (a) a graph of the true basis spectra and (b) a graph of the data set used.
FIG. 5 shows (a) a graph of the true basis spectra and (b) to (f) graphs of the decomposition results of the spectral data.
FIG. 6 is a graph showing the distribution of SAD when each regularization term is used.
FIG. 7 is a graph showing the hyperparameter dependence of cosθ^indep.
FIG. 8 is a graph showing the 81 measurement data obtained by experiment.
FIG. 9 is a graph showing the decomposition results of the spectral data and the results of checking them against a database.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

As shown in FIG. 1, the spectrum data analysis apparatus 10 according to the embodiment of the present invention includes a CPU 12, a ROM 14, a RAM 16, an HDD 18, a communication interface 21, and a bus 22 that interconnects them.

The CPU 12 executes various programs. The ROM 14 stores various programs, parameters, and the like. The RAM 16 is used as a work area or the like when the CPU 12 executes the programs. The HDD 18, serving as a recording medium, stores various programs and data, including a program for executing the spectrum data analysis processing routine described later.

When the spectrum data analysis apparatus 10 of this embodiment is expressed as functional blocks following the program that executes the spectrum data analysis processing routine, it is as shown in FIG. 2. The spectrum data analysis apparatus 10 includes an input unit 20, an operation unit 30, and an output unit 50.

The input unit 20 receives a set of observed spectrum data, obtained for the signal to be analyzed, that represents the component at each frequency. For example, the signal to be analyzed is an X-ray diffraction signal, a neutron diffraction signal, a mass spectrometry signal, or a vibrational spectroscopy signal, and the set of observed spectrum data is a large number of observed spectrum data obtained sample by sample.

The operation unit 30 includes a non-negative matrix factorization unit 34.

The non-negative matrix factorization unit 34 performs non-negative matrix factorization on the matrix Y representing the set of observed spectrum data obtained for the signal to be analyzed, and thereby obtains a matrix S representing a plurality of basis spectrum data and a matrix C representing activation data, which gives the magnitude of each basis spectrum in each sample.

Specifically, the unit searches for a local minimum of an objective function f(Y, CS) that includes the divergence d(Y, CS) between the matrix Y representing the set of observed spectrum data and the set of estimated spectrum data calculated from the matrix S representing the plurality of basis spectrum data and the matrix C representing the activation data, and a regularization term g(C, S) that evaluates the linear independence of the matrix S or the matrix C. The search yields the matrix S representing the plurality of basis spectrum data and the matrix C representing the activation data that gives the magnitude of each basis spectrum.

Here, the regularization term g(C, S) includes at least one of the expressions shown below.

where [equation].

The regularization term is also predetermined so that it becomes [equation] when [equation] holds, and so that A is prevented from becoming infinite.

For example, the regularization term is expressed by the following equation.

The divergence d(Y, CS) is expressed by the following equation, and, in searching for the local minimum of the objective function, each element C_nl of the matrix C and each element S_lm of the matrix S are repeatedly updated according to the following equations.

Here, || ||_F denotes the Frobenius norm. The element C_nl corresponds to sample n and the l-th basis spectrum, and the element S_lm corresponds to the l-th basis spectrum and frequency m.
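As a minimal sketch of how an objective of this shape could be evaluated, the Python below assumes that d(Y, CS) is the squared Frobenius norm of Y - CS and that g takes the form k1 * det((AA^T)^-1) + k2 * det(AA^T) with A = S. These specific forms, the function names, and the values of k1 and k2 are illustrative assumptions and are not taken from the equations of this document.

```python
import numpy as np


def divergence(Y, C, S):
    """Assumed form of d(Y, CS): squared Frobenius norm of the residual."""
    return np.linalg.norm(Y - C @ S, ord="fro") ** 2


def independence_regularizer(A, k1, k2):
    """Assumed form of g: k1 * det((A A^T)^-1) + k2 * det(A A^T).

    The first term grows without bound as the rows of A approach linear
    dependence; the second term penalizes unbounded growth of A.
    """
    gram_det = np.linalg.det(A @ A.T)
    return k1 / gram_det + k2 * gram_det


def objective(Y, C, S, k1=1e-3, k2=1e-3):
    """f = d(Y, CS) + g(S), regularizing the basis matrix S (an assumption)."""
    return divergence(Y, C, S) + independence_regularizer(S, k1, k2)


rng = np.random.default_rng(0)
C = rng.random((6, 2))   # 6 samples, 2 basis spectra
S = rng.random((2, 50))  # 2 basis spectra, 50 frequency bins
Y = C @ S                # noise-free observations, so d is zero here

# Second row is almost a copy of the first: nearly linearly dependent rows.
S_near = np.vstack([S[0], S[0] + 1e-3 * S[1]])

print("f for independent rows:", objective(Y, C, S))
print("g for nearly dependent rows:", independence_regularizer(S_near, 1e-3, 1e-3))
```

Under these assumptions the penalty for the nearly dependent basis is much larger than for the independent one, which is the behavior the regularization term is described as providing.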

The matrix Y representing the set of observed spectrum data may also be normalized according to the following equation, and the values given by the following equation may be used as the hyperparameters.

Next, the operation of the spectrum data analysis apparatus 10 according to this embodiment will be described. When the matrix Y representing the set of observed spectrum data obtained for the signal to be analyzed is input, the spectrum data analysis processing routine shown in FIG. 3 is executed.

In step S102, the matrix S representing the plurality of basis spectrum data and the matrix C representing the activation data, which gives the magnitude of each basis spectrum in each sample, are initialized.

In step S104, each element C_nl of the matrix C and each element S_lm of the matrix S is updated according to its update equation, based on the matrix Y representing the set of observed spectrum data obtained for the signal to be analyzed, the matrix S representing the plurality of basis spectrum data, and the matrix C representing the activation data.

Next, in step S106, it is determined whether the convergence condition is satisfied. If it is satisfied, the process proceeds to step S108. If it is not satisfied, the process returns to step S104 and the processing of step S104 is repeated.

As the convergence condition, for example, the number of iterations reaching an upper limit can be used. Alternatively, the difference between the current value of the objective function and its previous value being at or below a predetermined threshold can be used.

In step S108, the matrix S representing the plurality of basis spectrum data and the matrix C representing the activation data, as finally updated in step S106, are output by the output unit 50, and the spectrum data analysis processing routine ends.
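The control flow of steps S102 to S108 can be sketched as follows. Because the update equations of this document are not reproduced in the text, the sketch substitutes the standard multiplicative updates for unregularized Frobenius-norm NMF (Lee and Seung); it illustrates only the loop and the two convergence conditions, not the regularized updates of the embodiment. The function name, defaults, and the random test data are hypothetical.

```python
import numpy as np


def nmf_loop(Y, n_basis, max_iter=500, tol=1e-8, seed=0):
    """Initialize, update, test convergence, and return C, S, iteration count."""
    rng = np.random.default_rng(seed)
    n_samples, n_freq = Y.shape
    eps = 1e-12  # guards against division by zero

    # Step S102: initialize C and S with non-negative random values.
    C = rng.random((n_samples, n_basis))
    S = rng.random((n_basis, n_freq))

    prev = np.linalg.norm(Y - C @ S, ord="fro") ** 2
    for iteration in range(1, max_iter + 1):
        # Step S104: update all elements of C, then all elements of S
        # (stand-in multiplicative updates, without the regularization term).
        C *= (Y @ S.T) / (C @ S @ S.T + eps)
        S *= (C.T @ Y) / (C.T @ C @ S + eps)

        # Step S106: converged when the objective changes by at most tol;
        # reaching max_iter is the other stopping condition.
        current = np.linalg.norm(Y - C @ S, ord="fro") ** 2
        if abs(prev - current) <= tol:
            break
        prev = current

    # Step S108: output the final matrices.
    return C, S, iteration


rng = np.random.default_rng(1)
Y = rng.random((20, 3)) @ rng.random((3, 40))  # synthetic non-negative data
C, S, n_iter = nmf_loop(Y, n_basis=3)
print("iterations:", n_iter)
print("residual:", np.linalg.norm(Y - C @ S, ord="fro") ** 2)
```

Multiplicative updates keep C and S non-negative as long as the initial values are non-negative, which is why no explicit projection step appears in the loop.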

An example of experiments conducted to demonstrate the effect of the present invention is described below.

(Experimental Example 1)
Spectral decomposition was verified using the following five regularization terms.

From top to bottom, these are the regularization term using linear independence (the embodiment of the present invention), no regularization, a regularization term using the L2-norm, a regularization term using the L1-norm, and a regularization term using orthogonality.

The spectrum set Y used as input data was created by superposing five virtual spectra S^true generated from random numbers (an example is shown in FIG. 4(a)) with weights drawn from uniform random numbers (FIG. 4(b)). One hundred such input data sets were created, and the regularization terms were evaluated by statistically processing the results.
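A data set of this kind could be generated along the following lines. The text states only that the five virtual spectra were created from random numbers and mixed with uniform random weights, so the Gaussian peak shape, the number of peaks, the number of samples and frequency bins, and all names below are assumptions made for illustration.

```python
import numpy as np


def make_dataset(n_basis=5, n_freq=200, n_samples=50, peaks_per_basis=3, seed=0):
    """Return (Y, C_true, S_true) with Y = C_true @ S_true, all non-negative."""
    rng = np.random.default_rng(seed)
    x = np.arange(n_freq)

    # Each virtual spectrum: a few Gaussian peaks at random positions
    # with random heights (assumed shape, not specified in the text).
    S_true = np.zeros((n_basis, n_freq))
    for l in range(n_basis):
        centers = rng.uniform(0, n_freq, size=peaks_per_basis)
        heights = rng.uniform(0.2, 1.0, size=peaks_per_basis)
        for center, height in zip(centers, heights):
            S_true[l] += height * np.exp(-0.5 * ((x - center) / 2.0) ** 2)

    # Mixing weights drawn from a uniform distribution, as in the text.
    C_true = rng.uniform(0.0, 1.0, size=(n_samples, n_basis))
    return C_true @ S_true, C_true, S_true


# One hundred independent data sets, one per seed.
datasets = [make_dataset(seed=i) for i in range(100)]
Y, C_true, S_true = datasets[0]
print(Y.shape, C_true.shape, S_true.shape)
```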

The hyperparameters of each regularization term were grid-searched over the range 10^0 to 10^-8, and the condition was adopted that gave the highest mean value of the angle (spectral angle distance, SAD) [equation] between the true basis spectrum S_l^true and the basis spectrum S_l' obtained by decomposition.
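The evaluation measure can be illustrated with a short sketch. It assumes that the reported quantity is the cosine of the angle between a true basis spectrum and an estimated one, that each true spectrum is scored against its best-matching estimate, and that the grid has one point per decade. The matching rule, the grid spacing, and the function names are assumptions, since the SAD equation is not reproduced in the text.

```python
import numpy as np


def cos_angle(s_true, s_est):
    """Cosine of the angle between two spectra (1 means identical shape)."""
    return float(s_true @ s_est / (np.linalg.norm(s_true) * np.linalg.norm(s_est)))


def mean_cos_angle(S_true, S_est):
    """Mean, over true spectra, of the best cosine against any estimated spectrum."""
    return float(np.mean([max(cos_angle(t, e) for e in S_est) for t in S_true]))


# Candidate hyperparameter values from 10^0 down to 10^-8, one per decade.
grid = np.logspace(0, -8, num=9)

# Estimated spectra that are a permuted and rescaled copy of the true ones:
# the measure ignores both ordering and overall scale.
S_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
S_est = np.array([[0.0, 2.0, 2.0], [3.0, 0.0, 0.0]])
score = mean_cos_angle(S_true, S_est)
print(grid)
print(score)
```

In a grid search, a score of this kind would be computed for every hyperparameter combination and the combination with the highest mean would be kept.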

FIG. 5 shows an example of the decomposition results obtained with each regularization term. With g^indep(S), the decomposition results nearly coincide with the true basis spectra, whereas with the other regularization terms a single peak is not resolved into one decomposition result and is instead split across several decomposed spectra. That is, with g^indep(S) each peak is extracted almost completely, whereas with the other regularization terms extra peaks remain.

Indeed, in the distribution of SAD values over the 100 generated data sets (FIG. 6), cosθ > 0.98 is reached in 79% of cases with g^indep(S), compared with 30 to 60% for the other regularization terms, so the reproducibility of the basis spectra with g^indep(S) stands out.

Here, the closer cosθ is to 1, the closer the decomposed spectrum is to the true basis spectrum. With g^indep(S), about 80% of the SAD values satisfy cosθ > 0.98, and the reproducibility of the basis spectra is markedly high.

A similar tendency was observed in the following range.

FIG. 7 shows how cosθ^indep changes when the hyperparameter values in Experimental Example 1 are varied. These results show that high-accuracy decomposition is achieved in the range where the hyperparameters satisfy [equation]. In FIG. 7, the hyperparameter that is not being varied is fixed at the optimum value obtained by the grid search.

(Experimental Example 2)
Spectral decomposition according to the embodiment of the present invention was applied to XRD spectra obtained by high-throughput X-ray measurement (FIG. 8), and FIG. 9 shows the result of checking the decomposition against a database. In FIG. 9, the curves are the decomposed spectra and the bars are the values registered in the database. The results in FIG. 9 were obtained from the measurements alone, yet comparison with the existing database succeeded in detecting, with high accuracy, three crystal phases contained in the system (FIG. 9(b) to (d)). For FIG. 9(a) the agreement with the database is not as good, but identification is possible when meta-information such as composition is taken into account.

As described above, the spectrum data analysis apparatus of this embodiment obtains the plurality of basis spectrum data and the activation data by searching for a local minimum of an objective function that includes the divergence between the set of observed spectrum data and the set of estimated spectrum data calculated from the plurality of basis spectrum data and the activation data, and a regularization term that evaluates the linear independence of the plurality of basis spectrum data or the activation data. It can therefore return decomposition results close to the true basis spectra when decomposing spectrum data expected to have high linear independence.

The determinant used in the regularization term is known as an index of the linear independence of a matrix. Using it in the regularization term therefore makes it possible to control linear independence during the local-minimum search of non-negative matrix factorization. As a result, highly reliable decomposition results can be obtained for spectral data of systems in which the single-phase spectra are expected to have high linear independence, such as XRD spectra of mixed-crystal systems. In particular, combining the method with high-throughput measurement enables fast, automatic spectral analysis.
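The role of the determinant as a measure of linear independence can be checked numerically. The following sketch, which uses hypothetical helper names and a two-row toy matrix, evaluates det(AA^T) for unit-norm rows as the second row is rotated from orthogonal to identical; for two unit rows separated by an angle φ the value equals sin²φ.

```python
import numpy as np


def gram_det(A):
    """det(A A^T): squared volume spanned by the rows of A."""
    return float(np.linalg.det(A @ A.T))


def two_row_example(t):
    """Two unit-norm rows; t=1 gives orthogonal rows, t=0 gives identical rows."""
    a = np.array([1.0, 0.0, 0.0])
    b = np.array([0.0, 1.0, 0.0])
    A = np.vstack([a, (1.0 - t) * a + t * b])
    return A / np.linalg.norm(A, axis=1, keepdims=True)


for t in (1.0, 0.5, 0.1, 0.0):
    print(t, gram_det(two_row_example(t)))
```

The value falls from 1 toward 0 as the rows approach linear dependence, so a term built on the inverse of this determinant penalizes solutions in which two basis spectra become nearly proportional.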

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, the spectrum data analysis apparatus has been described for the case in which it receives as input a set of observed spectrum data obtained for the signal to be analyzed, but it is not limited to this. The apparatus may instead receive the signal to be analyzed as input and derive the set of observed spectrum data from it.

Although the spectrum data analysis apparatus described above contains a computer system, the term "computer system" also includes a homepage providing environment (or display environment) when a WWW system is used.

In this specification, the embodiment has been described with the program installed in advance, but the program can also be provided stored on a computer-readable recording medium.

10 Spectrum data analysis apparatus
20 Input unit
30 Operation unit
34 Non-negative matrix factorization unit
50 Output unit

Claims

1. A spectrum data analysis apparatus that obtains a plurality of basis spectrum data and activation data representing the magnitude of each basis spectrum by performing non-negative matrix factorization on a set of observed spectrum data obtained for a signal to be analyzed,
wherein the apparatus obtains the plurality of basis spectrum data and the activation data by searching for a local minimum of an objective function that includes a divergence between the set of observed spectrum data and a set of estimated spectrum data calculated from the plurality of basis spectrum data and the activation data, and a regularization term that evaluates the linear independence of the plurality of basis spectrum data or the activation data.

2. The spectrum data analysis apparatus according to claim 1, wherein, with the objective function denoted f, the divergence denoted d, and the regularization term denoted g, the objective function f is expressed by the following equation, and the regularization term includes at least one of det(AA^T) and det((AA^T)^-1).

Here, Y is the matrix representing the set of observed spectrum data, S is the matrix representing the plurality of basis spectrum data, and C is the matrix representing the activation data.

3. The spectrum data analysis apparatus according to claim 2, wherein the regularization term is predetermined so that it becomes [equation] when [equation] holds, and so that A is prevented from becoming infinite.

4. The spectrum data analysis apparatus according to claim 2 or 3, wherein the regularization term is expressed by the following equation.

Here, k₁ and k₂ are hyperparameters.

5. The spectrum data analysis apparatus according to any one of claims 2 to 4, wherein the divergence is expressed by the following equation, and
in searching for the local minimum of the objective function, each element C_nl of the matrix representing the activation data and each element S_lm of the matrix representing the plurality of basis spectrum data are repeatedly updated according to the following equations.

Here, || ||_F denotes the Frobenius norm.

6. The spectrum data analysis apparatus according to claim 4, wherein the matrix Y representing the set of observed spectrum data is normalized according to the following equation, and
the values given by the following equation are used as the hyperparameters.

7. The spectrum data analysis apparatus according to any one of claims 1 to 6, wherein the observed spectrum data is an X-ray diffraction spectrum, a neutron diffraction spectrum, a mass spectrometry spectrum, or a vibrational spectroscopy spectrum.

8. A program for obtaining a plurality of basis spectrum data and activation data representing the magnitude of each basis spectrum by performing non-negative matrix factorization on a set of observed spectrum data obtained for a signal to be analyzed,
the program causing a computer
to execute processing that obtains the plurality of basis spectrum data and the activation data by searching for a local minimum of an objective function that includes a divergence between the set of observed spectrum data and a set of estimated spectrum data calculated from the plurality of basis spectrum data and the activation data, and a regularization term that evaluates the linear independence of the plurality of basis spectrum data or the activation data.