JP3671241B2

JP3671241B2 - Nonlinear multivariate infrared analysis

Info

Publication number: JP3671241B2
Application number: JP51922496A
Authority: JP
Inventors: Bruce N. Perry; James M. Brown
Original assignee: Exxon Research and Engineering Co
Current assignee: ExxonMobil Technology and Engineering Co
Priority date: 1994-12-13
Filing date: 1995-12-13
Publication date: 2005-07-13
Anticipated expiration: 2015-12-13
Also published as: JPH10512667A; EP0801737A1; WO1996018881A1; AU4468596A; CA2208216A1; AU689016B2; EP0801737A4; CA2208216C

Description

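Background of the Invention
The present invention relates generally to methods for determining physical or chemical properties of materials by infrared analysis, and more particularly to an improved method for estimating a property of interest of a sample of a material from a nonlinear correlation with the infrared spectrum of that sample.
A specific application of the method is improved estimation of the octane number of gasoline by infrared analysis.
For suitable sample sets, physical or chemical properties such as octane number, cetane number, and aromatics content can be correlated effectively with infrared spectra. Linear techniques such as PLS and PCR, extended techniques such as CPSA (Constrained Principal Spectra Analysis; J. M. Brown, U.S. Pat. No. 5,121,337), and the method of DiFoggio (U.S. Pat. No. 5,397,899) give useful correlations in many cases. The purpose of such a correlation is to calibrate an infrared analyzer so that it can then be used to estimate the physical or chemical properties of subsequent unknown samples from their infrared spectra. An important consideration in using such analyzers is their ability to detect outlier samples statistically, that is, samples whose analysis would be an extrapolation of the predictive model.
In some applications, calibrations based on linear correlation techniques such as PLS, PCR, and CPSA cannot predict the physical or chemical property with sufficient accuracy. An inaccurate calibration can indicate that the property being estimated depends nonlinearly on the composition of the sample. Various techniques, including local linear regression, MARS, and neural networks, have been proposed to address this problem, but they generally require a large number of coefficients and usually do not provide the statistical measures available from linear techniques.
The calibration methods currently used to correlate property or composition data with spectral data are almost exclusively linear. They assume that the property or component concentration is linearly related to the spectral signal. These linear methods are inadequate when the property depends nonlinearly on the chemical components, or when interactions between components produce a nonlinear spectral response. Several nonlinear modeling methods have been discussed in the literature, but they generally attempt to define a nonlinear relationship between the spectral data and the property or component concentration. Such nonlinear methods usually require a large number of coefficients to be determined. Using a large number of coefficients requires a very large sample set for calibration and tends to overfit the data. Moreover, most nonlinear methods include no statistical means of determining that a new sample being analyzed lies outside the range of the calibration, that is, no means of outlier detection. A simple nonlinear method that is less susceptible to overfitting and that retains outlier detection capability was needed.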
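Various linear calibrations are used to estimate properties and component concentrations. For example, Hieftje, Honigs, and Hirschfeld (U.S. Pat. No. 4,800,279) describe linear evaluation of physical properties of hydrocarbons. Lambert and Martens (European Patent 0 285 251) describe a linear method for estimating octane number. Maggard describes a linear method for estimating octane number (U.S. Pat. No. 4,963,745) and a linear method for estimating aromatics in hydrocarbons (U.S. Pat. No. 5,145,785). Brown (U.S. Pat. No. 5,121,337) describes a linear technique based on Constrained Principal Spectra Analysis (CPSA) and gives various examples.
Espinosa et al. (European Patents 0 305 090 B1 and 0 304 232 A2) describe a method for directly determining physical properties of hydrocarbons. Espinosa et al. include in their equations linear terms (absorbances at selected frequencies), quadratic terms (products of absorbances at different frequencies), and homographic terms (quotients of absorbances at different frequencies). The equations shown in their examples contain only a few nonlinear terms, and these quadratic and homographic terms were chosen, arbitrarily or statistically, from a large number of possible nonlinear terms. For example, for the 16 recommended frequencies in European Patent 0 305 090 B1, the available terms include 18² (324) quadratic terms and a further 18×17 (306) homographic terms. For the 16 frequencies, 646 coefficients must be determined or set to zero to derive the correlation equation. Even in a simpler example covering only 6 frequencies, 216 linear, quadratic, and homographic terms are available, and 216 coefficients must be determined or set to zero to derive the correlation equation.
Crawford et al. (Process Control and Quality, 4 (1992) 13-20) used a neural network to predict research octane number from near-infrared absorbance data. Absorbances at 231 wavelengths were used as inputs to a neural network with 24 nodes in a single hidden layer. Including the node biases, a total of 24*231+24 (5568) coefficients (weights and biases) were determined in training the network.
Nonlinear multivariate calibration methods were reviewed by Sekulic et al. (Analytical Chemistry, 65 (1993) 835A-845A). The review describes locally weighted regression (LWR), projection pursuit regression (PPR), alternating conditional expectations (ACE), multivariate adaptive regression splines (MARS), neural networks, nonlinear principal component regression (NLPCR), and nonlinear partial least squares (NLPLS). All of these techniques are considerably more computationally demanding than the nonlinear post-processing method of the present invention.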
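Summary of the Invention
The present invention is a method for significantly improving the performance of a spectrometer-based analyzer that is used to analyze test samples and to provide property or composition data for those samples for process or analytical applications. The method determines the property or composition data of a test sample from a nonlinear correlation between the spectrum of the test sample and the value of its property or composition data. In calibrating the analyzer, the method includes the following steps.
(1) Measuring the spectra of a set of calibration samples.
(2) Measuring the property or composition data of the calibration sample set.
(3) Determining a linear correlation between the spectra obtained in step (1) and the property or composition data obtained in step (2).
(4) Determining linear estimates of the property or composition data of the calibration samples by applying the linear correlation of step (3) to the calibration set spectra collected in step (1).
(5) Determining a nonlinear correction to the linear estimates by fitting either the property or composition data from step (2), or the difference between the property or composition data from step (2) and the linear estimates from step (4), as a nonlinear function of the linear estimates from step (4).
In analysis, this nonlinear calibration is used to determine the property or composition data of a test sample by the following steps.
(6) Measuring the spectrum of the test sample.
(7) Applying the linear correlation determined in step (3) to that spectrum to obtain a linear estimate of the property or composition data.
(8) Applying the nonlinear correction determined in step (5) to the linear estimate of step (7) to estimate the property or composition data of the test sample.
(9) Outputting the estimate of the property or composition data of the test sample obtained in step (8).
When the nonlinear correction of step (5) is calculated by fitting the property or composition data from step (2) directly as a nonlinear function of the linear estimates from step (4), the property or composition data of the test sample is estimated in step (8) by substituting the linear estimate from step (7) into the nonlinear correction equation from step (5).
When the nonlinear correction of step (5) is calculated by fitting the difference between the property or composition data from step (2) and the linear estimates from step (4) as a nonlinear function of the linear estimates from step (4), the property or composition data of the test sample is estimated in step (8) by substituting the linear estimate from step (7) into the nonlinear correction equation from step (5) and adding the resulting nonlinear correction to the linear estimate from step (7) to obtain the final estimate.
The linear correlation of step (3) is a linear multivariate calibration obtained by regressing the reference property data against variables derived from the spectral data. Absorbances at specific wavelengths can be used as the spectral variables, with multiple linear regression (MLR) as the regression method. Alternatively, principal component regression (PCR), partial least squares (PLS), or Constrained Principal Spectra Analysis (CPSA) can be used to extract variables (scores) from the spectral data, which are then regressed against the property data. The residual, that is, the difference between the actual reference property value and the value predicted by the linear model, is determined for each sample in the calibration set. The property residuals are then fitted as a nonlinear function (for example, a quadratic or cubic function) of the linear predictions. Alternatively, the actual reference values can be fitted directly as a nonlinear function of the linear predictions.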
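The calibration and analysis steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the patent: the linear step is plain MLR with an intercept, the nonlinear step is the direct polynomial fit, the data are synthetic, and the names `calibrate` and `predict` are hypothetical.

```python
import numpy as np

def calibrate(spectra, y, degree=2):
    """Steps (3)-(5): linear fit, then polynomial fit of y on the linear estimate.

    spectra: (n_samples, k) absorbances at k selected wavelengths, k < n_samples.
    """
    A = np.column_stack([np.ones(len(y)), spectra])   # intercept + absorbances
    p, *_ = np.linalg.lstsq(A, y, rcond=None)         # step (3): linear correlation
    y_lin = A @ p                                     # step (4): linear estimates
    coeffs = np.polyfit(y_lin, y, degree)             # step (5): direct nonlinear fit
    return p, coeffs

def predict(spectrum, p, coeffs):
    """Steps (7)-(8) for a single test spectrum."""
    y_lin = p[0] + spectrum @ p[1:]                   # step (7): linear estimate
    return np.polyval(coeffs, y_lin)                  # step (8): nonlinear correction

# Synthetic demonstration: the property depends nonlinearly on the composition.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 4))
latent = X @ np.array([1.0, 2.0, -1.0, 0.5])
y = 90.0 + 3.0 * latent + 0.8 * latent**2 + rng.normal(0.0, 0.05, 200)

p, coeffs = calibrate(X, y, degree=2)
A = np.column_stack([np.ones(len(y)), X])
rmse_linear = np.sqrt(np.mean((A @ p - y) ** 2))
rmse_post = np.sqrt(np.mean((np.polyval(coeffs, A @ p) - y) ** 2))
```

Only `degree + 1` coefficients are added to the linear model, which is the point the method makes about keeping the number of fitted coefficients small.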
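The method of the present invention significantly improves the accuracy of the calibration and the performance of the spectrometer-based analyzer while retaining the outlier detection capability of the linear method.
Brief Description of the Drawings
FIG. 1 plots measured engine research octane number (RON) against the RON estimated by the linear CPSA calibration of Example 1. The circles are the data for the 365 Powerformate samples in the calibration data set. The solid lines are the ASTM 95% reproducibility limits for the engine RON measurement, calculated as a function of the linear RON estimate.
FIG. 2 plots the residuals for the data set of Example 1 (RON estimated by the linear CPSA calibration minus RON measured by engine test) against the RON estimated by the linear CPSA calibration. The circles are the residuals for the 365 Powerformate samples in the calibration data set. The solid line is the cubic polynomial function of the linear RON estimate that best fits the residuals.
FIG. 3 plots measured engine RON against the RON estimated by nonlinear post-processing of the linear CPSA calibration of Example 1. The circles are the data for the 365 Powerformate samples in the calibration data set. The solid lines are the ASTM 95% reproducibility limits for the engine RON measurement, calculated as a function of the nonlinear RON estimate.
FIG. 4 plots measured engine RON against the RON estimated by the linear CPSA calibration of Example 2. The circles are the data for the 385 blended gasoline samples in the calibration data set. The solid line is the cubic polynomial function of the linear RON estimate that best fits the engine RON values.
FIG. 5 plots measured engine RON against the RON estimated by the linear CPSA calibration for the test data set of Example 2. The diamonds are the data for the 238 blended gasoline samples in the test data set. The solid lines are the ASTM 95% reproducibility limits for the engine RON measurement, calculated as a function of the linear RON estimate.
FIG. 6 plots the residuals for the data set of Example 2 (RON estimated by the linear CPSA calibration minus RON measured by engine test) against the RON estimated by the linear CPSA calibration. The circles are the residuals for the 385 blended gasoline samples in the calibration data set. The solid line is the cubic polynomial function of the linear RON estimate that best fits the residuals.
FIG. 7 plots the residuals for the test data set of Example 2 (RON estimated by the linear CPSA calibration minus RON measured by engine test) against the RON estimated by the linear CPSA calibration. The circles are the residuals for the 238 blended gasoline samples in the test data set. The solid line is the cubic polynomial function of the linear RON estimate obtained from the calibration data set.
FIG. 8 plots measured engine RON against the RON estimated by nonlinear post-processing of the linear CPSA calibration for the test data set of Example 2. The diamonds are the data for the 238 blended gasoline samples in the test data set. The solid lines are the ASTM 95% reproducibility limits for the engine RON measurement, calculated as a function of the nonlinear RON estimate.
FIG. 9 plots measured engine RON against the RON estimated by the linear MLR calibration for the test data set of Example 3. The circles are the data for the 238 blended gasoline samples in the test data set. The solid lines are the ASTM 95% reproducibility limits for the engine RON measurement, calculated as a function of the linear RON estimate.
FIG. 10 plots measured engine RON against the RON estimated by the linear MLR calibration of Example 3. The squares are the data for the 385 blended gasoline samples in the calibration data set. The solid line is the cubic polynomial function of the linear RON estimate that best fits the engine RON values.
FIG. 11 plots measured engine RON against the RON estimated by nonlinear post-processing of the linear MLR calibration for the test data set of Example 3. The circles are the data for the 238 blended gasoline samples in the test data set. The solid lines are the ASTM 95% reproducibility limits for the engine RON measurement, calculated as a function of the nonlinear RON estimate.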
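Description of the Preferred Embodiments
Linear calibration methods have been used to relate spectral measurements to chemical composition, physical properties, and performance characteristics. In a linear technique, the calibration or training is performed using a set of samples of known composition or properties, that is, samples whose composition or properties have already been measured by a reference technique. The calibration is then preferably validated by applying it to the analysis of a separate test set and comparing the predictions with the results obtained by the reference method. Finally, the calibrated analyzer is used to analyze unknowns and predict their composition or property data.
In a linear calibration, the spectra of the calibration samples form the columns of a matrix X of dimension f × n, where f is the number of individual data points (frequencies or wavelengths) in one spectrum and n is the number of calibration samples. If the vector y contains the composition or property data for the n calibration samples, the linear model is built by solving the following equation for p:
$$y = X^{t} p \qquad [1]$$
where p is a vector containing the regression coefficients. Because f >> n in general, equation [1] cannot be solved directly. Three approaches are in general use. In MLR, k rows of X (individual frequencies or wavelengths) are selected, with k < n, and X is replaced by the smaller matrix X_k containing only those k rows. p is then obtained by computing the pseudo-inverse of X_k. In PCR, the matrix X is decomposed into the product of three matrices: U (the f × k loadings matrix), Σ (the k × k matrix of singular values), and V (the n × k scores matrix).
$$X = U \Sigma V^{t} \qquad [2]$$
The property vector y is then regressed against the scores to build the model. In PLS, X is similarly decomposed into orthogonal matrices, and y is regressed against the scores matrix.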
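As an illustration of the PCR route through equations [1] and [2], the following sketch computes the regression vector p from a truncated singular value decomposition. It is an assumption-laden example rather than the patent's code: the function name `pcr_fit` and the synthetic three-component spectra are hypothetical, and no preprocessing is applied.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression following equations [1] and [2].

    X: (f, n) matrix whose columns are calibration spectra; y: (n,) property values.
    Returns the regression vector p of shape (f,) so that X.T @ p approximates y.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) Vt
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k].T          # keep k components
    b = V_k.T @ y              # regress y on the orthonormal scores V_k
    return U_k @ (b / s_k)     # p = U_k Sigma_k^-1 V_k^t y

# Synthetic mixture spectra: f = 50 data points, n = 20 samples, 3 components.
rng = np.random.default_rng(1)
pure = rng.uniform(0.0, 1.0, size=(50, 3))     # hypothetical pure-component spectra
conc = rng.uniform(0.0, 1.0, size=(3, 20))     # hypothetical concentrations
X = pure @ conc
y = conc.T @ np.array([2.0, -1.0, 0.5])        # property linear in composition

p = pcr_fit(X, y, k=3)
max_err = np.max(np.abs(X.T @ p - y))
```

Because the synthetic property is exactly linear in composition and the spectra are noise-free, three components reproduce y to rounding error; real spectra would leave a residual.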
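If x is a vector (f × 1) containing a spectrum, the linear-model estimate $\hat{y}_1$ of the property or component concentration is given by

$$\hat{y}_1 = x^{t} p \qquad [3]$$

The residual r_1 of the linear model is given by

$$r_1 = y - \hat{y}_1 \qquad [4]$$

If the linear model estimates the property y adequately, the residuals r_1 are expected to be normally distributed. If the linear model is inadequate because the property being modeled depends nonlinearly on the chemical components of the sample, structure is generally observed in the residuals. In that case a more accurate model can be obtained by post-processing the estimate.
The post-processing can take one of two forms. Either the residual r_1 or the property/composition value is regressed as a nonlinear function of the linear estimate $\hat{y}_1$ of the property:

$$r_1 = f(\hat{y}_1) \qquad [5]$$
$$y = f(\hat{y}_1) \qquad [6]$$

where f(ŷ_1) denotes a nonlinear function of the linear estimate of the property or component.
The nonlinear function is preferably a polynomial in powers of the linear estimate of the property or component:

$$r_1 = \sum_{i=0}^{m} a_i \, \hat{y}_1^{\,i} \qquad [7]$$
$$y = \sum_{i=0}^{m} b_i \, \hat{y}_1^{\,i} \qquad [8]$$

When m is 2 the post-processing is quadratic, and when m is 3 it is cubic. m is chosen according to how well the post-processing function fits the structure present in the residuals.
When [5] or [7] is used to fit the residuals as a nonlinear function of the linear estimate of the property, the nonlinear estimate of the component or property is obtained by summing the linear estimate and the nonlinear estimate of the residual:

$$\hat{y}_{nl} = \hat{y}_1 + \hat{r}_1 \qquad [9]$$

where $\hat{r}_1$ is the nonlinear estimate of the residual obtained by applying [5] or [7] to the linear estimate of the property. When [6] or [8] is used, the nonlinear estimate of the component or property is obtained directly.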
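The two post-processing forms can be compared numerically. The sketch below fits the residual with a polynomial and adds it back to the linear estimate, then fits the property value directly, and checks that the two routes agree. It rests on assumptions: the data are synthetic, the residual is taken as reference minus linear estimate, and the linear estimate is centred before fitting purely to improve numerical conditioning.

```python
import numpy as np

rng = np.random.default_rng(2)
y_lin = rng.uniform(88.0, 102.0, 300)            # hypothetical linear estimates
y = y_lin + 0.02 * (y_lin - 95.0) ** 2 - 0.3 + rng.normal(0.0, 0.2, 300)

m = 3                        # cubic post-processing
t = y_lin - y_lin.mean()     # centring only improves numerical conditioning

# Residual form ([5]/[7]): fit the residual, then add it to the linear estimate.
r = y - y_lin
y_nl_residual = y_lin + np.polyval(np.polyfit(t, r, m), t)

# Direct form ([6]/[8]): fit the property value itself.
y_nl_direct = np.polyval(np.polyfit(t, y, m), t)

max_diff = np.max(np.abs(y_nl_residual - y_nl_direct))
```

The agreement follows because the linear estimate is itself a polynomial of degree one in the fitted variable, so a least-squares polynomial fit of degree one or higher treats the two formulations identically.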
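The spectral matrix X can be preprocessed before modeling. Examples of preprocessing include mean centering, baseline correction, numerical derivatives, and orthogonalization to baseline and correction spectra (for example, using the CPSA algorithm).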
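A rough sketch of three of these preprocessing options follows. The function names are hypothetical, and the baseline step is a generic orthogonalization to low-order polynomials, offered only as an illustration; it is not the CPSA algorithm itself.

```python
import numpy as np

def mean_center(X):
    """Subtract the average calibration spectrum from every column of X (f x n)."""
    mean_spectrum = X.mean(axis=1, keepdims=True)
    return X - mean_spectrum, mean_spectrum

def first_derivative(X, spacing=1.0):
    """Numerical first derivative along the frequency axis (rows of X)."""
    return np.gradient(X, spacing, axis=0)

def remove_polynomial_baseline(X, order=1):
    """Orthogonalize each spectrum to polynomial baseline terms up to `order`."""
    grid = np.linspace(-1.0, 1.0, X.shape[0])
    B = np.vander(grid, order + 1)       # (f, order + 1) baseline basis
    Q, _ = np.linalg.qr(B)               # orthonormal basis for the baseline space
    return X - Q @ (Q.T @ X)

# Two synthetic spectra: one peak on different sloping baselines.
grid = np.linspace(0.0, 1.0, 101)
peak = np.exp(-((grid - 0.5) / 0.05) ** 2)
X = np.column_stack([peak + 0.2 + 0.1 * grid, 2 * peak + 0.5 - 0.3 * grid])

X_centered, mean_spectrum = mean_center(X)
X_flat = remove_polynomial_baseline(X, order=1)
dX = first_derivative(X, spacing=0.01)
```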
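A single set of calibration spectra can be used to develop models for several properties, each of which can be post-processed separately.
The quantities to be predicted include individual chemical species (for example, benzene), lumped chemical species (for example, olefins or aromatics), physical properties (for example, refractive index or specific gravity), chemical properties (for example, stability), and performance properties (for example, octane number or cetane number).
Three examples are presented.
Example 1
Research octane number (RON) was regressed for a data set of 365 POWERFORMATE samples (reformer product samples) using Constrained Principal Spectra Analysis (CPSA), a linear regression method. FT-IR spectra were measured over the range 7000 to 400 cm^-1 at 2 cm^-1 resolution, with the samples in a cell of nominal pathlength 0.5 mm fitted with calcium fluoride windows. The CPSA calibration used absorbances in the frequency ranges 5300.392 to 3150.151 cm^-1, 2599.573 to 2445.296 cm^-1, and 2274.627 to 1649.804 cm^-1. Absorbances in the range 7000 to 5300.392 cm^-1 are too weak to give a significant correlation. The frequency ranges 3150.151 to 2599.573 cm^-1 and 1649.804 to 400 cm^-1 are excluded because they contain absorbances that exceed the dynamic response range of the FT-IR instrument.
The frequency range 2445.296 to 2274.627 cm^-1 is excluded to avoid interference from atmospheric carbon dioxide. Two sets of polynomial corrections are used in the CPSA calibration to compensate for baseline variation. The first set covers the range 5300.392 to 3150.151 cm^-1 and the second set covers the range 2599.573 to 1649.804 cm^-1. The CPSA calibration also applies a water vapor correction to minimize the effect of variations in the instrument purge on the estimates. Five constrained principal components were used for the RON calibration. The coefficients for these five constrained principal components were determined using PRESS-based stepwise regression. FIG. 1 plots the reference (engine) values against the linear RON predictions. The standard error of estimate for the data in FIG. 1 is 0.54 RON number.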
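Selecting the frequency ranges used in Example 1 amounts to masking the wavenumber axis. The sketch below shows one way to do this; the 1 cm^-1 grid, the placeholder spectrum, and the names `KEEP` and `select_ranges` are assumptions for illustration, while the range limits are those quoted in the example.

```python
import numpy as np

# Frequency ranges (cm^-1) retained for the CPSA calibration in Example 1.
KEEP = [(3150.151, 5300.392), (2445.296, 2599.573), (1649.804, 2274.627)]

def select_ranges(wavenumbers, absorbance, ranges=KEEP):
    """Return only the data points whose wavenumber lies in a retained range."""
    mask = np.zeros(wavenumbers.shape, dtype=bool)
    for low, high in ranges:
        mask |= (wavenumbers >= low) & (wavenumbers <= high)
    return wavenumbers[mask], absorbance[mask]

wavenumbers = np.linspace(400.0, 7000.0, 6601)   # hypothetical 1 cm^-1 grid
absorbance = np.zeros_like(wavenumbers)          # placeholder spectrum
kept_wn, kept_abs = select_ranges(wavenumbers, absorbance)
```

Everything outside the three retained ranges is dropped, including the 2445.296 to 2274.627 cm^-1 gap that the example excludes.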
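The RON residuals (linear RON prediction from FT-IR minus engine RON) were regressed against a quadratic function of the linear RON prediction. FIG. 2 plots the RON residuals against the linear RON prediction, together with the quadratic fit to the residuals.
FIG. 3 shows the result of the model obtained by applying the quadratic correction of FIG. 2 to the data of FIG. 1. This is identical to fitting the reference (engine) RON values as a quadratic function of the linear RON prediction. The standard error of estimate for the data in FIG. 3 is 0.41 RON number.
Applying the nonlinear post-processing method described here improves the RON estimate by 24% over the linear technique used previously, yet only three coefficients need to be determined in addition to the five coefficients of the initial linear correlation.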
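The 24% figure is consistent with the two standard errors of estimate quoted in this example, assuming the improvement is measured as the relative reduction in standard error:

```latex
\[
\frac{0.54 - 0.41}{0.54} \approx 0.24
\]
```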
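Example 2
Research octane number (RON) was regressed for a calibration data set of spectra of 385 blended gasoline samples using Constrained Principal Spectra Analysis (CPSA), a linear regression method. FT-IR spectra were measured over the range 7000 to 400 cm^-1 at 2 cm^-1 resolution, with the samples in a cell of nominal pathlength 0.5 mm fitted with calcium fluoride windows. The CPSA calibration used absorbances in the frequency ranges 4850.094 to 3324.677 cm^-1 and 2200.381 to 1634.376 cm^-1. Absorbances in the range 7000 to 4850.094 cm^-1 are too weak to give a significant correlation. The frequency ranges 3150.151 to 2400 cm^-1 and 1634.376 to 400 cm^-1 are excluded because they contain absorbances that exceed the dynamic response range of the FT-IR instrument. The frequency range 2400 to 2200.381 cm^-1 is excluded to avoid interference from atmospheric carbon dioxide. Two sets of polynomial corrections are used in the CPSA calibration to compensate for baseline variation. The first set (cubic polynomials) covers the range 5300.392 to 3150.151 cm^-1 and the second set (quadratic polynomials) covers the range 2599.573 to 1649.804 cm^-1. The CPSA calibration also applies a water vapor correction to minimize the effect of variations in the instrument purge on the estimates. Fourteen constrained principal components were used for the RON calibration. The coefficients for these 14 constrained principal components were determined using PRESS-based stepwise regression. FIG. 4 plots the reference (engine) values against the linear RON predictions. The standard error of calibration for the linear CPSA model is 0.411.
The linear model shown in FIG. 4 was applied to the analysis of 238 blended gasoline samples (314 separate engine measurements) that were not in the set used to build the model. FIG. 5 shows the predictions of the linear model for these test samples. For this linear model, the standard error of validation for the test samples is 0.569, and only 84% of the samples have predictions that agree with the reference engine values within the ASTM engine reproducibility limits.
The RON residuals (linear RON prediction from FT-IR minus engine RON) for the 385 samples in the calibration set were regressed against a cubic function of the linear RON prediction. FIG. 6 plots the RON residuals against the linear RON prediction, together with the cubic fit to the residuals. Cubic post-processing of the linear RON estimate reduces the standard error of calibration to 0.327.
FIG. 7 plots the RON residuals (linear RON prediction from FT-IR minus engine RON) for the 238 samples in the test set against the cubic curve generated with the coefficients from the fit to the calibration samples. FIG. 8 plots the engine RON of the test set against the RON values estimated by cubic post-processing of the linear RON estimate. With cubic post-processing, the standard error of validation falls to 0.397, and 95% of the RON estimates agree with the reference engine values within the ASTM reproducibility of the RON engine.
Applying the nonlinear post-processing method improves the RON estimates for the test set by 30%, yet only four coefficients need to be determined in addition to those used in the initial linear calibration. In an ASTM test such as the D2699 RON test, values measured by two different operators in two different laboratories are expected to agree within the stated reproducibility 95% of the time. Because, with nonlinear post-processing, the IR RON estimates agree with the D2699 RON test data within the reproducibility 95% of the time, the IR estimate can be regarded as equivalent to an engine measurement.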
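The two validation statistics reported in this example can be computed as in the sketch below. Several things are assumed: the standard error of validation is taken as a root-mean-square difference, the reproducibility is passed in as a single number (0.7 is a placeholder, not a value taken from ASTM D2699, and the figures describe limits that vary with RON level), and the sample values are made up.

```python
import numpy as np

def validation_summary(y_ref, y_est, reproducibility):
    """Standard error of validation and share of estimates within the limit."""
    diff = np.asarray(y_est, dtype=float) - np.asarray(y_ref, dtype=float)
    sev = np.sqrt(np.mean(diff ** 2))                  # RMS difference (assumed form)
    within = np.mean(np.abs(diff) <= reproducibility)  # fraction inside the limit
    return sev, within

y_ref = np.array([91.0, 93.5, 95.2, 97.8])   # made-up engine values
y_est = np.array([91.3, 93.1, 95.2, 98.9])   # made-up infrared estimates
sev, within = validation_summary(y_ref, y_est, reproducibility=0.7)
```

Running the same function on the linear and the post-processed estimates of a test set gives the kind of before-and-after comparison described in this example.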
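Example 3
Using the same set of spectra of 385 blended gasoline samples described in Example 2, a multiple linear regression (MLR) model was built following the method presented by Lambert and Martens (European Patent 0 285 251 B1, August 28, 1991). Absorbances at the frequencies closest to the 15 frequencies given by Lambert and Martens (Table 1) were corrected by subtracting the absorbance at a baseline point and then regressed against the engine RON values to give the coefficients in Table 2. The standard error of estimate for this linear MLR model was 0.459.

The MLR model was used to analyze the same set of spectra of 238 blended gasoline test samples. The MLR estimates were compared with the 314 engine measurements for the test set. FIG. 9 shows the predictions of this linear MLR model. For this linear MLR model, the standard error of validation for the test samples is 0.457, and only 81% of the samples are predicted within the ASTM engine reproducibility limits.
For the calibration set of 385 blended gasoline samples, the engine RON values were fitted as a cubic function of the linear MLR estimates. This fit is shown graphically in FIG. 10.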

Cubic post-processing was applied to the linear MLR estimates for the test set of 238 blended gasolines. FIG. 11 compares the nonlinearly post-processed MLR estimates with the 314 separate engine measurements. The standard error of validation for the test set is 0.406, and 91% of the samples are estimated within the reproducibility limits of the ASTM RON test. Applying the cubic nonlinear post-processing method gives an 11% improvement over the linear MLR calibration, yet only four coefficients need to be determined in addition to those used in the linear MLR calibration.
A specific application of this method is to improve estimation of gasoline octane number by infrared analysis.
For appropriate sample sets, physical or chemical properties such as octane number, cetane number, aromatic content, etc. can be effectively correlated with the infrared spectrum. Using linear methods such as PLS and PCR, extended methods such as CPSA (Constrained Principal Spectra Analysis: JMBrown, US Pat. No. 5,121,337), and DiForggio's method (US Pat. No. 5,397,899) In this case, an effective correlation is obtained. The purpose of these correlations is to test an infrared analyzer and use it to estimate the physical or chemical properties of subsequent unknown samples based on their infrared spectra. That is. An important point to consider when utilizing such an analyzer is the ability of the analyzer to statistically detect samples with outliers, ie samples whose analysis results correspond to the extrapolated portion of the estimation model.
Depending on the application, physical or chemical properties cannot be predicted with sufficient accuracy by tests using linear correlation techniques such as PLS, PCR, CPSA. The inaccuracy of the assay may suggest that the estimated property has a non-linear relationship to the sample composition. Various methods such as local linear regression, MARS, and neural networks have been proposed to deal with these problems, but these methods generally require a large number of coefficients, and are usually statistical indicators obtained from linear methods. Does not provide.
The assay methods currently used to correlate property or composition data with spectral data are almost always linear methods. These methods assume that the property / component concentration is linearly related to the spectral signal. These linear methods are inadequate when the properties are in a non-linear relationship with the chemical component or when the interaction between the components produces a non-linear spectral response. Several nonlinear modeling methods have been discussed in the literature, but these methods are generally attempts to define a nonlinear relationship between spectral data and property / component concentrations. Such non-linear methods usually require a large number of coefficients to be determined. Using a large number of coefficients necessitates the use of a very large sample set during the test, and tends to fit the data in an expanded manner. Most nonlinear methods also require that the new sample being analyzed is outside the scope of the test, i.e., does not include statistical means for outlier detection. There was a need for a simple non-linear method that was not significantly affected by the expansion fit and had an abnormal value detection function.
Various linear tests are used to estimate properties and component concentrations. For example, Hieftje, Honigs, and Hirschfeld (US Pat. No. 4,800,279) describe a linear evaluation method for the physical properties of hydrocarbons. Lambert and Martens (European Patent No. 0 285 251) describe a method for linear estimation of octane number. Maggard describes a linear estimation method for octane number (US Pat. No. 4,963,745) and a linear estimation method for aromatics in hydrocarbons (US Pat. No. 5,145,785). Brown (US Pat. No. 5,121,337) describes a linear approach based on conditional principal spectral analysis (CPSA) and shows various embodiments.
Espinosa et al. (European Patents 0 305 090 B1 and 0 304 232 A2) describe a method for directly determining the physical properties of hydrocarbons. Espinosa et al. Add a linear term (absorption at selected frequencies), a quadratic term (product of absorption at different frequencies), and a homologous term (absorption quotient at different frequencies) to the equation. The equations shown in the examples contain only a few nonlinear terms, and these quadratic and homologous terms are arbitrarily or statistically selected from a large number of possible nonlinear terms. It is. For example, for the 16 recommended frequencies in EP 0 305 090 B1, there are 18 ² (324) secondary terms available as well as 18 x 17 (306) homologous terms as available terms. is there. For 16 frequencies, the 646 coefficients need to be determined or set to zero to derive the correlation equation. Even in the simpler example, which covers only 6 frequencies, 216 linear terms, quadratic terms, and homologous terms are available, and the 216 coefficients are determined or set to zero to give the correlation equation Need to guide.
Crawford et al. (Process Control and Quality, 4 (1992) 13-20) predicted the research octane number from near-infrared absorption data using a neural network. Absorbance at 231 wavelengths was used as an input to a neural network with 24 nodes in one hidden layer. A total of 24 * 331 + 24 (5568) coefficients (weight and bias), including node bias, were determined by network training.
A review on non-linear multivariate testing was reported by Sekulic et al. (Analytical Chemistry, 65 (1993) 835A-845A). Describes local load regression (LWR), projective tracking regression (PPR), alternating condition prediction (ACE), multivariate adaptive spline (MARS), neural network, nonlinear principal component regression (NLPCR), and nonlinear partial least squares (NLPLS) Has been. All of these approaches are considerably more difficult to compute than the nonlinear post-processing method of the present invention.
SUMMARY OF THE INVENTION The present invention is a method for significantly improving the performance of a spectrometer-based analyzer, using the test sample to provide sample property or composition data for processing and analysis applications. The method of the present invention determines the relevant property or composition data of the test sample from a non-linear correlation between the spectrum of the test sample and the value of the property or composition data of the tesle sample. In the analysis of the analyzer, the method of the present invention includes the following steps.
(1) Measuring the spectrum of the test sample set.
(2) Measuring the property or composition data of the test sample set.
(3) A step of obtaining a linear correlation between the spectrum obtained in step (1) and the property or composition data obtained in step (2).
(4) obtaining a linear estimate of the test sample properties or composition data by applying the linear correlation of step (3) to the spectrum of the test set collected in step (1).
(5) The property or composition data obtained from step (2), or the difference between the property or composition data obtained from step (2) and the linear estimate obtained in step (4) is obtained from step (4). Obtaining a non-linear correction for the linear estimate by fitting as a non-linear function of the obtained linear estimate.
In the analysis, the non-linear test is used to obtain the property or composition data of the test sample by the following steps.
(6) A step of measuring the spectrum of the test sample.
(7) A step of applying the linear correlation obtained in step (3) to the spectrum to obtain a linear estimate of its property or composition data.
(8) A step of estimating the property or composition data of the test sample by applying the nonlinear correction obtained in the step (5) to the linear estimation value of the step (7).
(9) A step of outputting the estimated value of the property or composition data of the test sample obtained in step (8).
If the nonlinear correction of step (5) is calculated by directly fitting the property or composition data obtained from step (2) as a nonlinear function of the linear estimate obtained from step (4), then in step (8) In estimating the property or composition data of the test sample, the linear estimated value obtained from the step (7) is substituted into the nonlinear correction formula obtained from the step (5).
By fitting the difference between the property or composition data obtained from step (2) and the linear estimate of the property or composition data obtained from step (4) as a nonlinear function of the linear estimate obtained from step (4) When calculating the nonlinear correction of 5), in the estimation of the property or composition data of the test sample in the step (8), the linear estimated value obtained from the step (7) is substituted into the nonlinear correction formula obtained from the step (5). Then, the obtained nonlinear correction is added to the linear estimated value obtained from the step (7) to obtain the final estimated value.
The linear correction in step (3) consists of a linear multivariate test obtained by regression processing of reference property data with respect to variables derived from spectral data. Absorbance at a specific wavelength can be used as a spectral variable, and multiple linear regression (MLR) can be used as a regression method. Alternatively, you can use principal component regression (PCR), partial least squares (PLS), or conditional principal spectral analysis (CPSA) to extract variables (scores) from the spectrum data and use these variables as property data. It is also possible to perform regression processing on. The residual, that is, the difference between the actual reference property value and the predicted value from the linear model is determined for each sample in the test set. Next, the residual of this property value is fitted as a nonlinear function (for example, a quadratic or cubic function) of the linear prediction value. Alternatively, the actual reference value can be directly fitted as a nonlinear function of the linear prediction value.
The method of the present invention significantly improves the accuracy of the assay and the properties of the analyzer based on the spectrometer while maintaining the outlier detection capability of the linear method.
[Brief description of the drawings]
FIG. 1 is a plot of measured values of engine research octane number (RON) against estimated values of RON obtained from the linear CPSA test of Example 1. Circles represent data of 365 Powerformate samples belonging to the test data set. The solid line shows the ASTM 95% reproducibility limit for engine RON measurements calculated for a linear estimate of RON.
FIG. 2 shows the residual for the data set of Example 1 (estimated RON value obtained from linear CPSA test−actually measured RON value obtained from engine test) with respect to estimated RON value obtained from linear CPSA test. It is a plot. Circles represent residual values for 365 Powerformate samples belonging to the test data set. The solid line is the cubic polynomial function of the linear estimate of RON with the best fit of the residuals.
FIG. 3 is a plot of measured values of engine research octane number (RON) against estimated values of RON obtained from the nonlinear post-processing of the linear CPSA test of Example 1. Circles represent data of 365 Powerformate samples belonging to the test data set. The solid line shows the ASTM 95% reproducibility limit for engine RON measurements calculated for a nonlinear estimate of RON.
FIG. 4 is a plot of measured values of engine research octane number (RON) against estimated values of RON obtained from the linear CPSA test of Example 2. Circles represent data of 385 blended gasoline samples belonging to the test data set. The solid line is a cubic polynomial function of a linear estimate of RON with a best fit of engine RON values.
FIG. 5 is a plot of measured values of engine research octane number (RON) for the test data set of Example 2 against estimated values of RON obtained from the linear CPSA test. The diamonds represent the data of 238 blended gasoline samples belonging to the test data set. The solid line shows the ASTM 95% reproducibility limit for engine RON measurements calculated for a linear estimate of RON.
FIG. 6 shows the residual (RON estimated value obtained from linear CPSA test−actually measured RON value obtained from engine test) for the data set of Example 2 with respect to the estimated RON value obtained from linear CPSA test. And plotted. Circles represent the residual values of 385 blended gasoline samples belonging to the test data set. The solid line is the cubic polynomial function of the linear estimate of RON with the best fit of the residuals.
FIG. 7 shows the residual (estimated value of RON obtained from linear CPSA test−actually measured value of RON obtained from engine test) for the test data set of Example 2, and estimated value of RON obtained from linear CPSA test. Is plotted against. Circles represent the residuals of 238 blended gasoline samples belonging to the test data set. The solid line represents the cubic polynomial function of the linear estimate of RON and is obtained from the test data set.
FIG. 8 is a plot of measured values of engine research octane number (RON) for the test data set of Example 2 against estimated values of RON obtained from nonlinear post-processing of the linear CPSA test. The diamonds represent the data of 238 blended gasoline samples belonging to the test data set. The solid line shows the ASTM 95% reproducibility limit for engine RON measurements calculated for a nonlinear estimate of RON.
FIG. 9 is a plot of the measured values of the research octane number (RON) of the engine for the test data set of Example 3 against the estimated value of RON obtained from the linear MLR test. Circles represent data of 238 blended gasoline samples belonging to the test data set. The solid line shows the ASTM 95% reproducibility limit for engine RON measurements calculated for a linear estimate of RON.
FIG. 10 is a plot of measured values of engine research octane number (RON) against estimated values of RON obtained from the linear MLR test of Example 3. The square mark represents the data of 385 blended gasoline samples belonging to the test data set. The solid line is a cubic polynomial function of a linear estimate of RON with a best fit of engine RON values.
FIG. 11 is a plot of measured values of engine research octane number (RON) against estimated values of RON obtained from nonlinear post-processing of the linear MLR test for the test data set of Example 3. Circles represent data of 238 blended gasoline samples belonging to the test data set. The solid line shows the ASTM 95% reproducibility limit for engine RON measurements calculated for a nonlinear estimate of RON.
DESCRIPTION OF THE PREFERRED EMBODIMENTS A linear assay has been used to relate spectral measurements to chemical composition, physical properties, and performance characteristics. In the linear approach, testing or training is performed using a sample of known composition or property, ie, a set of samples whose composition or property has already been measured by a reference technique. Next, it is preferable to apply the test separately to the analysis of the test set and to ensure the validity of the test by comparing the prediction result with the result obtained by the reference method. Finally, using a validated analyzer, analyze unknowns and predict composition or property data.
In a linear test, the spectrum of the test sample forms a column of a matrix X of f × n dimensions. Where f is the number of individual data points (frequency or wavelength) included in one spectrum, and n is the number of test samples. Assuming that vector y contains composition / property data for n test samples, a linear model is constructed by solving
y = X ^t p [1]
Here, p is a vector including a regression coefficient. In general, since f >>>> n, equation [1] cannot be solved directly. In general, three approaches are used. In the case of MLR, the case of k rows (individual frequencies or wavelengths) is selected from X under the condition of k <n, and X is replaced with a smaller matrix X _k containing only k rows. Next, determine the p by calculating the pseudo-inverse matrix of the matrix X _k. In the case of PCR, the matrix X is the product of three matrices: U (f × k dimensional loading matrix), Σ (k × k dimensional singular value matrix), and V (score matrix of n × k matrix). Break down into products.
X = UΣV ^t [2]
Next, the score is regressed on the property vector y to construct a model. In PLS, a similar decomposition is applied to X to form an orthogonal matrix, and a regression process of y is performed on the score matrix.
If x is a vector (f × 1 dimension) containing a spectrum,

Is an estimate of the property or component concentration in the linear model and is given by:

The residual r ₁ in this linear model is given by

If the property y can be properly estimated by the linear model, the residual r ₁ is expected to be normally distributed. When a linear model is inadequate due to the nonlinear dependence between the properties to be modeled and the chemical composition of the sample, a structure is generally observed in the residual. In this case, a more accurate model can be obtained by post-processing the estimated value.
One of two forms can be used as post-processing. Either the residual r ₁ or the property / composition value is a linear estimate of the property

Regression processing as a nonlinear function.

Where f (y ₁ ) represents a nonlinear function of the linear estimate of the property / component.
The nonlinear function is preferably a polynomial expressed as a power of a linear estimate of the property / component.

When m is 2, the post-processing is secondary, and when m is 3, the post-processing is tertiary. m is chosen based on the ability of the post-processing function to fit the structures present in the residual.
When using [5] or [7] to perform a residual fit as a linear function of the linear estimate of the property, the component / property nonlinear estimate by summing the linear estimate and the nonlinear estimate residual Is obtained.

However,

Is a non-linear estimate of the residual obtained by applying [5] or [7] to a linear estimate of the property. When using [6] or [8], a nonlinear estimate of the component / property is obtained directly.
The spectral matrix X can be preprocessed and then subjected to a modeling process. Preprocessing includes, for example, average centering, baseline correction, numerical derivatives, or orthogonalization to the baseline and corrected spectrum (eg, using a CPSA algorithm).
A single set of test spectra can be used to develop models of multiple properties, each of which can be post-processed separately.
The components to be predicted include individual chemical species (for example, benzene), a group of chemical species (for example, olefins, aromatic compounds, etc.), physical properties (for example, refractive index, specific gravity, etc.), Chemical properties (eg, stability, etc.) or performance properties (eg, octane number, cetane number, etc.) can be mentioned.
Three examples are presented.
Example 1
A regression against research octane number (RON) was performed on the data from 365 POWERFORMATE samples (reformer reactor product samples) using Constrained Principal Spectra Analysis (CPSA), a linear regression method. FT-IR spectra were measured at 2 cm⁻¹ resolution over the 7000 to 400 cm⁻¹ range, with the samples held in cells of nominal 0.5 mm path length fitted with calcium fluoride windows. The CPSA calibration used absorbances in the frequency ranges 5300.392 to 3150.151 cm⁻¹, 2599.573 to 2445.296 cm⁻¹, and 2274.627 to 1649.804 cm⁻¹. Absorbances in the 7000 to 5300.392 cm⁻¹ range are too weak to contribute significantly to the correlation. The frequency ranges 3150.151 to 2599.573 cm⁻¹ and 1649.804 to 400 cm⁻¹ were excluded because they contain absorbances that exceed the dynamic response range of the FT-IR instrument.
The frequency range 2445.296 to 2274.627 cm⁻¹ was excluded to prevent interference from atmospheric carbon dioxide. In the CPSA calibration, two sets of polynomial corrections were used to compensate for baseline variations. The first set covers the range 5300.392 to 3150.151 cm⁻¹, and the second set covers the range 2599.573 to 1649.804 cm⁻¹. The CPSA calibration also applies a water vapor correction to minimize the effect of fluctuations in the instrument purge on the estimates. The RON calibration was performed using five constrained principal components, whose coefficients were determined by a PRESS-based stepwise regression. A plot of the linear predicted RON against the reference (engine) values is shown in FIG. 1. The standard error of estimate for the data in FIG. 1 is 0.54 RON.
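The selection of frequency ranges described for this calibration can be sketched as a simple mask over the measured grid. The sketch assumes that the range limits are inclusive and reads the lowest limit as 1649.804 cm⁻¹; the names `USED_RANGES`, `in_used_range`, and `select`, and the sample absorbances, are hypothetical.

```python
# Ranges used in the Example 1 calibration, as (upper, lower) limits in cm^-1.
# The last lower limit is read as 1649.804.
USED_RANGES = [
    (5300.392, 3150.151),
    (2599.573, 2445.296),
    (2274.627, 1649.804),
]


def in_used_range(wavenumber, ranges=USED_RANGES):
    """True if the wavenumber falls inside one of the retained ranges."""
    return any(lower <= wavenumber <= upper for upper, lower in ranges)


def select(wavenumbers, absorbances, ranges=USED_RANGES):
    """Keep only the (wavenumber, absorbance) pairs inside the retained ranges."""
    return [
        (w, a)
        for w, a in zip(wavenumbers, absorbances)
        if in_used_range(w, ranges)
    ]


# Made-up points: weak region, used region, saturated region, CO2 region, used region.
grid = [6000.0, 4000.0, 3000.0, 2350.0, 2000.0]
absorbance = [0.01, 0.40, 3.50, 0.90, 0.60]
kept = select(grid, absorbance)
print(kept)
```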
The RON residual (linear predicted RON from FT-IR − engine RON) was regressed as a quadratic function of the linear predicted RON. A plot of the RON residual against the linear predicted RON is shown in FIG. 2, together with the second-order fit to the residual.
FIG. 3 shows the result of the model obtained by applying the quadratic correction of FIG. 2 to the data of FIG. 1. This is equivalent to fitting the reference (engine) RON values as a quadratic function of the linear predicted RON. The standard error of estimate for the data in FIG. 3 is 0.41 RON.
Applying the nonlinear post-processing method described here improves the RON estimate by 24% over the previously used linear method, yet requires determining only three coefficients in addition to the five coefficients of the initial linear correlation.
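The 24% figure follows from the two standard errors quoted for this example, assuming that the improvement is measured as the relative reduction in standard error. The coefficient count is included for comparison.

```latex
\[
\frac{0.54 - 0.41}{0.54} \approx 0.24,
\qquad
\underbrace{5}_{\text{linear correlation}} + \underbrace{3}_{\text{quadratic correction}} = 8 \ \text{coefficients in total}.
\]
```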
Example 2
A regression against research octane number (RON) was performed on a calibration set of spectra of 385 blended gasoline samples using Constrained Principal Spectra Analysis (CPSA), a linear regression method. FT-IR spectra were measured at 2 cm⁻¹ resolution over the 7000 to 400 cm⁻¹ range, with the samples held in cells of nominal 0.5 mm path length fitted with calcium fluoride windows. The CPSA calibration used absorbances in the frequency ranges 4850.094 to 3324.677 cm⁻¹ and 2200.381 to 1643.376 cm⁻¹. Absorbances in the 7000 to 4850.094 cm⁻¹ range are too weak to contribute significantly to the correlation. The frequency ranges 3150.151 to 2400 cm⁻¹ and 1643.376 to 400 cm⁻¹ were excluded because they contain absorbances that exceed the dynamic response range of the FT-IR instrument. The frequency range 2400 to 2200.381 cm⁻¹ was excluded to prevent interference from atmospheric carbon dioxide. In the CPSA calibration, two sets of polynomial corrections were used to compensate for baseline variations. The first set (third-order polynomial) covers the range 5300.392 to 3150.151 cm⁻¹, and the second set (second-order polynomial) covers the range 2599.573 to 1649.804 cm⁻¹. The CPSA calibration also applies a water vapor correction to minimize the effect of fluctuations in the instrument purge on the estimates. The RON calibration was performed using 14 constrained principal components, whose coefficients were determined by a PRESS-based stepwise regression. A plot of the linear predicted RON against the reference (engine) values is shown in FIG. 4. The standard error of calibration for the linear CPSA model is 0.411.
The linear model shown in FIG. 4 was applied to the analysis of a test set of 238 blended gasoline samples (314 separate engine measurements) that were not included in the set used to build the model. The predicted values obtained from the linear model for these test samples are shown in FIG. 5. For this linear model, the standard error of validation for the test samples is 0.569, and only 84% of the samples have predicted values that agree with the reference engine values within the ASTM reproducibility limits of the engine method.
The RON residuals of the 385 samples in the calibration set (linear predicted RON from FT-IR − engine RON) were regressed as a cubic function of the linear predicted RON. A plot of the RON residuals against the linear predicted RON is shown in FIG. 6, together with the third-order fit to the residuals. The third-order post-processing of the linear RON estimate reduces the standard error of calibration to 0.327.
FIG. 7 plots the RON residuals of the 238 samples in the test set (linear predicted RON from FT-IR − engine RON) against the cubic curve generated from the coefficients obtained in the fit to the calibration samples. FIG. 8 shows a plot of the test set engine RON against the RON values estimated by cubic post-processing of the linear RON estimate. With cubic post-processing, the standard error of validation drops to 0.397, and 95% of the RON estimates agree with the reference engine values within the ASTM reproducibility of the RON engine method.
Applying nonlinear post-processing improves the RON estimates for the test set by 30%, yet requires determining only four coefficients in addition to those used in the initial linear calibration. In ASTM tests such as the D2699 RON test, values measured by two different operators in two different laboratories are expected to agree within the stated reproducibility 95% of the time. With nonlinear post-processing, the IR RON estimates agree with the D2699 RON test data within the reproducibility 95% of the time, so the IR estimates can be regarded as equivalent to the engine measurements.
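The two validation statistics quoted in this example can be computed along the following lines. This is a sketch under stated assumptions: the standard error is taken as a plain root-mean-square difference without a degrees-of-freedom correction, and the `limit` value is an illustrative placeholder rather than a figure taken from the text. The function names and the data are hypothetical.

```python
import math


def standard_error(estimates, references):
    """Root-mean-square difference between estimates and reference values."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, references)]
    return math.sqrt(sum(squared) / len(squared))


def fraction_within(estimates, references, limit):
    """Fraction of estimates that differ from the reference by at most `limit`."""
    hits = sum(1 for e, r in zip(estimates, references) if abs(e - r) <= limit)
    return hits / len(estimates)


# Made-up IR estimates and engine values (illustration only).
ir_estimates = [91.2, 93.0, 95.4, 97.9]
engine_values = [91.0, 93.9, 95.5, 97.6]

limit = 0.7  # placeholder reproducibility limit, not taken from the text
sev = standard_error(ir_estimates, engine_values)
share = fraction_within(ir_estimates, engine_values, limit)
print(round(sev, 3), share)
```

When a sample has several engine measurements, as in the test set here, each measurement would be paired with the estimate for its sample before these functions are called.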
Example 3
Using the same set of spectra of the 385 blended gasoline samples described in Example 2, a multiple linear regression (MLR) model was built according to the method presented by Lambert and Martens (European Patent No. 0 285 251 B1, August 28, 1991). The absorbances at the frequencies closest to the 15 frequencies described by Lambert and Martens (Table 1) were corrected by subtracting the absorbance at the baseline point and were then regressed against the engine RON values to give the coefficients in Table 2. The standard error of estimate for this linear MLR model was 0.459.
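The preparation of the MLR inputs described above (picking the measured frequency closest to each target frequency and subtracting the absorbance at a baseline point) can be sketched as follows. Only that preparation step is shown; the regression itself is omitted. The frequencies, absorbances, and function names are made up for illustration and do not reproduce Table 1.

```python
def nearest_index(grid, target):
    """Index of the grid frequency closest to the target frequency."""
    return min(range(len(grid)), key=lambda i: abs(grid[i] - target))


def baseline_corrected(grid, spectrum, targets, baseline):
    """Absorbance at the frequency nearest each target, minus the baseline absorbance."""
    base = spectrum[nearest_index(grid, baseline)]
    return [spectrum[nearest_index(grid, t)] - base for t in targets]


# Hypothetical measurement grid and absorbances.
grid = [4000.0, 4002.0, 4004.0, 4006.0, 4008.0]
spectrum = [0.10, 0.12, 0.20, 0.35, 0.11]

# Hypothetical target frequencies and baseline point.
targets = [4003.1, 4005.6]
features = baseline_corrected(grid, spectrum, targets, baseline=4000.4)
print(features)
```

Each sample yields one such feature vector, and the vectors for all calibration samples would form the independent variables of the MLR model.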

The same set of spectra of the 238 blended gasoline test samples was analyzed using the MLR model, and the MLR estimates were compared with the 314 engine measurements in the test set. The predicted values obtained from this linear MLR model are shown in FIG. For this linear MLR model, the standard error of validation for the test samples is 0.457, and only 81% of the samples are predicted within the ASTM reproducibility limits of the engine method.
For the calibration set of 385 blended gasoline samples, the engine RON values were fitted as a cubic function of the linear MLR estimates. This fit is shown graphically in FIG.

Cubic post-processing was applied to the linear MLR estimates for the test set of 238 blended gasolines. A comparison of the nonlinearly post-processed MLR estimates with the 314 separate engine measurements is shown in FIG. The standard error of validation for the test set is 0.406, and 91% of the samples are estimated within the reproducibility limits of the ASTM RON test. Applying third-order nonlinear post-processing gives an 11% improvement over the linear MLR calibration, yet requires determining only four coefficients in addition to those used in the linear MLR calibration.

Claims

A method for determining property or composition data of a test sample from a nonlinear correlation between a spectrum of the test sample and the values of the property or composition data of the test sample, comprising:
1. measuring the spectrum of the test sample;
2. applying a linear correlation to the spectrum to obtain a linear estimate of the property or composition data;
3. applying a nonlinear correction to the linear estimate of step (2) to estimate the property or composition data of the test sample; and
4. outputting the estimate of the property or composition data of the test sample obtained in step (3);
wherein the linear correlation and the nonlinear correction are obtained by a calibration comprising:
a) measuring the spectra of a set of calibration samples;
b) measuring the property or composition data of the calibration samples using a reference method;
c) determining a linear correlation between the spectra obtained in step (a) and the property or composition data obtained in step (b);
d) obtaining linear estimates of the property or composition data of the calibration samples by applying the linear correlation of step (c) to the spectra collected in step (a); and
e) obtaining a nonlinear correction for the linear estimates from step (d) by fitting either the property or composition data from step (b), or the difference between the property or composition data from step (b) and the linear estimates from step (d), as a nonlinear function of the linear estimates from step (d).

The method of claim 1, wherein the nonlinear correction in steps (3) and (e) is calculated by fitting the property or composition data from step (b) directly as a nonlinear function of the linear estimates from step (d), such that estimating the property or composition data of the test sample in step (3) comprises substituting the linear estimate from step (2) into the nonlinear correction equation from step (e).

The method of claim 1, wherein the nonlinear correction in steps (3) and (e) is calculated by fitting the difference between the property or composition data from step (b) and the linear estimates of the property or composition data from step (d) as a nonlinear function of the linear estimates from step (d), such that estimating the property or composition data of the test sample in step (3) comprises substituting the linear estimate from step (2) into the nonlinear correction equation from step (e) and adding the resulting nonlinear correction value to the linear estimate from step (2) to obtain the final estimate.

The method of claim 1, wherein the form of nonlinear correction is a polynomial.

The method of claim 2, wherein the form of nonlinear correction is a polynomial.

The method of claim 3, wherein the form of nonlinear correction is a polynomial.

The method of claim 1, wherein the property is a research octane number.

The method according to claim 2, wherein the property is a research octane number.

The method according to claim 3, wherein the property is a research octane number.