CN109324013A - Method for rapid near-infrared analysis of oil properties using a Gaussian process regression model - Google Patents


Info

Publication number
CN109324013A
Authority
CN
China
Prior art keywords
sample
variable
spectrum
value
pca
Prior art date
Legal status
Granted
Application number
CN201811168265.7A
Other languages
Chinese (zh)
Other versions
CN109324013B (en)
Inventor
钱锋
钟伟民
杨明磊
杜文莉
隆建
Current Assignee
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date
Filing date
Publication date
Application filed by East China University of Science and Technology
Priority to CN201811168265.7A
Publication of CN109324013A
Application granted
Publication of CN109324013B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using near infrared light

Abstract

The invention discloses a method for rapid near-infrared analysis of oil properties using a Gaussian process regression model. The method for building a model for oil property prediction comprises: (1) measuring the property data of crude oil samples; (2) measuring the near-infrared spectra of the crude oil samples; (3) pre-processing the near-infrared spectra obtained in step (2) by straight-line subtraction to eliminate background interference and baseline drift; (4) performing principal component analysis (PCA) on the spectra obtained in step (3) and saving, as P_pca, the loading vectors whose cumulative eigenvalue contribution rate exceeds 95%; (5) using P_pca to reduce the dimensionality of training subsets formed from randomly selected samples, and then obtaining sub-models by Gaussian process regression (GPR); (6) using P_pca to reduce the dimensionality of the near-infrared spectrum of the sample to be tested, and then selecting a training set by GPR; (7) determining one or more wavenumber intervals from the training set and building the oil property prediction model by partial least squares (PLS).

Description

Method for rapid near-infrared analysis of oil properties using a Gaussian process regression model
Technical field
The present invention relates to a method for rapid near-infrared analysis of oil properties using a Gaussian process regression model, and to its application.
Background art
Crude oil evaluation plays an important role in crude oil production, trade and related areas. China has long carried out crude oil evaluation work and now has a complete set of standard evaluation methods, but these methods usually require long analysis times and large sample volumes, are costly, and can no longer meet the needs of practical applications. Near-infrared (NIR) spectroscopy is currently one of the most promising and most widely used rapid analysis methods. In recent years, the use of optical fibres in NIR spectroscopy has moved the technique from the laboratory to the field. NIR spectroscopy is insensitive to electromagnetic interference, transmits signal energy efficiently, has high sensitivity and is inexpensive, which allows NIR spectrometers to perform fast, remote on-line analysis in harsh and hazardous environments. Crude oil has a complex composition and many properties to be measured, and its NIR absorption bands are broad and heavily overlapping; moreover, the NIR analyzer is a secondary instrument, i.e. its results depend on a calibration model built against reference analyses. Establishing NIR models with high accuracy and good robustness is therefore the key to applying NIR technology effectively.
Conventional modeling methods generally belong to the category of static models: key steps such as spectral pre-processing, variable selection, model building, and model updating and maintenance are all carried out offline and remain unchanged during application. In the process industry, the continuity of production generally requires the model to track field operating conditions in real time; when the model deviates significantly from the current operating conditions and its prediction accuracy no longer meets the requirements of on-line detection, the model must be updated promptly and effectively.
Summary of the invention
In view of the above problems, the invention proposes a method for rapid near-infrared analysis of oil properties using a Gaussian process regression model. Starting from crude oil NIR spectra acquired with a near-infrared analyzer, the method pre-processes the collected spectra by straight-line subtraction to eliminate interference, and screens the samples using the pre-processed spectral data. For each newly acquired sample to be tested, GPR is used to select a suitable training set from the sample database, the modeling wavenumber range is determined from that training set, and a local model is built with PLS to predict the property values of the spectrum to be tested.
The method provided by the invention for rapid near-infrared analysis of oil properties using a Gaussian process regression model comprises the following steps:
Step 1: collect crude oil samples and measure the property values of all crude oil samples;
Step 2: measure the NIR spectra of all samples with an on-line NIR system;
Step 3: pre-process the crude oil NIR spectra obtained in step 2, screen the samples using the pre-processed spectral data, and reject abnormal samples;
Step 4: perform principal component analysis (PCA) on all samples in the sample database, select the loading vectors whose cumulative eigenvalue contribution rate exceeds 95%, denote them P_pca and store them;
Step 5: randomly draw n samples from the sample database by Monte Carlo sampling to form a training subset A, reduce the dimensionality of this subset with P_pca, and build a Gaussian process regression model on the reduced subset. Repeat this step N times to obtain N training subsets and N sub-models;
Step 6: when a new spectrum to be tested x_p is acquired, reduce its dimensionality with P_pca, feed it into all GPR sub-models, and compute the estimated variance σ of each model. Select the training subset of the model with the smallest σ as the local training set S;
Step 7: determine the wavenumber range from the local training set S, build a local model on S with PLS, and predict the property value of the spectrum to be tested x_p.
In one or more embodiments, the crude oils used in step 1 to construct the calibration set have a density (20 °C) in the range 0.7-1.1 g/cm³, a sulfur content in the range 0.03%-5.50% and an acid value in the range 0.01-12.00 mgKOH/g; and/or
the oil properties include one or more of density, carbon residue, acid value, sulfur content, nitrogen content, wax content, gum content, asphaltene content and true boiling point (TBP) data.
In one or more embodiments, step 2 includes holding the training-set samples at a constant temperature below 35 °C and measuring the NIR spectral data of each crude oil sample after its temperature has stabilized;
In one or more embodiments, in step 2 the scanning range is 4000-12500 cm⁻¹ and the number of scans is 10-100.
In one or more embodiments, the NIR pre-processing method in step 3 applies straight-line subtraction to the crude oil NIR spectra obtained in step 2 over the wavenumber range 12500-4000 cm⁻¹, to eliminate background interference and baseline drift;
In one or more embodiments, step 3 includes using principal component analysis combined with Hotelling T² statistics to compute the T² statistic of each sample in the initial training set and, according to a preset T² threshold, rejecting abnormal samples from the initial training set to form the final training set;
Preferably, abnormal samples are rejected with PCA combined with Hotelling T² statistics as follows: PCA is first performed on the sample spectra; then, using the principal component scores as feature variables, the T² statistic of each sample is computed, and abnormal samples in the initial training set are rejected according to the preset T² threshold to form the final training set.
In step 4, performing PCA on the spectral matrix X is equivalent to an eigenvector decomposition of the covariance matrix XᵀX of the (mean-centred) matrix X, and the loading vectors are the eigenvectors of XᵀX. Let λ_1 ≥ λ_2 ≥ … ≥ λ_M denote the eigenvalues of XᵀX; the cumulative contribution rate of the first k principal components is then
$$\eta_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{M} \lambda_i}$$
where M is the number of wavelength points of the spectrum.
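The cumulative-contribution rule can be sketched in Python with NumPy. This is a minimal illustration, not the patent's implementation: it assumes the spectra are stored row-wise in an array `X` (samples × wavenumbers) and are mean-centred before decomposition, and the function name `pca_loadings` is hypothetical. It uses the SVD of the centred matrix, whose squared singular values equal the eigenvalues of XᵀX, so the M × M matrix never has to be formed explicitly.

```python
import numpy as np


def pca_loadings(X, threshold=0.95):
    """Return (P_pca, cumulative contribution rates) for spectra X (n x M).

    Keeps the smallest number k of loading vectors whose cumulative
    eigenvalue contribution rate is at least `threshold`.
    """
    Xc = X - X.mean(axis=0)                       # mean-centre each wavenumber column
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2                              # eigenvalues of Xc^T Xc, descending
    contrib = np.cumsum(eigvals) / eigvals.sum()  # cumulative contribution rate
    k = int(np.searchsorted(contrib, threshold)) + 1
    return Vt[:k].T, contrib[:k]                  # columns of P_pca are loading vectors


# Example: synthetic spectra dominated by two latent components
rng = np.random.default_rng(0)
scores = rng.normal(size=(50, 2)) * np.array([10.0, 3.0])
X = scores @ rng.normal(size=(2, 30)) + 0.01 * rng.normal(size=(50, 30))
P_pca, contrib = pca_loadings(X)
Z = (X - X.mean(axis=0)) @ P_pca                  # dimension-reduced spectra
```

A new spectrum would be projected the same way, subtracting the calibration mean before multiplying by `P_pca`; whether the original method centres the spectra is an assumption here.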
In one or more embodiments, the Gaussian process regression model in step 5 is as follows:
A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. It is completely determined by its mean function and covariance function, and can be written as:
$$f(x) \sim \mathcal{GP}\big(m(x),\ k(x, x')\big)$$
In the presence of noise, the actual output value y equals the sum of the latent function value and the noise, i.e.
$$y = f(x) + \varepsilon$$
where ε is white Gaussian noise with distribution
$$\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$$
The covariance function may take the following form:
$$k(x, x') = \sigma_f^2 \exp\left(-\frac{\lVert x - x' \rVert^2}{2 l^2}\right) + \sigma_n^2\, \delta(x, x')$$
where δ(x, x′) equals 1 when x = x′ and 0 otherwise. The hyperparameters θ = {l, σ_f, σ_n} can be obtained by maximizing the likelihood function.
In the formulas: x, x′: arbitrary samples in the training set;
y: property value data;
m(x): mean function;
k(x, x′): covariance function.
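As a minimal sketch of the hyperparameter estimation, assume a zero mean function (m(x) = 0, with the property values centred beforehand) and the common squared-exponential covariance with hyperparameters l, σ_f and σ_n. The coarse grid search in `fit_hyperparameters` is a hypothetical stand-in for the gradient-based optimiser normally used to maximise the log marginal likelihood; all function names are illustrative.

```python
import itertools
import numpy as np


def se_kernel(A, B, length, sigma_f):
    """Squared-exponential covariance sigma_f^2 * exp(-||a - b||^2 / (2 l^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return sigma_f**2 * np.exp(-np.maximum(sq, 0.0) / (2.0 * length**2))


def log_marginal_likelihood(X, y, length, sigma_f, sigma_n):
    """log p(y | X, theta) for a zero-mean GP with Gaussian noise variance sigma_n^2."""
    n = len(y)
    K = se_kernel(X, X, length, sigma_f) + sigma_n**2 * np.eye(n)
    L = np.linalg.cholesky(K)                             # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                  # equals -0.5 * log|K|
            - 0.5 * n * np.log(2.0 * np.pi))


def fit_hyperparameters(X, y, lengths, sigma_fs, sigma_ns):
    """Return the grid point (l, sigma_f, sigma_n) with the highest likelihood."""
    return max(itertools.product(lengths, sigma_fs, sigma_ns),
               key=lambda theta: log_marginal_likelihood(X, y, *theta))


# Example: 1-D toy data; y is centred to match the zero-mean assumption
X = np.linspace(0.0, 5.0, 30)[:, None]
y = np.sin(X).ravel()
y = y - y.mean()
theta = fit_hyperparameters(X, y, [0.3, 1.0, 3.0], [0.5, 1.0], [0.01, 0.1])
```

The Cholesky factor is used both for K⁻¹y and for log|K| = 2·Σ log Lᵢᵢ, which is numerically more stable than inverting K directly.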
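In one or more embodiments, the estimated variance σ of the model in step 6 is determined as follows:
For the dimension-reduced spectrum to be tested x_p′, the joint prior distribution (with zero mean) of the corresponding predicted property value y_p and the property values y of the training-set samples is
$$\begin{bmatrix} y \\ y_p \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0},\ \begin{bmatrix} K & K_p^{\mathsf{T}} \\ K_p & K_{pp} \end{bmatrix}\right)$$
where K is the n × n covariance matrix of the training samples, with entries K_ij = k(x_i, x_j), and
$$K_p = \begin{bmatrix} k(x_p, x_1) & k(x_p, x_2) & \cdots & k(x_p, x_n) \end{bmatrix}$$
$$K_{pp} = k(x_p, x_p)$$
From this joint distribution, the posterior distribution of the property value corresponding to x_p′ can be computed; the estimated mean and variance of y_p are:
$$\mu = K_p K^{-1} y$$
$$\sigma^2 = K_{pp} - K_p K^{-1} K_p^{\mathsf{T}}$$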
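The posterior mean μ and variance σ² can be computed with a Cholesky factorisation instead of an explicit inverse K⁻¹. The sketch below assumes a zero-mean GP and a squared-exponential covariance with fixed, already-fitted hyperparameters; as an assumption, the noise variance σ_n² is added only to the training covariance K, so the returned variance is that of the latent function at x_p. `gp_predict` and its arguments are hypothetical names.

```python
import numpy as np


def se_kernel(A, B, length=1.0, sigma_f=1.0):
    """Squared-exponential covariance between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return sigma_f**2 * np.exp(-np.maximum(sq, 0.0) / (2.0 * length**2))


def gp_predict(X_train, y_train, x_p, length=1.0, sigma_f=1.0, sigma_n=0.1):
    """Return (mu, sigma^2) with mu = K_p K^-1 y and sigma^2 = K_pp - K_p K^-1 K_p^T."""
    K = se_kernel(X_train, X_train, length, sigma_f) + sigma_n**2 * np.eye(len(X_train))
    K_p = se_kernel(x_p[None, :], X_train, length, sigma_f)    # 1 x n row vector
    K_pp = sigma_f**2                                          # k(x_p, x_p) for this kernel
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    v = np.linalg.solve(L, K_p.T)                              # L^{-1} K_p^T
    mu = (K_p @ alpha).item()
    var = K_pp - (v.T @ v).item()                              # K_p K^{-1} K_p^T = v^T v
    return mu, var


X = np.linspace(0.0, 5.0, 20)[:, None]
y = np.sin(X).ravel()
mu_in, var_in = gp_predict(X, y, np.array([2.5]), sigma_n=1e-3)    # inside the data
mu_out, var_out = gp_predict(X, y, np.array([50.0]), sigma_n=1e-3) # far from the data
```

Near the training data the variance is close to zero; far from it the variance returns to the prior value σ_f², which is what makes σ usable as a measure of how well a sub-model covers a new spectrum.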
In one or more embodiments, the wavenumber range in step 7 is determined by the following steps:
(a) First, assign each variable an initial weight w_j = 1/M (j = 1, …, M), where M is the total number of variables;
For iterations t = 1, …, g, repeat the following steps:
(b) compute the sampling probability of each variable from its current weight, p_j = w_j / (w_1 + … + w_M), and draw k variables from all wavenumber points according to these probabilities;
(c) build a sub-model h_t with the PLS method from the k selected variables;
(d) reconstruct the spectral matrix D′ from the score and loading matrices obtained by PLS, and compute the error e_x of each variable, where
e_xj: mean error of the j-th variable;
K: total number of samples;
D_ij: original value of the j-th variable of the i-th sample;
D′_ij: reconstructed value of the j-th variable of the i-th sample;
(e) compute the error e_y:
$$e_y = \sqrt{\frac{1}{K}\sum_{i=1}^{K}\left(y_i - \hat{y}_i\right)^2}$$
where e_y is the root-mean-square error;
K: total number of samples;
y_i: true value of the i-th sample;
ŷ_i: predicted value of the i-th sample;
(f) substitute e_x and e_y into the following formula to compute the error:
$$\mathrm{err}_t = e_{x_j} + \beta\, e_y$$
where err_t is the error of the t-th iteration and β is a weighting coefficient;
(g) compute the new weights of the variables. After the weights are updated, proceed to the next iteration.
(h) After the iterations stop, sort the variables by weight in descending order and take the first z variables as the variables used in the final model.
In one or more embodiments, the mathematical model of step 7 is built with the PLS method.
In one or more embodiments, the model built in step 7 adapts to the spectral features of the sample to be tested; that is, the method adaptively changes the training set and the modeling wavenumber range according to the spectrum to be tested, to obtain a better modeling result.
Description of the drawings
Fig. 1: schematic of the experiment in which an on-line NIR analyzer measures the NIR spectra of crude oil samples.
Fig. 2: procedure for building the NIR-based adaptive prediction model of oil properties.
Fig. 3: original crude oil NIR spectra.
Fig. 4: pre-processed crude oil NIR spectra.
Fig. 5: principal components from the PCA.
Fig. 6: Hotelling T² plot of the abnormal samples.
Fig. 7: prediction performance of the NIR regression model for sulfur content in crude oil.
Detailed description of the embodiments
Fig. 1 shows the experimental procedure in which the on-line NIR analyzer of the invention measures the NIR spectral data of samples. Fig. 2 is the general flow chart of the method of the invention, which comprises the following steps:
First step: collect crude oil samples and measure the property values of all crude oil samples;
Second step: measure the NIR spectra of all samples with an on-line NIR system;
Third step: pre-process the crude oil NIR spectra obtained in the second step, screen the samples using the pre-processed spectral data, and reject abnormal samples;
Fourth step: perform PCA on all samples in the sample database, select the loading vectors whose cumulative eigenvalue contribution rate exceeds 95%, denote them P_pca and store them;
Fifth step: randomly draw n samples from the sample database by Monte Carlo sampling to form a training subset A, reduce the dimensionality of this subset with P_pca, and build a Gaussian process regression model on the reduced subset. Repeat this step N times to obtain N training subsets and N sub-models;
Sixth step: when a new spectrum to be tested x_p is acquired, reduce its dimensionality with P_pca, feed it into all GPR sub-models, and compute the estimated variance σ of each model. Select the training subset of the model with the smallest σ as the local training set S;
Seventh step: determine the wavenumber range from the local training set S, build a local model on S with PLS, and predict the property value of the spectrum to be tested x_p.
These steps are described in detail below. It should be understood that, within the scope of the invention, the technical features described above and those specifically described below (e.g. in the embodiment) can be combined with one another to form preferred technical solutions.
1. Construct the crude oil calibration set and measure the properties of the crude oils in it
Collect crude oil samples of different types, usually covering paraffin-base, intermediate-base and naphthene-base crude oils. In general, no fewer than 200 crude oil samples are collected. Preferably, the NIR spectrum and property values of each crude oil are measured repeatedly to eliminate random errors.
Preferably, the density (20 °C), sulfur content and acid value of the collected crude oil samples are kept within 0.7-1.1 g/cm³, 0.03%-5.50% and 0.01-12.00 mgKOH/g, respectively. Multiple properties of the collected crude oils, such as density, carbon residue, nitrogen content, sulfur content, acid value, salt content, wax content, gum content, asphaltene content and true boiling point distillation data, are then measured with traditional standard methods and recorded.
2. Acquire the crude oil NIR spectra
An off-line or on-line NIR instrument of a suitable model can be chosen for NIR scanning. The probe is inserted directly into the crude oil sample, which is held at a constant temperature below 35 °C, and the crude oil is kept homogeneous during the measurement, so that the NIR spectrum of each sample is obtained. For example, the crude oil sample can be held at a constant 30 °C and, once its temperature has stabilized, its NIR spectral data are measured.
In general, each spectrum is the average of 10-100 scans. The spectral scanning range is 4000-12500 cm⁻¹ and the resolution is 8-32 cm⁻¹. Illustrative raw crude oil spectra are shown in Fig. 3.
3. Pre-process the crude oil NIR spectra obtained in step 2 by straight-line subtraction
The pre-processing applies straight-line subtraction to the 12500-4000 cm⁻¹ spectral region of every calibration sample, which eliminates baseline drift and background interference and improves resolution and sensitivity. After pre-processing, the initial training set can be established. Illustrative pre-processed crude oil NIR spectra are shown in Fig. 4.
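Straight-line subtraction can be sketched as fitting a least-squares straight line to each spectrum over the chosen wavenumber region and subtracting it. This is a minimal illustration under that assumption (the exact fitting rule of a given spectrometer software may differ), and `subtract_straight_line` is a hypothetical name; spectra are rows of a NumPy array.

```python
import numpy as np


def subtract_straight_line(wavenumbers, spectra):
    """Subtract a least-squares straight line from each spectrum (one spectrum per row)."""
    coeffs = np.polyfit(wavenumbers, spectra.T, deg=1)       # shape (2, n_samples): slope, intercept
    baseline = np.outer(wavenumbers, coeffs[0]) + coeffs[1]  # shape (M, n_samples)
    return spectra - baseline.T


# Example: the same absorption band on two different linear baselines
wn = np.linspace(4000.0, 12500.0, 500)                       # wavenumbers in cm^-1
band = np.exp(-((wn - 5800.0) / 150.0) ** 2)
spectra = np.vstack([band + 0.1 + 2e-5 * wn, band - 0.3 + 5e-5 * wn])
corrected = subtract_straight_line(wn, spectra)
```

Because the fitted line is removed by a linear projection, any purely linear offset or tilt between two spectra disappears, which is the baseline-drift effect the pre-processing is meant to remove.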
4. Reject abnormal samples with principal component analysis combined with Hotelling T² statistics
Abnormal samples can be rejected with PCA combined with Hotelling T² statistics. The basic procedure is to first perform principal component analysis (PCA) on the sample spectra; then, using the principal component scores as feature variables, compute the T² statistic of each sample and, according to a preset T² threshold, reject abnormal samples from the initial training set to form the final training set.
The T² values of all samples in the sample database are compared with the threshold, samples above the threshold are rejected, and the final training set is established. Illustrative principal components from the PCA are shown in Fig. 5.
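The T² screening can be sketched as follows, assuming T² is the sum of the squared PCA scores of a sample, each divided by the variance of that score, and that the threshold is supplied by the user (for example an F-distribution limit, or a fixed value such as the 3.911 used in the embodiment). `hotelling_t2`, `remove_outliers` and the chosen component count are illustrative assumptions.

```python
import numpy as np


def hotelling_t2(X, n_components):
    """Hotelling T^2 of each row of X computed from its first PCA scores."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = Xc @ Vt[:n_components].T              # principal component scores t
    sd = T.std(axis=0, ddof=1)                # standard deviation of each score column
    return np.sum((T / sd) ** 2, axis=1)


def remove_outliers(X, n_components, threshold):
    """Drop samples whose T^2 exceeds `threshold`; return kept rows, removed indices, T^2."""
    t2 = hotelling_t2(X, n_components)
    keep = t2 <= threshold
    return X[keep], np.flatnonzero(~keep), t2


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
X = np.vstack([X, np.full((1, 20), 50.0)])    # sample 100 is a gross outlier
X_kept, removed, t2 = remove_outliers(X, n_components=2, threshold=50.0)
```

The number of retained components and the threshold interact: T² grows with the component count, so a limit derived from the F distribution should use the same count.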
5. Perform PCA on all samples in the sample database and store, as P_pca, the loading vectors whose cumulative eigenvalue contribution rate exceeds 95%
This step performs PCA on the spectra in the sample database and saves the loading vectors whose cumulative eigenvalue contribution rate exceeds 95%. This not only reduces the storage space required for the sample database but also reduces the amount of computation, and hence the computing time, when similarity indices are calculated in later steps.
6. Select a suitable local training set from the sample database with Gaussian process models
First, n samples are randomly drawn from the sample database by Monte Carlo sampling to form a training subset A; the dimensionality of this subset is reduced with P_pca, and a Gaussian process regression model is built on the reduced subset. This step is repeated N times to obtain N training subsets and N sub-models. When a new spectrum to be tested x_p is acquired, its dimensionality is reduced with P_pca, it is fed into all GPR sub-models, and the estimated variance σ of each model is computed. The training subset of the model with the smallest σ is selected as the local training set S. The estimated variance σ is computed as follows:
$$\sigma^2 = K_{pp} - K_p K^{-1} K_p^{\mathsf{T}}$$
where
$$K_p = \begin{bmatrix} k(x_p, x_1) & k(x_p, x_2) & \cdots & k(x_p, x_n) \end{bmatrix}$$
$$K_{pp} = k(x_p, x_p)$$
and K is the n × n covariance matrix of the training subset, with entries K_ij = k(x_i, x_j).
In the formulas: x_i, x_j: arbitrary samples in the training set;
x_p: spectrum to be tested;
y: property value data;
m(x): mean function;
k(x_i, x_j): covariance function.
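A minimal sketch of the Monte Carlo sub-model library and the minimum-variance selection is shown below. It rests on simplifying assumptions: all sub-models share one fixed set of squared-exponential hyperparameters (the described method fits them per subset by maximum likelihood), the spectra have already been projected onto P_pca, and K_pp = σ_f² because noise is added only to the training covariance. Under these assumptions the predictive variance depends only on the spectra, not on the property values. All function names are hypothetical.

```python
import numpy as np


def se_kernel(A, B, length, sigma_f):
    """Squared-exponential covariance between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return sigma_f**2 * np.exp(-np.maximum(sq, 0.0) / (2.0 * length**2))


def gp_variance(Z_sub, z_p, length, sigma_f, sigma_n):
    """sigma^2 = K_pp - K_p K^{-1} K_p^T for one sub-model."""
    K = se_kernel(Z_sub, Z_sub, length, sigma_f) + sigma_n**2 * np.eye(len(Z_sub))
    k_p = se_kernel(z_p[None, :], Z_sub, length, sigma_f)[0]
    v = np.linalg.solve(np.linalg.cholesky(K), k_p)    # L^{-1} K_p^T
    return sigma_f**2 - v @ v


def build_subsets(n_total, n_sub, n_models, seed=0):
    """Monte Carlo sampling: N random subsets of n samples, without replacement."""
    rng = np.random.default_rng(seed)
    return [rng.choice(n_total, size=n_sub, replace=False) for _ in range(n_models)]


def select_local_training_set(Z, subsets, z_p, length=1.0, sigma_f=1.0, sigma_n=0.1):
    """Return the subset whose GP sub-model gives the smallest variance at z_p."""
    variances = [gp_variance(Z[idx], z_p, length, sigma_f, sigma_n) for idx in subsets]
    best = int(np.argmin(variances))
    return subsets[best], variances[best]


# Example: two clusters of reduced spectra; the query lies in the second cluster
rng = np.random.default_rng(2)
Z = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(10.0, 0.5, (50, 2))])
subsets = build_subsets(len(Z), n_sub=30, n_models=20) + [np.arange(50, 100)]
S, var_S = select_local_training_set(Z, subsets, np.array([10.0, 10.0]))
```

With fixed hyperparameters, a subset containing more samples close to the query can only lower the variance, so the selection favours subsets that cover the region of the new spectrum.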
7. Determine the wavenumber range from the local training set S
This step performs wavenumber selection on the spectra in the training set. In-depth research on methods such as partial least squares has shown that screening characteristic wavenumbers or intervals can yield better quantitative models. Wavenumber selection simplifies the model and removes irrelevant variables, giving a model with stronger predictive ability and better robustness.
The wavenumber range in step 7 is determined as follows:
(a) First, assign each variable an initial weight w_j = 1/M (j = 1, …, M), where M is the total number of variables;
For iterations t = 1, …, g, repeat the following steps:
(b) compute the sampling probability of each variable from its current weight, p_j = w_j / (w_1 + … + w_M), and draw k variables from all wavenumber points according to these probabilities;
(c) build a sub-model h_t with the PLS method from the k selected variables;
(d) reconstruct the spectral matrix D′ from the score and loading matrices obtained by PLS, and compute the error e_x of each variable, where
e_xj: mean error of the j-th variable;
K: total number of samples;
D_ij: original value of the j-th variable of the i-th sample;
D′_ij: reconstructed value of the j-th variable of the i-th sample;
(e) compute the error e_y:
$$e_y = \sqrt{\frac{1}{K}\sum_{i=1}^{K}\left(y_i - \hat{y}_i\right)^2}$$
where e_y is the root-mean-square error;
K: total number of samples;
y_i: true value of the i-th sample;
ŷ_i: predicted value of the i-th sample;
(f) substitute e_x and e_y into the following formula to compute the error:
$$\mathrm{err}_t = e_{x_j} + \beta\, e_y$$
where err_t is the error of the t-th iteration and β is a weighting coefficient;
(g) compute the new weights of the variables. After the weights are updated, proceed to the next iteration;
(h) after the iterations stop, sort the variables by weight in descending order and take the first z variables as the variables used in the final model.
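The iterative weighting scheme can be sketched as below. Several details are assumptions because the corresponding formulas are not given here: e_x is taken as the mean absolute reconstruction error of each variable, e_y is the RMSE of the sub-model's fitted values, and the weight update in step (g) is a hypothetical rule that makes each variable's weight inversely proportional to its average err over the iterations in which it was sampled. The PLS sub-model is a plain NIPALS PLS1; all names and defaults (`k`, `z`, `g`, `beta`, `n_comp`) are illustrative.

```python
import numpy as np


def pls1(X, y, n_comp):
    """NIPALS PLS1 on centred X, y; returns scores T, loadings P, coefficients B."""
    X, y = X.astype(float).copy(), y.astype(float).copy()
    n, m = X.shape
    T, P = np.zeros((n, n_comp)), np.zeros((m, n_comp))
    W, q = np.zeros((m, n_comp)), np.zeros(n_comp)
    for a in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)                   # weight vector
        t = X @ w                                # score vector
        tt = t @ t
        p = X.T @ t / tt                         # loading vector
        q[a] = y @ t / tt
        X -= np.outer(t, p)                      # deflate X and y
        y -= q[a] * t
        T[:, a], P[:, a], W[:, a] = t, p, w
    B = W @ np.linalg.solve(P.T @ W, q)          # B = W (P^T W)^{-1} q
    return T, P, B


def select_wavenumbers(X, y, k, z, g=200, n_comp=3, beta=1.0, seed=0):
    rng = np.random.default_rng(seed)
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    M = X.shape[1]
    w = np.full(M, 1.0 / M)                                     # (a) equal initial weights
    err_sum, count = np.zeros(M), np.zeros(M)
    for _ in range(g):
        idx = rng.choice(M, size=k, replace=False, p=w / w.sum())  # (b) weighted sampling
        T, P, B = pls1(Xc[:, idx], yc, n_comp)                  # (c) sub-model h_t
        e_x = np.mean(np.abs(Xc[:, idx] - T @ P.T), axis=0)     # (d) assumed mean abs error
        e_y = np.sqrt(np.mean((yc - Xc[:, idx] @ B) ** 2))      # (e) RMSE
        err = e_x + beta * e_y                                  # (f) combined error
        err_sum[idx] += err
        count[idx] += 1
        inv = np.zeros(M)                                       # (g) hypothetical update rule
        seen = count > 0
        inv[seen] = count[seen] / err_sum[seen]
        inv[~seen] = inv[seen].max()             # keep never-sampled variables in play
        w = inv / inv.sum()
    return np.argsort(w)[::-1][:z], w                           # (h) top-z variables


rng = np.random.default_rng(3)
X = rng.normal(size=(60, 40))
y = X[:, 5] + X[:, 17] + 0.05 * rng.normal(size=60)
selected, weights = select_wavenumbers(X, y, k=10, z=8)
```

Because e_y is shared by all variables in one sub-model, variables that repeatedly appear in well-predicting sub-models accumulate a lower average error and are sampled more often in later iterations.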
8. Using the determined wavenumber range, build a local model on the local training set with PLS and predict the property value of the spectrum to be tested x_p
The model built in this step adapts to the spectral features of the sample to be tested; that is, the method adaptively changes the training set and the modeling wavenumber range according to the spectrum to be tested, to obtain a better modeling result.
When predicting the properties of a crude oil sample to be tested, the invention first measures its NIR spectrum with the method described in step 2, then pre-processes the spectrum with the method described in step 3, reduces the dimensionality of the spectrum with the projection matrix determined in step 4, determines the local training set with the method of steps 5 and 6, selects the modeling wavenumber range on this training set as in step 7, and builds a local PLS model to predict the sample to be tested.
The beneficial effects of the invention are as follows:
The test procedure of the method is simple, fast and practical, measuring oil properties quickly with an NIR spectrometer. Compared with traditional measurement methods, it greatly shortens detection time and reduces manpower and material costs. No reagents are used during the test, and the crude oil sample needs no treatment and is not damaged. At the same time, the method can adjust the model appropriately as the samples to be tested change, thereby tracking operating conditions in real time, achieving better prediction accuracy and reducing the cost of model maintenance.
The invention is described in detail below by way of an embodiment. It should be noted that the following embodiment only serves to further illustrate the invention and should not be understood as limiting its scope; non-essential modifications and adaptations made by those skilled in the art on the basis of the invention still fall within its scope of protection.
Embodiment 1
The specific steps of the invention are illustrated below with an embodiment predicting sulfur content:
Step 1: 200 crude oil samples of different types are collected to form the crude oil sample library.
Step 2: The sample temperature is controlled at 30 °C and a Bruker NIR spectrometer is used. The probe is inserted directly into each crude oil sample to measure its NIR spectrum over the scanning range 4000-12500 cm⁻¹, with a resolution of 16 cm⁻¹ and 32 accumulated scans. The sulfur content of each crude oil sample is measured by the traditional standard method. Fig. 3 shows the original crude oil NIR spectra: the baseline drift of the original spectra is severe and the peaks overlap heavily.
Step 3: The absorbance over 4000-12500 cm⁻¹ is selected and pre-processed by straight-line subtraction to build the crude oil NIR spectral matrix. Fig. 4 shows the pre-processed spectra.
Step 4: Training samples are selected from the pre-processed crude oil samples by rejection. PCA is first performed on the pre-processed spectra; then, using the principal component scores (Fig. 5) as feature variables, the T² statistic of each sample is computed, and abnormal samples in the initial training set are rejected according to the preset T² threshold of 3.911, i.e. samples whose T² statistic exceeds the threshold are removed. In this example samples 93, 95, 96 and 175 are rejected as redundant samples, and the remaining samples are used as training samples. In total, 196 training samples make up the crude oil spectral training set (Fig. 6).
Step 5: PCA is performed on all samples in the sample database, and the loading vectors whose cumulative eigenvalue contribution rate exceeds 95% are stored as P_pca.
Step 6: 150 samples are randomly drawn from the sample database by Monte Carlo sampling to form a training subset A; its dimensionality is reduced with P_pca and a Gaussian process regression model is built on the reduced subset. This step is repeated 1000 times, giving 1000 training subsets and sub-models. When a new spectrum to be tested is acquired, its dimensionality is reduced with P_pca, it is fed into all GPR sub-models, and the estimated variance σ of each model is computed. The training subset of the model with the smallest σ is selected as the local training set S.
Step 7: The wavenumber range is determined from the local training set S (approximately 7496-8449 cm⁻¹ and 4431-4519 cm⁻¹), a local model is built with PLS, and the property value of the spectrum to be tested is predicted.
The prediction results of the resulting sulfur content regression model are shown in Fig. 7. The coefficient of determination reaches 0.9916 and the root-mean-square error is 0.1197. The comparison between predicted and actual values is given in Table 1 below; the prediction process is fast and simple, and the predictions are accurate.
Table 1: Comparison of predicted and actual sulfur contents of crude oils (for sulfur content prediction, a relative prediction error below 10% is generally regarded in this field as good accuracy)
For other oil properties the modeling method is the same; results are obtained by choosing the corresponding local training set, wavenumber range and modeling parameters.
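For reference, the figures of merit quoted for the sulfur content model (coefficient of determination and root-mean-square error) and the per-sample relative error used in Table 1 can be computed as follows. This sketch assumes the usual definitions R² = 1 − SSE/SST and RMSE = sqrt(mean squared residual), evaluated on the predicted samples; the numbers in the example are made up for illustration and are not the embodiment's data.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, per-sample relative error) for predicted property values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    rel_err = np.abs(resid) / np.abs(y_true)     # compare against e.g. the 10% criterion
    return r2, rmse, rel_err


r2, rmse, rel_err = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.0, 4.2])
```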

Claims (10)

1. a kind of method of building for the model of oil property prediction, which is characterized in that the described method includes:
(1) property data of crude oil sample is measured;
(2) atlas of near infrared spectra of the crude oil sample is measured;
(3) it is pre-processed using subtracting the atlas of near infrared spectra that straight line obtains step (2), to eliminate background interference With baseline drift;
(4) principal component analysis (PCA) is carried out to the spectrum that step (3) obtain, saves the load that eigenvalue contribution rate is greater than 95% Vector remembers Ppca
(5) P is utilizedpcaAfter carrying out dimensionality reduction to the training subset for the sample composition randomly selected, (GPR) is returned by Gaussian process Obtain submodel;
(6) P is utilizedpcaAfter carrying out dimensionality reduction to the near infrared spectrum of sample to be tested, (GPR) selection training is returned by Gaussian process Collection;
(7) one or more wave number sections are determined according to training set, the prediction of oil property is established using Partial Least Squares (PLS) Model.
2. the method as described in claim 1, which is characterized in that
The oil property is selected from: density, carbon residue, acid value, sulfur content, nitrogen content, wax content, gum level, asphalt content One or more of with true boiling-point (TBP) data;
The quantity of step (1) Crude Oil sample is no less than 200 parts;Crude oil sample is acquired using offline or on-line nir system Near infrared spectrum data.
3. The method according to claim 1 or 2, characterized in that in the measurement of step (2), the spectral scanning range is 4000-12500 cm⁻¹, the resolution is 2-32 cm⁻¹, and 10-100 repeated scans are averaged to obtain the near-infrared spectrum.
4. The method according to any one of claims 1-3, characterized in that step (3) comprises using principal component analysis combined with Hotelling T² statistics to calculate the T² statistic of each sample in the initial sample database and, according to a preset T² statistic threshold, rejecting abnormal sample points from the initial sample database;
Preferably, abnormal sample points are rejected by principal component analysis combined with Hotelling T² statistics as follows: principal component analysis is first performed on the sample spectra; then, with the principal component scores as feature variables, the T² statistic of each sample is calculated, and abnormal sample points in the sample database are rejected according to the preset T² statistic threshold.
More preferably, the T² statistic is given by:
$T^2 = \sum_{i=1}^{Iter} \frac{t_i^2}{\sigma_i^2}$
where $t_i$ is the i-th score variable of the original spectral matrix X after PCA dimensionality reduction, $\sigma_i$ is the standard deviation of $t_i$, and Iter is the number of extracted principal components. Because the T² value of an abnormal sample is far larger than that of a normal sample, the T² values of all spectral samples in the sample database are calculated, the 99% confidence level is taken as the upper limit, and the threshold is calculated by consulting the F-distribution table;
The T² values of all samples in the sample database are compared with the threshold, and samples whose T² exceeds the threshold are rejected.
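The outlier screen of claim 4 can be sketched as below. The threshold formula is not reproduced in this text, so the sketch assumes the common textbook limit $T^2_{lim} = \frac{A(n-1)}{n-A} F_{0.99}(A, n-A)$, with $A$ = Iter principal components and $n$ samples. `hotelling_t2_filter` is a hypothetical name, and SciPy's `scipy.stats.f.ppf` stands in for the F-distribution table.

```python
import numpy as np
from scipy.stats import f


def hotelling_t2_filter(X, n_pc, confidence=0.99):
    """PCA + Hotelling T^2 outlier screen (sketch of claim 4).

    Returns a boolean mask of samples to keep, the T^2 values and the limit.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_pc].T                # t: scores on the first Iter = n_pc PCs
    sigma = scores.std(axis=0, ddof=1)       # standard deviation of each score column
    t2 = np.sum((scores / sigma) ** 2, axis=1)
    # ASSUMED limit: A(n-1)/(n-A) * F_conf(A, n-A), a common textbook choice
    limit = n_pc * (n - 1) / (n - n_pc) * f.ppf(confidence, n_pc, n - n_pc)
    return t2 <= limit, t2, limit
```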
5. The method according to any one of claims 1-4, characterized in that in step (5), samples are randomly drawn by Monte Carlo sampling to form a training subset, and a Gaussian process regression model is established on the subset after dimensionality reduction;
Preferably, step (5) is repeated N times to obtain N training subsets and N submodels; more preferably, N is 200-5000.
6. The method according to claim 5, characterized in that the Gaussian process regression model in step (5) is as follows:
A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution; it is completely determined by its mean function and covariance function and can be written as:
$f(x) \sim \mathcal{GP}\big(m(x), k(x, x')\big)$
In the presence of noise, the actual output value y equals the sum of the latent function value and the noise, i.e.
$y = f(x) + \varepsilon$
where ε is white Gaussian noise distributed as $\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$;
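The covariance function may take the following form:
$k(x, x') = \sigma_f^2 \exp\left(-\frac{\lVert x - x' \rVert^2}{2 l^2}\right) + \sigma_n^2 \delta(x, x')$
where δ(x, x′) is the Kronecker delta. The hyperparameters θ = {l, σ_f, σ_n} can be obtained by maximizing the likelihood function.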
In the formulas: x, x′: arbitrary samples in the training set;
y: property value data;
m(x): mean function;
k(x, x′): covariance function.
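The maximum-likelihood step for θ = {l, σ_f, σ_n} can be sketched as follows. The sketch assumes a zero mean function m(x) = 0, a squared-exponential covariance with additive white noise, and SciPy's L-BFGS-B optimizer applied to log-parameters. The function names and the starting values `theta0` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def se_kernel(A, B, length, sigma_f):
    """Squared-exponential covariance sigma_f^2 * exp(-|a-b|^2 / (2 l^2))."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return sigma_f**2 * np.exp(-0.5 * np.maximum(d2, 0.0) / length**2)


def neg_log_marginal_likelihood(log_theta, X, y):
    """-log p(y | X, theta) for a zero-mean GP with noise variance sigma_n^2."""
    length, sigma_f, sigma_n = np.exp(log_theta)  # log space keeps theta > 0
    K = se_kernel(X, X, length, sigma_f) + (sigma_n**2 + 1e-8) * np.eye(len(y))
    try:
        L = np.linalg.cholesky(K)
    except np.linalg.LinAlgError:
        return 1e25  # reject numerically singular K
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^-1 y
    # 0.5 y^T K^-1 y + 0.5 log|K| + 0.5 n log(2 pi)
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(y) * np.log(2.0 * np.pi)


def fit_gpr(X, y, theta0=(1.0, 1.0, 0.1)):
    """Return the maximum-likelihood theta = (l, sigma_f, sigma_n)."""
    res = minimize(neg_log_marginal_likelihood, np.log(theta0), args=(X, y),
                   method="L-BFGS-B", bounds=[(-5.0, 5.0)] * 3)
    return np.exp(res.x)
```

Under claim 5, this fit would be repeated for each of the N Monte Carlo training subsets after the spectra have been projected onto $P_{pca}$.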
7. The method according to any one of claims 1-6, characterized in that in step (6), the near-infrared spectrum of the sample to be tested is reduced in dimension with $P_{pca}$ and substituted into all Gaussian process regression (GPR) submodels, the model estimate variance σ is calculated, and the sub-training set corresponding to the model with the smallest σ is selected as the local training set.
8. The method according to claim 7, characterized in that the model estimate variance σ in step (6) is determined as follows:
For the reduced-dimension spectrum to be tested, $x_p'$, the joint prior distribution of its predicted property value $y_p$ and the property values $y$ of the training samples is
$\begin{bmatrix} y \\ y_p \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} K & K_p^{T} \\ K_p & K_{pp} \end{bmatrix}\right)$
where $K = [k(x_i, x_j)]_{n \times n}$ is the covariance matrix of the n training samples and
$K_p = [k(x_p', x_1)\ k(x_p', x_2)\ \cdots\ k(x_p', x_n)]$
$K_{pp} = k(x_p', x_p')$
From the above, the posterior distribution of the property value corresponding to $x_p'$ can be calculated; the estimated mean and variance of $y_p$ are:
$\mu = K_p K^{-1} y$
$\sigma^2 = K_{pp} - K_p K^{-1} K_p^{T}$
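Claims 5, 7 and 8 can be sketched together: draw Monte Carlo subsets, evaluate the posterior variance σ² of each subset's GP at the reduced spectrum to be tested, and keep the subset with the smallest σ². For brevity, the hyperparameters are held fixed rather than fitted per submodel (an assumption), the prior mean is zero, and all names and defaults (`n_models`, `subset_size`) are hypothetical.

```python
import numpy as np


def se_kernel(A, B, length=1.0, sigma_f=1.0):
    """Squared-exponential covariance between the rows of A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return sigma_f**2 * np.exp(-0.5 * np.maximum(d2, 0.0) / length**2)


def gp_posterior(X, y, x_p, length=1.0, sigma_f=1.0, sigma_n=0.1):
    """Claim 8: mu = K_p K^-1 y, sigma^2 = K_pp - K_p K^-1 K_p^T (zero prior mean)."""
    K = se_kernel(X, X, length, sigma_f) + sigma_n**2 * np.eye(len(X))
    K_p = se_kernel(x_p[None, :], X, length, sigma_f)[0]  # row vector, shape (n,)
    K_pp = sigma_f**2 + sigma_n**2                        # k(x_p, x_p) incl. noise term
    L = np.linalg.cholesky(K)                             # K = L L^T
    v = np.linalg.solve(L, K_p)                           # v = L^-1 K_p^T
    mu = v @ np.linalg.solve(L, y)                        # K_p K^-1 y
    var = K_pp - v @ v                                    # K_pp - K_p K^-1 K_p^T
    return mu, var


def select_local_training_set(Z, y, z_p, n_models=200, subset_size=50, seed=0):
    """Claims 5 and 7: draw Monte Carlo subsets and keep the one whose GP
    gives the smallest predictive variance at z_p (fixed hyperparameters)."""
    rng = np.random.default_rng(seed)
    best_idx, best_var = None, np.inf
    for _ in range(n_models):
        idx = rng.choice(len(Z), size=subset_size, replace=False)
        _, var = gp_posterior(Z[idx], y[idx] - y[idx].mean(), z_p)
        if var < best_var:
            best_idx, best_var = idx, var
    return best_idx, best_var
```

A Cholesky factorisation replaces the explicit inverse; it evaluates the same expressions $K_p K^{-1} y$ and $K_{pp} - K_p K^{-1} K_p^{T}$ with better numerical stability.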
9. The method according to any one of claims 1-8, characterized in that the wavenumber intervals in step (7) are determined by the following steps:
(a) First, assign each variable the initial weight $w_j = 1/m$ ($j = 1, \ldots, m$), where m is the total number of variables;
For iterations t = 1, ..., g, repeat the following steps:
(b) calculate the sampling probability of each variable, $p_j = w_j / \sum_{i=1}^{m} w_i$, and draw k variables from all wavenumber points according to these probabilities;
(c) using the k selected variables, establish a submodel $h_t$ by the PLS method;
(d) reconstruct the spectral matrix D′ from the score matrix and loading matrix obtained by PLS, and calculate the error $e_x$ of each variable, where:
$e_{xj}$: mean error of the j-th variable;
K: total number of samples;
$D_{ij}$: original value of the j-th variable of the i-th sample;
$D'_{ij}$: reconstructed value of the j-th variable of the i-th sample;
(e) calculate the error $e_y$:
$e_y = \sqrt{\frac{1}{K}\sum_{i=1}^{K}(y_i - \hat{y}_i)^2}$
where $e_y$: root-mean-square error;
K: total number of samples;
$y_i$: true value of the i-th sample;
$\hat{y}_i$: predicted value of the i-th sample;
(f) substitute $e_x$ and $e_y$ into the following formula to calculate the error:
$err_t = e_{xj} + \beta e_y$
where $err_t$ is the error of the t-th iteration;
(g) calculate the new weight of each variable; after the weights are updated, proceed to the next iteration;
(h) after the iterations stop, sort the variable weights in descending order and select the first z variables as the variables used in the final model.
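Steps (a) to (h) can be sketched as a weighted random-subspace PLS loop. Several formulas are not reproduced in the text, so the sketch assumes equal initial weights $1/m$, sampling probabilities proportional to the weights, and a per-variable mean squared reconstruction error for $e_{xj}$. It also uses a purely hypothetical weight update in step (g) that rewards sampled variables with a low combined error. The PLS1 routine is a minimal NIPALS implementation, and `k`, `g`, `n_comp`, `beta` and `z` are illustrative defaults.

```python
import numpy as np


def pls1_nipals(X, y, n_comp):
    """Minimal NIPALS PLS1 on centred X (K x k) and y (K,); returns T, P, q."""
    Xk, yk = X.copy(), y.copy()
    T, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # X weight vector
        t = Xk @ w                      # score vector
        tt = t @ t
        p = Xk.T @ t / tt               # X loading vector
        c = yk @ t / tt                 # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - c * t                 # deflate y
        T.append(t)
        P.append(p)
        q.append(c)
    return np.array(T).T, np.array(P).T, np.array(q)


def rank_variables(D, y, k=20, g=100, n_comp=3, beta=1.0, z=10, seed=0):
    """Sketch of steps (a)-(h); the update in step (g) is HYPOTHETICAL."""
    rng = np.random.default_rng(seed)
    _, m = D.shape
    w = np.full(m, 1.0 / m)                                    # (a) equal initial weights
    for _ in range(g):
        prob = w / w.sum()                                     # (b) sampling probabilities
        idx = rng.choice(m, size=k, replace=False, p=prob)
        Xs = D[:, idx]
        x_mean, y_mean = Xs.mean(axis=0), y.mean()
        T, P, q = pls1_nipals(Xs - x_mean, y - y_mean, n_comp)  # (c) submodel h_t
        D_rec = T @ P.T + x_mean                               # (d) reconstructed D'
        e_x = np.mean((Xs - D_rec) ** 2, axis=0)               # ASSUMED per-variable MSE
        y_hat = T @ q + y_mean
        e_y = np.sqrt(np.mean((y - y_hat) ** 2))               # (e) RMSE
        err = e_x + beta * e_y                                 # (f) combined error
        w[idx] += 1.0 / (err + 1e-12)                          # (g) HYPOTHETICAL reward
    return np.argsort(-w)[:z]                                  # (h) top-z variables
```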
10. A crude oil property prediction method based on near-infrared spectroscopy, characterized in that the method comprises:
(i) measuring the near-infrared spectrum of the crude oil to be tested;
(ii) predicting the crude oil property with the model for crude oil property prediction built by the method of any one of claims 1-9.
CN201811168265.7A 2018-10-08 2018-10-08 Near-infrared rapid analysis method for constructing crude oil property by using Gaussian process regression model Active CN109324013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811168265.7A CN109324013B (en) 2018-10-08 2018-10-08 Near-infrared rapid analysis method for constructing crude oil property by using Gaussian process regression model

Publications (2)

Publication Number Publication Date
CN109324013A true CN109324013A (en) 2019-02-12
CN109324013B CN109324013B (en) 2021-09-24

Family

ID=65261570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811168265.7A Active CN109324013B (en) 2018-10-08 2018-10-08 Near-infrared rapid analysis method for constructing crude oil property by using Gaussian process regression model

Country Status (1)

Country Link
CN (1) CN109324013B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111238997A (en) * 2020-02-12 2020-06-05 江南大学 On-line measurement method for feed density in crude oil desalting and dewatering process
CN113077006A (en) * 2021-04-15 2021-07-06 天津大学 Model training method and analysis method for analyzing quality of bio-oil
CN113125377A (en) * 2021-03-30 2021-07-16 武汉理工大学 Method and device for detecting diesel oil property based on near infrared spectrum
CN113239621A (en) * 2021-05-11 2021-08-10 西南石油大学 PVT (physical vapor transport) measurement method based on elastic network regression algorithm
CN113569951A (en) * 2021-07-29 2021-10-29 山东科技大学 Method for constructing near-infrared quantitative analysis model based on generation countermeasure network
CN113702328A (en) * 2021-08-20 2021-11-26 广东省惠州市石油产品质量监督检验中心 Method, device, equipment and storage medium for analyzing properties of product oil
CN117451691A (en) * 2023-12-21 2024-01-26 浙江恒逸石化有限公司 Method for pre-judging yarn dyeing property

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102265227A (en) * 2008-10-20 2011-11-30 西门子公司 Method and apparatus for creating state estimation models in machine condition monitoring
CN103345593A (en) * 2013-07-31 2013-10-09 哈尔滨工业大学 Gathering abnormity detection method for single sensor data flow
CN103425888A (en) * 2013-08-22 2013-12-04 重庆大学 Metal tube agentia compacting method based on compaction density prediction
CN105447840A (en) * 2015-12-09 2016-03-30 西安电子科技大学 Image super-resolution method based on active sampling and Gaussian process regression
CN105701572A (en) * 2016-01-13 2016-06-22 国网甘肃省电力公司电力科学研究院 Photovoltaic short-term output prediction method based on improved Gaussian process regression
CN105699319A (en) * 2016-01-28 2016-06-22 山西汾西矿业(集团)有限责任公司 Near infrared spectrum quick detection method for total moisture of coal based on gaussian process
US9658104B2 (en) * 2010-04-05 2017-05-23 Chemimage Corporation System and method for detecting unknown materials using short wave infrared hyperspectral imaging
CN106951695A (en) * 2017-03-09 2017-07-14 杭州安脉盛智能技术有限公司 Plant equipment remaining life computational methods and system under multi-state
CN107918709A (en) * 2017-11-17 2018-04-17 浙江工业大学 A kind of Forecasting Methodology of multiphase mixing transmission pump check valve transient state Lift

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ARJAN GIJSBERTS ET AL.: "Real-time model learning using incremental sparse spectrum gaussian process regression", NEURAL NETWORKS *
JAMES E. BARRETT ET AL.: "Covariate dimension reduction for survival data via the gaussian process latent variable model", STATISTICS IN MEDICINE *
HE ZHIKUN ET AL.: "A review of Gaussian process regression methods", Control and Decision *
GAO YANG: "Research on dimensionality reduction algorithms for hyperspectral data", China Doctoral Dissertations Full-text Database *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111238997A (en) * 2020-02-12 2020-06-05 江南大学 On-line measurement method for feed density in crude oil desalting and dewatering process
CN111238997B (en) * 2020-02-12 2021-07-27 江南大学 On-line measurement method for feed density in crude oil desalting and dewatering process
CN113125377A (en) * 2021-03-30 2021-07-16 武汉理工大学 Method and device for detecting diesel oil property based on near infrared spectrum
CN113125377B (en) * 2021-03-30 2024-02-23 武汉理工大学 Method and device for detecting property of diesel based on near infrared spectrum
CN113077006A (en) * 2021-04-15 2021-07-06 天津大学 Model training method and analysis method for analyzing quality of bio-oil
CN113239621A (en) * 2021-05-11 2021-08-10 西南石油大学 PVT (physical vapor transport) measurement method based on elastic network regression algorithm
CN113239621B (en) * 2021-05-11 2022-07-12 西南石油大学 PVT (Voltage-volume-temperature) measurement method based on elastic network regression algorithm
CN113569951A (en) * 2021-07-29 2021-10-29 山东科技大学 Method for constructing near-infrared quantitative analysis model based on generation countermeasure network
CN113569951B (en) * 2021-07-29 2023-11-07 山东科技大学 Near infrared quantitative analysis model construction method based on generation countermeasure network
CN113702328A (en) * 2021-08-20 2021-11-26 广东省惠州市石油产品质量监督检验中心 Method, device, equipment and storage medium for analyzing properties of product oil
CN117451691A (en) * 2023-12-21 2024-01-26 浙江恒逸石化有限公司 Method for pre-judging yarn dyeing property
CN117451691B (en) * 2023-12-21 2024-04-02 浙江恒逸石化有限公司 Method for pre-judging yarn dyeing property

Also Published As

Publication number Publication date
CN109324013B (en) 2021-09-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant