CN109324013A - Method for building a near-infrared rapid-analysis model of crude oil properties using Gaussian process regression - Google Patents
- Publication number: CN109324013A
- Authority: CN (China)
- Prior art keywords: sample, variable, spectrum, value, PCA
- Legal status: Granted
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
- G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
- G01N21/359—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
Abstract
The invention discloses a method for building a near-infrared (NIR) rapid-analysis model of crude oil properties using Gaussian process regression. The method for building the crude oil property prediction model comprises: (1) measuring property data of crude oil samples; (2) measuring the NIR spectra of the crude oil samples; (3) preprocessing the NIR spectra obtained in step (2) by straight-line subtraction, to eliminate background interference and baseline drift; (4) performing principal component analysis (PCA) on the spectra obtained in step (3), and saving the loading vectors whose cumulative eigenvalue contribution rate exceeds 95%, denoted P_pca; (5) using P_pca to reduce the dimensionality of training subsets made of randomly selected samples, and obtaining submodels by Gaussian process regression (GPR); (6) using P_pca to reduce the dimensionality of the NIR spectrum of the sample to be tested, and selecting a training set with the GPR submodels; (7) determining one or more wavenumber intervals from the training set, and building the crude oil property prediction model by partial least squares (PLS).
Description
Technical field
The present invention relates to a method for building a near-infrared rapid-analysis model of crude oil properties using Gaussian process regression, and to its application.
Background art
Crude oil evaluation plays an important role in crude oil production, trade, and related areas. China has long carried out crude oil evaluation work and now has a complete set of standard evaluation methods, but these methods usually require long analysis times and large sample volumes and are costly, so they can no longer meet the needs of practical applications. NIR spectroscopy is currently one of the most promising and most widely used rapid analysis methods. In recent years, the application of optical fibers in NIR spectroscopy has moved the technique from the laboratory to the field. NIR spectroscopy is insensitive to electromagnetic interference, transmits signal energy in a concentrated manner, is highly sensitive, and is inexpensive, which allows NIR spectrometers to perform fast, remote, online analysis in harsh and hazardous environments. Crude oil has a complex composition and many properties to be measured, its NIR absorption bands are broad and heavily overlapping, and the NIR analyzer is a secondary instrument that depends on a calibration model. Establishing an accurate and robust NIR model is therefore the key to applying NIR technology effectively.
Conventional modeling methods generally belong to the category of static models: key steps such as spectral preprocessing, variable selection, model building, and model updating and maintenance are all performed offline and remain unchanged during application. In the process industry, continuous production generally requires the model to track field operating conditions in real time, and when the model deviates significantly from the current operating condition so that its prediction accuracy no longer meets the requirements of online detection, the model must be updated promptly and effectively.
Summary of the invention
In view of the above problems, the invention proposes a method for rapid NIR analysis of crude oil properties using Gaussian process regression models. Based on crude oil NIR spectra acquired with an NIR analyzer, the method preprocesses the collected spectra by straight-line subtraction to eliminate interference, and screens the samples using the preprocessed spectral data. For each newly acquired sample to be tested, GPR is used to select a suitable training set from the sample database, the modeling wavenumber range is determined from that training set, and a local model is built with PLS to predict the property values of the spectrum to be tested.
The method provided by the invention for rapid NIR analysis of crude oil properties using Gaussian process regression models comprises the following steps:
Step 1: collect crude oil samples and measure the property values of all crude oil samples;
Step 2: measure the NIR spectra of all samples with an online NIR system;
Step 3: preprocess the crude oil NIR spectra obtained in Step 2, and screen the samples using the preprocessed spectral data to remove outlier samples;
Step 4: perform principal component analysis (PCA) on all samples in the sample database, and store the loading vectors whose cumulative eigenvalue contribution rate exceeds 95%, denoted P_pca;
Step 5: randomly select n samples from the sample database by Monte Carlo sampling to form a training subset A, reduce the dimensionality of this subset with P_pca, and build a Gaussian process regression model on the reduced subset. Repeat this step N times to obtain N training subsets and submodels.
Step 6: when a new spectrum to be tested x_p is acquired, reduce its dimensionality with P_pca, feed it into all GPR submodels, and compute the estimated variance σ² of each model. Select the sub-training set corresponding to the model with the smallest σ² as the local training set S.
Step 7: determine the wavenumber range from the local training set S, build a local model on S with PLS, and predict the property value of the spectrum to be tested x_p.
In one or more embodiments, the crude oils used to construct the calibration set in Step 1 have a density at 20 °C in the range 0.7-1.1 g/cm³, a sulfur content in the range 0.03%-5.50%, and an acid value in the range 0.01-12.00 mgKOH/g; and/or
the crude oil properties include one or more of density, carbon residue, acid value, sulfur content, nitrogen content, wax content, resin content, asphaltene content, and true boiling point (TBP) data.
In one or more embodiments, Step 2 comprises holding the training set samples at a constant temperature below 35 °C and measuring the NIR spectrum of each crude oil sample after its temperature has stabilized;
In one or more embodiments, in Step 2 the scanning range is 4000-12500 cm⁻¹ and the number of scans is 10-100.
In one or more embodiments, the NIR spectral preprocessing in Step 3 applies straight-line subtraction to the crude oil NIR spectra obtained in Step 2 over the 12500-4000 cm⁻¹ region, to eliminate background interference and baseline drift;
In one or more embodiments, Step 3 uses principal component analysis combined with Hotelling T² statistics to compute the T² statistic of each sample in the initial training set and, according to a preset T² threshold, removes the outlier samples from the initial training set to form the final training set;
Preferably, outlier samples are removed by PCA combined with Hotelling T² statistics as follows: first perform PCA on the sample spectra, then use the principal component scores as feature variables to compute the T² statistic of each sample, and, according to the preset T² threshold, remove the outlier samples from the initial training set to form the final training set.
In Step 4, performing PCA on the (mean-centered) spectral matrix X is equivalent to an eigendecomposition of its covariance matrix XᵀX (up to a constant factor), and the loading vectors are the eigenvectors of XᵀX. Letting λ_i denote the eigenvalues of XᵀX sorted in descending order, the cumulative contribution rate of the first k principal components is:
$$\eta_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{M} \lambda_i}$$
where M is the number of wavelength points in the spectrum.
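The choice of P_pca can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the spectra are stored row-wise in an n × M array, mean-centers them, and obtains the eigenvalues of XᵀX as the squared singular values of the centered matrix. The function name `pca_loadings` and the synthetic data are hypothetical.

```python
import numpy as np


def pca_loadings(X, threshold=0.95):
    """Return P_pca (M x k), the column means, and the cumulative contribution rates."""
    mean = X.mean(axis=0)
    Xc = X - mean  # mean-center so that Xc.T @ Xc is proportional to the covariance matrix
    # Right singular vectors of Xc are the eigenvectors of Xc.T @ Xc (the loading vectors);
    # squared singular values are the corresponding eigenvalues, sorted in descending order.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2
    cumulative = np.cumsum(eigvals) / eigvals.sum()  # eta_k for k = 1, 2, ...
    k = int(np.searchsorted(cumulative, threshold)) + 1  # smallest k with eta_k >= threshold
    return Vt[:k].T, mean, cumulative


# Hypothetical data: 50 "spectra" with 200 wavelength points built from 3 latent components.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 200)) + 0.01 * rng.normal(size=(50, 200))
P_pca, mean, cumulative = pca_loadings(X)
scores = (X - mean) @ P_pca  # dimension-reduced spectra used by the later steps
print(P_pca.shape, scores.shape)
```

Eigenvalues beyond min(n, M) are zero, so summing the available squared singular values gives the same denominator as summing all M eigenvalues.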
In one or more embodiments, the Gaussian process regression model in Step 5 is as follows.
A Gaussian process is a collection of random variables any finite number of which have a joint Gaussian distribution; it is completely determined by its mean function and covariance function and can be written as:
$$f(x) \sim \mathcal{GP}\big(m(x),\, k(x, x')\big)$$
Taking noise into account, the actual output y equals the sum of the latent function value and the noise, i.e.
$$y = f(x) + \varepsilon$$
where ε is white Gaussian noise distributed as
$$\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$$
The covariance function may take the following form:
$$k(x, x') = \sigma_f^2 \exp\!\left(-\frac{\lVert x - x' \rVert^2}{2 l^2}\right) + \sigma_n^2\,\delta(x, x')$$
The hyperparameters θ = {l, σ_f, σ_n} can be obtained by maximizing the likelihood function.
In the formulas: x, x′: any samples in the training set;
y: property value data;
m(x): mean function;
k(x, x′): covariance function;
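The maximum-likelihood step for θ = {l, σ_f, σ_n} can be illustrated with the negative log marginal likelihood of a zero-mean GP using the squared-exponential kernel with a noise term. As a simplifying assumption, this sketch searches a small hypothetical grid rather than using a gradient-based optimizer, and it centers y so that the zero-mean prior is reasonable; the function names and grid values are illustrative only.

```python
import itertools

import numpy as np


def se_kernel(A, B, length, sigma_f):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f ** 2 * np.exp(-d2 / (2.0 * length ** 2))


def neg_log_marginal_likelihood(X, y, length, sigma_f, sigma_n):
    """-log p(y | X, theta) for a zero-mean GP with Gaussian noise sigma_n."""
    n = len(y)
    K = se_kernel(X, X, length, sigma_f) + sigma_n ** 2 * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^-1 y
    # 0.5 * y^T K^-1 y + 0.5 * log|K| + 0.5 * n * log(2*pi), with log|K| = 2 * sum(log diag L)
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * n * np.log(2.0 * np.pi)


def fit_hyperparameters(X, y, grid):
    """Return the (l, sigma_f, sigma_n) triple in `grid` with the highest marginal likelihood."""
    return min(grid, key=lambda theta: neg_log_marginal_likelihood(X, y, *theta))


# Hypothetical 1-D example; y is centered to match the zero-mean prior.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
y = y - y.mean()
lengths, sigma_fs, sigma_ns = [0.1, 1.0, 10.0], [0.5, 1.0, 2.0], [0.01, 0.1, 0.5]
grid = list(itertools.product(lengths, sigma_fs, sigma_ns))
theta = fit_hyperparameters(X, y, grid)
print(theta)
```

A gradient-based optimizer over log-hyperparameters would normally replace the grid in practice; the grid keeps the sketch short and dependency-free beyond NumPy.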
In one or more embodiments, the estimated variance of the model in Step 6 is determined as follows.
The joint prior distribution of the predicted property value y_p for the dimension-reduced spectrum to be tested x_p′ and the property values y of the samples in the training set is
$$\begin{bmatrix} y \\ y_p \end{bmatrix} \sim \mathcal{N}\left(0,\; \begin{bmatrix} K & K_p^{\mathsf T} \\ K_p & K_{pp} \end{bmatrix}\right)$$
where K is the n × n covariance matrix of the training samples, with elements K_ij = k(x_i, x_j), and
$$K_p = [\,k(x_p', x_1)\;\; k(x_p', x_2)\;\; \cdots\;\; k(x_p', x_n)\,]$$
$$K_{pp} = k(x_p', x_p')$$
From this, the posterior distribution of the property value for x_p′ can be calculated; the estimated mean and variance of y_p are:
$$\mu = K_p K^{-1} y$$
$$\sigma^2 = K_{pp} - K_p K^{-1} K_p^{\mathsf T}$$
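The posterior mean and variance formulas above can be evaluated stably with a Cholesky factorization of K, as in the following NumPy sketch. It assumes a zero-mean GP, the squared-exponential kernel with a noise term, and fixed hyperparameters; `gp_predict` is a hypothetical helper, not part of the patent.

```python
import numpy as np


def se_kernel(A, B, length=1.0, sigma_f=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f ** 2 * np.exp(-d2 / (2.0 * length ** 2))


def gp_predict(X, y, xp, length=1.0, sigma_f=1.0, sigma_n=0.1):
    """Posterior mean mu = K_p K^-1 y and variance sigma^2 = K_pp - K_p K^-1 K_p^T at xp."""
    K = se_kernel(X, X, length, sigma_f) + sigma_n ** 2 * np.eye(len(X))
    Kp = se_kernel(xp[None, :], X, length, sigma_f)[0]  # row vector K_p, shape (n,)
    Kpp = sigma_f ** 2 + sigma_n ** 2  # k(x_p, x_p), including the delta noise term
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^-1 y
    v = np.linalg.solve(L, Kp)  # L^-1 K_p^T, so K_p K^-1 K_p^T = v . v
    mu = Kp @ alpha
    var = Kpp - v @ v
    return mu, var


# Hypothetical training data in a 1-D reduced space.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.0])
mu_near, var_near = gp_predict(X, y, np.array([1.0]), sigma_n=1e-3)
mu_far, var_far = gp_predict(X, y, np.array([10.0]), sigma_n=1e-3)
```

Near a training point the variance shrinks toward σ_n², while far from all training points it returns to the prior value σ_f² + σ_n². This behavior is what makes a small σ² a sign that a training subset covers x_p well.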
In one or more embodiments, the wavenumber range in Step 7 is determined by the following steps:
(a) First, set the initial weight of each variable to w_j = 1/M, where M is the total number of variables.
For iterations t = 1, ..., g, repeat the following steps:
(b) Compute the sampling probability of each variable, and draw k variables from all wavenumber points according to these sampling probabilities;
(c) Using the k selected variables, build a submodel h_t by PLS;
(d) Reconstruct the spectral matrix D′ from the score and loading matrices obtained by PLS, and compute the error e_x of each variable, where:
e_xj: mean error of the j-th variable;
K: total number of samples;
D_ij: original value of the j-th variable of the i-th sample;
D′_ij: reconstructed value of the j-th variable of the i-th sample;
(e) Compute the error e_y:
$$e_y = \sqrt{\frac{1}{K}\sum_{i=1}^{K} \left(y_i - \hat{y}_i\right)^2}$$
where e_y is the root-mean-square error, K is the total number of samples, y_i is the true value of the i-th sample, and ŷ_i is the predicted value of the i-th sample;
(f) Substitute e_x and e_y into the following formula to compute the error:
$$\mathrm{err}_t = e_{x_j} + \beta\, e_y$$
where err_t is the error of the t-th iteration;
(g) Compute the new weight of each variable; after the weights are updated, proceed to the next iteration.
(h) After the iterations stop, sort the variable weights in descending order and take the first z variables as the variables used in the final model.
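Steps (a), (b) and (h) can be sketched as below. Because the text does not give the sampling-probability or weight-update formulas, this sketch assumes that each variable's sampling probability is its weight divided by the sum of all weights, and it leaves the update of step (g) out entirely; all names are hypothetical.

```python
import numpy as np


def initial_weights(M):
    """Step (a): every one of the M variables starts with weight 1/M."""
    return np.full(M, 1.0 / M)


def sample_variables(weights, k, rng):
    """Step (b), assumed form: probability = weight / sum of weights; draw k distinct variables."""
    p = weights / weights.sum()
    return rng.choice(len(weights), size=k, replace=False, p=p)


def select_top_z(weights, z):
    """Step (h): sort weights from large to small and keep the first z variable indices."""
    return np.argsort(weights)[::-1][:z]


rng = np.random.default_rng(0)
w = initial_weights(10)
picked = sample_variables(w, 4, rng)
final_weights = np.array([0.1, 0.5, 0.2, 0.9, 0.3])  # hypothetical weights after iteration stops
top = select_top_z(final_weights, 3)
```

Sampling without replacement guarantees that each submodel h_t sees k distinct wavenumber points.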
In one or more embodiments, the mathematical model of Step 7 is built with the PLS method.
In one or more embodiments, the model built in Step 7 can be adapted to the spectral features of each sample to be tested; that is, the method of the invention adaptively changes the training set and the modeling wavenumber range according to the spectrum to be tested, to obtain a better model.
Description of drawings
Fig. 1: schematic of the experiment in which the online NIR analyzer measures crude oil NIR spectra.
Fig. 2: procedure for building the NIR-based adaptive prediction model of crude oil properties.
Fig. 3: raw crude oil NIR spectra.
Fig. 4: preprocessed crude oil NIR spectra.
Fig. 5: principal components from the PCA.
Fig. 6: Hotelling T² plot of the outlier samples.
Fig. 7: prediction performance of the NIR regression model for crude oil sulfur content.
Detailed description of embodiments
Fig. 1 shows the experimental procedure in which the online NIR analyzer of the invention measures the NIR spectra of samples. Fig. 2 is the overall flow chart of the method of the invention, which comprises the following steps:
First step: collect crude oil samples and measure the property values of all crude oil samples;
Second step: measure the NIR spectra of all samples with an online NIR system;
Third step: preprocess the crude oil NIR spectra obtained in the second step, and screen the samples using the preprocessed spectral data to remove outlier samples;
Fourth step: perform PCA on all samples in the sample database, and store the loading vectors whose cumulative eigenvalue contribution rate exceeds 95%, denoted P_pca;
Fifth step: randomly select n samples from the sample database by Monte Carlo sampling to form a training subset A, reduce the dimensionality of this subset with P_pca, and build a Gaussian process regression model on the reduced subset. Repeat this step N times to obtain N training subsets and submodels;
Sixth step: when a new spectrum to be tested x_p is acquired, reduce its dimensionality with P_pca, feed it into all GPR submodels, and compute the estimated variance σ² of each model. Select the sub-training set corresponding to the model with the smallest σ² as the local training set S;
Seventh step: determine the wavenumber range from the local training set S, build a local model on S with PLS, and predict the property value of the spectrum to be tested x_p.
These steps are described in detail below. It should be understood that, within the scope of the invention, the technical features of the invention described above and those specifically described below (e.g., in the embodiments) can be combined with one another to form preferred technical solutions.
One, Constructing the crude oil calibration set and measuring the properties of its crude oils
Collect different types of crude oil samples, usually covering paraffin-base, intermediate-base, and naphthene-base crude oils. In general, no fewer than 200 crude oil samples are collected. Preferably, the NIR spectrum and property values of each crude oil are measured several times to eliminate random errors.
Preferably, the density (20 °C), sulfur content, and acid value of the collected crude oil samples are controlled within 0.7-1.1 g/cm³, 0.03%-5.50%, and 0.01-12.00 mgKOH/g, respectively. Multiple properties of the collected crude oils, such as density, carbon residue, nitrogen content, sulfur content, acid value, salt content, wax content, resin content, asphaltene content, and true boiling point distillation data, are then measured with conventional standard methods, and the data are recorded.
Two, Acquiring the crude oil NIR spectra
An offline or online NIR instrument of a suitable type can be chosen for NIR scanning. The probe is inserted directly into the crude oil sample, which is held at a constant temperature below 35 °C; the crude oil is kept homogeneous during measurement, and the NIR spectrum of each sample is obtained. For example, the crude oil sample can be held at a constant 30 °C, and its NIR spectrum is measured once the sample temperature has stabilized.
In general, each spectrum is the average of 10-100 scans. The spectral scanning range is 4000-12500 cm⁻¹ and the resolution is 8-32 cm⁻¹. Example raw crude oil spectra are shown in Fig. 3.
Three, Preprocessing the crude oil NIR spectra obtained in step Two by straight-line subtraction
The preprocessing applies straight-line subtraction to the 12500-4000 cm⁻¹ spectral region of every sample in the calibration set, eliminating baseline drift and background interference and improving resolution and sensitivity. After preprocessing, the initial training set can be established. Example preprocessed crude oil NIR spectra are shown in Fig. 4.
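A minimal NumPy sketch of straight-line subtraction follows. It assumes the line is a least-squares linear fit of absorbance against wavenumber over the 4000-12500 cm⁻¹ region, which is then subtracted from the spectrum; the instrument software used in the patent may define the line differently. The helper names and the synthetic spectrum are hypothetical.

```python
import numpy as np


def subtract_straight_line(wavenumbers, absorbance):
    """Fit a least-squares line to the spectrum and subtract it (assumed definition of the line)."""
    slope, intercept = np.polyfit(wavenumbers, absorbance, deg=1)
    return absorbance - (slope * wavenumbers + intercept)


def preprocess(wavenumbers, spectra, lo=4000.0, hi=12500.0):
    """Restrict each spectrum to [lo, hi] cm^-1 and apply straight-line subtraction."""
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    wn = wavenumbers[mask]
    return wn, np.array([subtract_straight_line(wn, s[mask]) for s in spectra])


# Hypothetical spectrum: one absorption band on top of a sloping baseline.
wn = np.linspace(3800.0, 12800.0, 1126)
band = 0.5 * np.exp(-((wn - 5800.0) / 150.0) ** 2)
drift = 0.2 + 3e-5 * wn
wn_sel, corrected = preprocess(wn, np.array([band + drift, drift]))
```

Because the fitted line includes an intercept, the corrected spectrum has zero mean over the region, so a constant offset and a linear baseline drift are removed in one step.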
Four, Removing outlier samples by PCA combined with Hotelling T² statistics
Outlier samples can be removed by principal component analysis combined with Hotelling T² statistics. The basic procedure is to first perform PCA on the sample spectra, then use the principal component scores as feature variables to compute the T² statistic of each sample, and, according to a preset T² threshold, remove the outlier samples from the initial training set to form the final training set.
The T² values of all samples in the sample database are compared with the threshold, samples whose T² exceeds the threshold are removed, and the final training set is established. Example principal components from the PCA are shown in Fig. 5.
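The T² screening can be sketched as follows, assuming that T² for each sample is the sum, over the retained components, of its squared score divided by that score's variance. The number of components and the threshold are inputs. The example uses synthetic data with one injected outlier and a hypothetical threshold, not the data or the 3.911 threshold of the embodiment.

```python
import numpy as np


def hotelling_t2(X, n_components):
    """T^2 of each sample: sum over retained PCs of score^2 / variance of that score."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # principal component scores t_ia
    score_var = s[:n_components] ** 2 / (len(X) - 1)  # sample variance of each score column
    return (scores ** 2 / score_var).sum(axis=1)


def remove_outliers(X, y, n_components, threshold):
    """Drop samples whose T^2 exceeds the threshold; return kept data, removed indices, T^2."""
    t2 = hotelling_t2(X, n_components)
    keep = t2 <= threshold
    return X[keep], y[keep], np.flatnonzero(~keep), t2


# Hypothetical data: 100 spectra with 20 variables, sample 7 shifted far from the rest.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
X[7] += 50.0
y = rng.normal(size=100)
X_kept, y_kept, removed, t2 = remove_outliers(X, y, n_components=2, threshold=20.0)
```

Because the scores are centered, the T² values of all samples always sum to (n - 1) × k, which is a convenient sanity check on an implementation.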
Five, Performing PCA on all samples in the sample database and taking the loading vectors whose cumulative eigenvalue contribution rate exceeds 95%, denoted P_pca
This step performs PCA on the spectral samples in the sample database and saves the loading vectors whose cumulative eigenvalue contribution rate exceeds 95%. This reduces the storage space of the sample database and also reduces the computational load and time when similarity indices are calculated in later steps.
Six, Selecting a suitable local training set from the sample database using Gaussian process models
First, n samples are randomly selected from the sample database by Monte Carlo sampling to form a training subset A; the subset is reduced in dimensionality with P_pca, and a Gaussian process regression model is built on the reduced subset. This step is repeated N times, yielding N training subsets and submodels. When a new spectrum to be tested x_p is acquired, it is reduced in dimensionality with P_pca and fed into all GPR submodels, and the estimated variance σ² of each model is computed. The sub-training set corresponding to the model with the smallest σ² is selected as the local training set S. The estimated variance is computed as:
$$\sigma^2 = K_{pp} - K_p K^{-1} K_p^{\mathsf T}$$
where
$$K_p = [\,k(x_p, x_1)\;\; k(x_p, x_2)\;\; \cdots\;\; k(x_p, x_n)\,]$$
$$K_{pp} = k(x_p, x_p)$$
and K is the n × n covariance matrix of the training subset, with elements K_ij = k(x_i, x_j).
In the formulas: x_i, x_j: any samples in the training set;
x_p: spectrum to be tested;
y: property value data;
m(x): mean function;
k(x_i, x_j): covariance function.
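The Monte Carlo subset generation and minimum-variance selection can be sketched as below. The GP predictive variance depends only on the dimension-reduced spectra and not on y, so the sketch does not need property values. As simplifying assumptions, all submodels share fixed hyperparameters (the text fits them by maximum likelihood), and the subset size and count are smaller than the 150 samples and 1000 repetitions of the embodiment. The function names are hypothetical.

```python
import numpy as np


def se_kernel(A, B, length, sigma_f):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f ** 2 * np.exp(-d2 / (2.0 * length ** 2))


def predictive_variance(Xs, xp, length, sigma_f, sigma_n):
    """sigma^2 = K_pp - K_p K^-1 K_p^T for a GP submodel trained on the rows of Xs."""
    K = se_kernel(Xs, Xs, length, sigma_f) + sigma_n ** 2 * np.eye(len(Xs))
    Kp = se_kernel(xp[None, :], Xs, length, sigma_f)[0]
    v = np.linalg.solve(np.linalg.cholesky(K), Kp)  # K_p K^-1 K_p^T = v . v
    return sigma_f ** 2 + sigma_n ** 2 - v @ v


def build_subsets(n_samples, n, N, rng):
    """Monte Carlo sampling: N subsets, each of n distinct sample indices."""
    return [rng.choice(n_samples, size=n, replace=False) for _ in range(N)]


def select_local_training_set(scores, xp_score, subsets, length=1.0, sigma_f=1.0, sigma_n=0.1):
    """Return the subset whose submodel has the smallest predictive variance at xp_score."""
    variances = [predictive_variance(scores[idx], xp_score, length, sigma_f, sigma_n)
                 for idx in subsets]
    best = int(np.argmin(variances))
    return subsets[best], variances[best]


# Hypothetical PCA scores of 196 library spectra and of one spectrum to be tested.
rng = np.random.default_rng(1)
scores = rng.normal(size=(196, 5))
xp_score = rng.normal(size=5)
subsets = build_subsets(len(scores), n=30, N=50, rng=rng)
S, var_S = select_local_training_set(scores, xp_score, subsets)
```

Since only the spectra enter σ², the N Cholesky factorizations could be computed once when the submodels are built and reused for every new spectrum; the sketch recomputes them for brevity.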
Seven, Determining the wavenumber range from the local training set S
This step performs wavenumber selection on the spectral samples in the training set. As research on partial least squares and related methods has deepened, it has been found that screening characteristic wavenumbers or intervals can yield better quantitative models. Wavenumber selection simplifies the model and removes irrelevant variables, giving a model with stronger predictive ability and better robustness.
The wavenumber range in Step 7 is determined by the following steps:
(a) First, set the initial weight of each variable to w_j = 1/M, where M is the total number of variables.
For iterations t = 1, ..., g, repeat the following steps:
(b) Compute the sampling probability of each variable, and draw k variables from all wavenumber points according to these sampling probabilities;
(c) Using the k selected variables, build a submodel h_t by PLS;
(d) Reconstruct the spectral matrix D′ from the score and loading matrices obtained by PLS, and compute the error e_x of each variable, where:
e_xj: mean error of the j-th variable;
K: total number of samples;
D_ij: original value of the j-th variable of the i-th sample;
D′_ij: reconstructed value of the j-th variable of the i-th sample;
(e) Compute the error e_y:
$$e_y = \sqrt{\frac{1}{K}\sum_{i=1}^{K} \left(y_i - \hat{y}_i\right)^2}$$
where e_y is the root-mean-square error, K is the total number of samples, y_i is the true value of the i-th sample, and ŷ_i is the predicted value of the i-th sample;
(f) Substitute e_x and e_y into the following formula to compute the error:
$$\mathrm{err}_t = e_{x_j} + \beta\, e_y$$
where err_t is the error of the t-th iteration;
(g) Compute the new weight of each variable; after the weights are updated, proceed to the next iteration;
(h) After the iterations stop, sort the variable weights in descending order and take the first z variables as the variables used in the final model.
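Steps (d) to (f) can be sketched with a PLS1 model fitted by NIPALS, whose scores T and loadings P reconstruct the centered spectra as D′ = TPᵀ. Since the exact formula for e_xj is not reproduced in the text, this sketch assumes it is the mean absolute reconstruction error of column j. It also computes e_y on the calibration fit and evaluates err_t = e_xj + β e_y separately for each sampled variable. The function names and data are hypothetical.

```python
import numpy as np


def pls1_nipals(X, y, n_comp):
    """PLS1 by NIPALS; returns means, regression vector B, scores T and X-loadings P."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, T, q = [], [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)  # weight vector
        t = Xk @ w  # score vector
        tt = t @ t
        p = Xk.T @ t / tt  # X loading
        qa = (yk @ t) / tt  # y loading
        Xk = Xk - np.outer(t, p)  # deflate X
        yk = yk - qa * t  # deflate y
        W.append(w)
        P.append(p)
        T.append(t)
        q.append(qa)
    W, P, T = np.column_stack(W), np.column_stack(P), np.column_stack(T)
    B = W @ np.linalg.solve(P.T @ W, np.array(q))  # B = W (P^T W)^-1 q
    return x_mean, y_mean, B, T, P


def iteration_errors(D, y, n_comp, beta):
    """Steps (d)-(f): per-variable e_x, RMSE e_y, and err = e_x + beta * e_y."""
    x_mean, y_mean, B, T, P = pls1_nipals(D, y, n_comp)
    D_rec = T @ P.T + x_mean  # reconstructed spectra D'
    e_x = np.abs(D - D_rec).mean(axis=0)  # assumed: mean absolute error of each variable j
    y_hat = (D - x_mean) @ B + y_mean
    e_y = np.sqrt(np.mean((y - y_hat) ** 2))  # root-mean-square error over the K samples
    return e_x + beta * e_y, e_x, e_y


rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
D = latent @ rng.normal(size=(2, 8))  # hypothetical rank-2 spectra of 8 sampled variables
y = latent @ np.array([1.0, -2.0])
err, e_x, e_y = iteration_errors(D, y, n_comp=2, beta=0.5)
```

With noise-free rank-2 data and two latent variables, both error terms vanish. On real spectra, variables that the submodel reconstructs poorly receive a larger e_xj.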
Eight, Building a local model on the local training set with PLS over the selected wavenumber range, and predicting the property value of the spectrum to be tested x_p
The model built in this step can be adapted to the spectral features of each sample to be tested; that is, the method of the invention adaptively changes the training set and the modeling wavenumber range according to the spectrum to be tested, to obtain a better model.
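The final local PLS model can be sketched as follows. The sketch restricts the local training set S to the selected wavenumber columns, fits PLS1 (here by NIPALS, an assumed choice of algorithm), and applies the same column selection to x_p. The names, the number of latent variables, and the synthetic data are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """PLS1 by NIPALS; returns (x_mean, y_mean, B) so that y ~ (x - x_mean) @ B + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)  # weight vector
        t = Xk @ w  # score vector
        tt = t @ t
        p = Xk.T @ t / tt  # X loading
        qa = (yk @ t) / tt  # y loading
        Xk = Xk - np.outer(t, p)  # deflate X and y
        yk = yk - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P = np.column_stack(W), np.column_stack(P)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, np.array(q))


def predict_local(X_S, y_S, xp, selected, n_comp):
    """Fit PLS on local training set S using only the selected wavenumber columns; predict x_p."""
    x_mean, y_mean, B = pls1_fit(X_S[:, selected], y_S, n_comp)
    return (xp[selected] - x_mean) @ B + y_mean


rng = np.random.default_rng(0)
X_S = rng.normal(size=(40, 50))  # hypothetical preprocessed local training spectra
coef = np.array([1.0, 2.0, -1.0])
y_S = X_S[:, [3, 10, 20]] @ coef + 0.5
selected = np.array([3, 10, 20, 30])  # hypothetical selected wavenumber indices
xp = rng.normal(size=50)
y_pred = predict_local(X_S, y_S, xp, selected, n_comp=4)
y_true = xp[[3, 10, 20]] @ coef + 0.5
```

In practice the number of latent variables would be chosen by cross-validation on S; here it equals the number of selected columns so that the noise-free example is reproduced exactly.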
When predicting the properties of a crude oil sample to be tested, the invention first measures its NIR spectrum by the method described in Step 2, preprocesses that spectrum by the method described in Step 3, reduces its dimensionality with the projection matrix determined in Step 4, and determines the local training set by the method of Steps 5 and 6. It then builds a local PLS model on this training set, over the modeling wavenumber range selected in Step 7, to predict the sample.
The beneficial effects of the invention are as follows:
The test procedure of the method is simple, fast, and practical, and crude oil properties are measured rapidly with an NIR spectrometer. Compared with conventional measurement methods, it substantially shortens detection time and reduces labor and material costs. No reagents are used during testing, and the crude oil sample requires no processing and is not damaged. At the same time, the method can adapt the model to changes in the samples to be tested, thereby tracking operating conditions in real time, achieving better prediction accuracy, and reducing the cost of model maintenance.
The invention is described in detail below through an embodiment. It should be noted that the following embodiment only further illustrates the invention and should not be construed as limiting its scope; non-essential modifications and adaptations made by those skilled in the art according to the content of the invention still fall within the protection scope of the invention.
Embodiment 1
The specific steps of the invention are illustrated below with an embodiment that predicts sulfur content:
Step 1: Collect 200 crude oil samples of different types to form the crude oil sample library.
Step 2: Control the sample temperature at 30 °C and perform the measurements with a Bruker NIR spectrometer. The NIR spectrum of each crude oil sample is measured by inserting the probe directly into the sample, with a scanning range of 4000-12500 cm⁻¹, a resolution of 16 cm⁻¹, and 32 accumulated scans. The sulfur content of the crude oil samples is measured by the conventional standard method. Fig. 3 shows the raw crude oil NIR spectra, which exhibit severe baseline drift and heavily overlapping peaks.
Step 3: Take the absorbance over the 4000-12500 cm⁻¹ spectral range, apply straight-line subtraction preprocessing to it, and build the crude oil NIR spectral matrix. Fig. 4 shows the preprocessed spectra.
Step 4: Select the training samples from the preprocessed crude oil samples by removing outliers. First, perform PCA on the preprocessed crude oil spectra; then, using the principal component scores (Fig. 5) as feature variables, compute the T² statistic of each sample and, according to the preset T² threshold of 3.911, remove the samples whose T² exceeds the threshold from the initial training set. In this example, samples 93, 95, 96, and 175 are removed as redundant samples, and the remaining samples are used as training samples. Finally, 196 training samples form the crude oil spectral training set (Fig. 6).
Step 5: Perform principal component analysis (PCA) on all samples in the sample library and retain the loading vectors whose eigenvalue contribution rate exceeds 95%, denoted $P_{pca}$.
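Step 5 can be sketched as keeping leading loading vectors until their cumulative eigenvalue contribution reaches 95%. Reading "contribution rate greater than 95%" as a cumulative contribution is an assumption. To stay within the standard library, the sketch finds eigenvectors by power iteration with deflation instead of an SVD; `pca_loadings` and `project` are hypothetical names, and `project` performs the dimensionality reduction with $P_{pca}$ used in later steps.

```python
import math
import random


def pca_loadings(X, contribution=0.95, iters=500, seed=0):
    """Return (column means, loading vectors) whose cumulative eigenvalue
    contribution first reaches `contribution`.

    Covariance eigenvectors are found one at a time by power iteration with
    deflation (adequate for small examples, slow for full-resolution spectra).
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)] for a in range(p)]
    total = sum(C[j][j] for j in range(p))  # trace = sum of all eigenvalues
    loadings, explained = [], 0.0
    while total > 0.0 and explained / total < contribution and len(loadings) < p:
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]  # random start vector
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * C[a][b] * v[b] for a in range(p) for b in range(p))
        loadings.append(v)
        explained += lam
        # Deflation: remove the component just found from the covariance matrix
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return means, loadings


def project(x, means, loadings):
    """Reduce one spectrum to its scores on the retained loadings (P_pca)."""
    centred = [a - m for a, m in zip(x, means)]
    return [sum(c * l for c, l in zip(centred, v)) for v in loadings]
```

For nearly collinear two-variable data, a single loading close to ±(0.707, 0.707) already exceeds 95% of the variance. For real spectra with thousands of wavenumbers, an SVD-based PCA would be the practical choice.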
Step 6: Using Monte Carlo sampling, randomly select 150 samples from the sample library to form training subset A, reduce the dimensionality of this subset with $P_{pca}$, and build a Gaussian process regression (GPR) model on the reduced subset. Repeat this step 1000 times to obtain 1000 training subsets and submodels. When a new spectrum to be measured is obtained, reduce its dimensionality with $P_{pca}$, feed it into all GPR submodels, and calculate the model estimate variance σ of each. The training subset corresponding to the model with the smallest σ is selected as the local training set S.
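The Monte Carlo selection of Step 6 can be sketched as follows. Several assumptions apply: a squared-exponential kernel with fixed placeholder hyperparameters (the patent fits them by maximum likelihood per submodel), a noise level `sigma_n=0.1`, and the hypothetical names `predictive_variance` and `select_local_subset`. With fixed hyperparameters, the predictive variance depends only on which samples are in the subset, not on their property values.

```python
import math
import random


def se_kernel(a, b, length=1.0, sigma_f=1.0):
    """Squared-exponential covariance (assumed kernel form)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return sigma_f ** 2 * math.exp(-d2 / (2.0 * length ** 2))


def predictive_variance(train_x, x_new, sigma_n=0.1):
    """sigma^2 = K_pp - K_p K^{-1} K_p^T, computed with a Cholesky factor of K."""
    n = len(train_x)
    K = [[se_kernel(train_x[i], train_x[j]) + (sigma_n ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    k_p = [se_kernel(x_new, a) for a in train_x]
    z = []  # z = L^{-1} K_p^T, so K_p K^{-1} K_p^T = z . z
    for i in range(n):
        z.append((k_p[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return se_kernel(x_new, x_new) - sum(v * v for v in z)


def select_local_subset(scores, x_new, subset_size=150, n_models=1000, seed=0):
    """Draw random subsets; keep the one whose GPR variance at x_new is smallest."""
    rng = random.Random(seed)
    best_idx, best_var = None, float("inf")
    for _ in range(n_models):
        idx = rng.sample(range(len(scores)), subset_size)
        var = predictive_variance([scores[i] for i in idx], x_new)
        if var < best_var:
            best_idx, best_var = sorted(idx), var
    return best_idx, best_var
```

The defaults mirror the embodiment (150 samples, 1000 repetitions). A pure-Python Cholesky on 150 × 150 matrices 1000 times is slow, so an array library would be preferable in practice. The selection logic stays the same.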
Step 7: Determine the wavenumber ranges from the local training set S (here approximately 7496-8449 cm⁻¹ and 4431-4519 cm⁻¹), build a local model using PLS, and predict the property value of the spectrum to be measured.
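Step 7 can be sketched as restricting the spectra to the chosen wavenumber ranges and fitting a single-response PLS (PLS1) model by NIPALS. The patent does not state the number of latent variables. Here it is a caller-supplied `n_comp`, normally chosen by cross-validation. `select_ranges`, `pls1_fit` and `pls1_predict` are hypothetical names.

```python
import math


def select_ranges(wavenumbers, ranges):
    """Indices of columns whose wavenumber lies in any (low, high) range."""
    return [i for i, w in enumerate(wavenumbers) if any(lo <= w <= hi for lo, hi in ranges)]


def pls1_fit(X, y, n_comp):
    """PLS1 by NIPALS; returns the pieces pls1_predict needs."""
    n, p = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[r[j] - x_mean[j] for j in range(p)] for r in X]  # centred X residuals
    f = [v - y_mean for v in y]                             # centred y residuals
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # y already fully explained
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        p_vec = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p_vec[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p_vec)
        Q.append(q)
    return x_mean, y_mean, W, P, Q


def pls1_predict(model, x):
    """Predict one sample by deflating it component by component."""
    x_mean, y_mean, W, P, Q = model
    e = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p_vec, q in zip(W, P, Q):
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p_vec)]
    return y_hat
```

Usage: `cols = select_ranges(wn, [(7496, 8449), (4431, 4519)])`, then fit on `[[row[i] for i in cols] for row in X_local]`.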
Fig. 7 shows the sulfur content prediction results of the constructed regression model. The coefficient of determination reaches 0.9916 and the root-mean-square error is 0.1197. The comparison between predicted and actual values is shown in Table 1 below. The prediction process is fast and simple, and the prediction results are accurate.
Table 1: Comparison of predicted and actual sulfur content of crude oil (for sulfur content prediction, a relative prediction error below 10% is generally regarded in this field as good precision)
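The figures of merit quoted for the embodiment (coefficient of determination, root-mean-square error) and the relative-error criterion can be computed as sketched here. The patent does not give its formulas, so the common definitions R² = 1 - SS_res/SS_tot and RMSE = sqrt(SS_res / n) are assumed. The function names are hypothetical.

```python
import math


def r2_rmse(actual, predicted):
    """Coefficient of determination and root-mean-square error."""
    n = len(actual)
    mean = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


def relative_errors(actual, predicted):
    """|predicted - actual| / |actual| per sample (actual values must be non-zero)."""
    return [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
```

A sample would meet the criterion in the table caption when its relative error is below 0.10.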
For the prediction of other crude oil properties, the modeling method is the same: results are obtained by choosing the corresponding local training set, wavenumber ranges and modeling parameters.
Claims (10)
1. A method of building a model for crude oil property prediction, characterized in that the method comprises:
(1) measuring property data of crude oil samples;
(2) measuring near-infrared spectra of the crude oil samples;
(3) preprocessing the near-infrared spectra obtained in step (2) by straight-line subtraction to eliminate background interference and baseline drift;
(4) performing principal component analysis (PCA) on the spectra obtained in step (3) and saving the loading vectors whose eigenvalue contribution rate exceeds 95%, denoted $P_{pca}$;
(5) using $P_{pca}$ to reduce the dimensionality of training subsets formed from randomly selected samples, and then obtaining submodels by Gaussian process regression (GPR);
(6) using $P_{pca}$ to reduce the dimensionality of the near-infrared spectrum of the sample to be tested, and then selecting a training set by Gaussian process regression (GPR);
(7) determining one or more wavenumber intervals from the training set, and building a crude oil property prediction model using partial least squares (PLS).
2. The method according to claim 1, characterized in that:
the crude oil property is selected from one or more of density, carbon residue, acid value, sulfur content, nitrogen content, wax content, gum content, asphaltene content and true boiling point (TBP) data;
the number of crude oil samples in step (1) is no less than 200; and the near-infrared spectral data of the crude oil samples are acquired with an offline or online near-infrared system.
3. The method according to claim 1 or 2, characterized in that in the measurement of step (2), the spectral scanning range is 4000-12500 cm⁻¹, the resolution is 2-32 cm⁻¹, and the near-infrared spectrum is the average of 10-100 repeated scans.
4. The method according to any one of claims 1-3, characterized in that step (3) comprises calculating the T² statistic of each sample in the initial sample library by principal component analysis combined with Hotelling T² statistics, and rejecting abnormal sample points in the initial sample library according to a preset T² statistic threshold;
preferably, abnormal sample points are rejected by principal component analysis combined with Hotelling T² statistics as follows: first perform principal component analysis on the sample spectra, then take the principal component scores as characteristic variables, calculate the T² statistic of each sample, and reject abnormal sample points in the sample library according to the preset T² statistic threshold.
More preferably, the T² statistic is given by the following formula:
$$T^2 = \sum_{i=1}^{Iter} \left(\frac{t_i}{\sigma_i}\right)^2$$
where t is the score variable of the original spectral matrix X after PCA dimensionality reduction, σ is the standard deviation of t, and Iter is the number of extracted principal components. Because the T² value of an abnormal sample is far larger than that of a normal sample, the T² values of all spectral samples in the sample library are calculated, the 99% confidence limit is taken as the upper threshold, and the threshold is calculated from the following formula using an F-distribution table. The T² value of every sample in the sample library is then compared with the threshold, and samples exceeding it are rejected.
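The threshold formula referred to above is not reproduced in this text. A commonly used form of the Hotelling T² upper limit for PCA scores of the calibration samples is shown here as an assumption, not as the patent's own formula. In it, n denotes the number of samples (a symbol not defined in the source), and $F_{0.99}$ is the 99th percentile of the F distribution. Some references use $\mathrm{Iter}(n^2-1)/\big(n(n-\mathrm{Iter})\big)$ in place of the leading factor when screening new observations.

```latex
T^2_{\mathrm{lim}} = \frac{\mathrm{Iter}\,(n-1)}{n-\mathrm{Iter}}\;
F_{0.99}\!\left(\mathrm{Iter},\; n-\mathrm{Iter}\right)
```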
5. The method according to any one of claims 1-4, characterized in that in step (5), samples are randomly drawn by Monte Carlo sampling to form a training subset, and a Gaussian process regression model is built on the dimensionality-reduced subset;
preferably, step (5) is repeated N times to obtain N training subsets and submodels; more preferably, N is 200-5000.
6. The method according to claim 5, characterized in that the Gaussian process regression model in step (5) is as follows:
A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. It is completely determined by its mean function and covariance function and can be written as:
$$f(x) \sim GP\big(m(x),\, k(x, x')\big)$$
Considering the presence of noise, the actual output value y equals the sum of the function value f(x) and the noise, i.e.
$$y = f(x) + \varepsilon$$
where ε is white Gaussian noise with the distribution
$$\varepsilon \sim N\left(0, \sigma_n^2\right)$$
The covariance function may take the following form:
The hyperparameters θ = {l, σ_f, σ_n} can be obtained by maximizing the likelihood function.
In the formulas:
x, x′: arbitrary samples in the training set;
y: property value data;
m(x): mean function;
k(x, x′): covariance function.
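Choosing θ = {l, σ_f, σ_n} by maximum likelihood can be sketched with the log marginal likelihood of a zero-mean GP, log p(y | X, θ) = -½ yᵀK⁻¹y - ½ log|K| - (n/2) log 2π. Two assumptions apply here: the squared-exponential kernel (the covariance function form is not reproduced above) and a simple grid search in place of a gradient-based optimizer. The function names are hypothetical.

```python
import math


def se_kernel(a, b, length, sigma_f):
    """Squared-exponential covariance (assumed form)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return sigma_f ** 2 * math.exp(-d2 / (2.0 * length ** 2))


def log_marginal_likelihood(X, y, length, sigma_f, sigma_n):
    """log p(y | X, theta) for a zero-mean GP with SE kernel plus white noise."""
    n = len(X)
    K = [[se_kernel(X[i], X[j], length, sigma_f) + (sigma_n ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = [[0.0] * n for _ in range(n)]  # Cholesky factor, K = L L^T
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = []  # z = L^{-1} y, so y^T K^{-1} y = z . z
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * sum(v * v for v in z) - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


def fit_hyperparameters(X, y, grid):
    """Pick (l, sigma_f, sigma_n) from a candidate grid by maximum likelihood."""
    return max(grid, key=lambda th: log_marginal_likelihood(X, y, *th))
```

For a smooth target such as a constant, a long length scale (strongly correlated samples) scores far higher than a very short one.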
7. The method according to any one of claims 1-6, characterized in that in step (6), the near-infrared spectrum of the sample to be tested is reduced in dimensionality with $P_{pca}$ and fed into all Gaussian process regression (GPR) submodels, the model estimate variance σ is calculated, and the sub-training set corresponding to the model with the smallest σ is selected as the local training set.
8. The method according to claim 7, characterized in that the model estimate variance σ in step (6) is determined as follows:
for the dimensionality-reduced spectrum $x_p'$ to be measured, the joint prior distribution of its predicted property value $y_p$ and the property values y of the training samples is
$$\begin{bmatrix} y \\ y_p \end{bmatrix} \sim N\left(0,\; \begin{bmatrix} K & K_p^{T} \\ K_p & K_{pp} \end{bmatrix}\right)$$
where
$$K_p = \left[\, k(x_p, x_1) \;\; k(x_p, x_2) \;\; \cdots \;\; k(x_p, x_n) \,\right]$$
$$K_{pp} = k(x_p, x_p)$$
From the above, the posterior distribution of the property value corresponding to the spectrum $x_p'$ to be measured can be calculated; the estimated mean and variance of $y_p$ are:
$$\mu = K_p K^{-1} y$$
$$\sigma^2 = K_{pp} - K_p K^{-1} K_p^{T}$$
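The two posterior formulas of claim 8 can be evaluated directly, as in this sketch. It assumes a zero prior mean, a squared-exponential kernel, and noise variance $\sigma_n^2$ added to the diagonal of K. `gp_posterior` and `solve` are hypothetical names, and linear systems are solved by Gaussian elimination rather than forming $K^{-1}$ explicitly.

```python
import math


def se_kernel(a, b, length=1.0, sigma_f=1.0):
    """Squared-exponential covariance (assumed form)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return sigma_f ** 2 * math.exp(-d2 / (2.0 * length ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(train_x, train_y, x_p, sigma_n=0.0, kernel=se_kernel):
    """mu = K_p K^{-1} y and sigma^2 = K_pp - K_p K^{-1} K_p^T."""
    n = len(train_x)
    K = [[kernel(train_x[i], train_x[j]) + (sigma_n ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_p = [kernel(x_p, a) for a in train_x]  # K_p as a list
    mu = sum(a * b for a, b in zip(k_p, solve(K, train_y)))
    var = kernel(x_p, x_p) - sum(a * b for a, b in zip(k_p, solve(K, k_p)))
    return mu, var
```

At a noise-free training point the variance collapses to zero and the mean equals the observed value. Far from all training points, the mean returns to the prior mean 0 and the variance to $K_{pp}$.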
9. The method according to any one of claims 1-8, characterized in that the wavenumber intervals in step (7) are determined by the following steps:
(a) first assign each variable an initial weight, where M is the total number of variables;
for iterations t = 1, ..., g, repeat the following steps:
(b) calculate the sampling probability of each variable, and draw k variables from all wavenumber points according to the sampling probabilities;
(c) build a submodel $h_t$ with PLS from the k selected variables;
(d) reconstruct the spectral matrix D′ from the score and loading matrices obtained by PLS, and calculate the error $e_x$ of each variable;
where $e_{xj}$: mean error of the j-th variable;
K: total number of samples;
$D_{ij}$: original value of the j-th variable of the i-th sample;
$D'_{ij}$: reconstructed value of the j-th variable of the i-th sample;
(e) calculate the error $e_y$:
$$e_y = \sqrt{\frac{1}{K}\sum_{i=1}^{K}\left(y_i - \hat{y}_i\right)^2}$$
where $e_y$: root-mean-square error;
K: total number of samples;
$y_i$: true value of the i-th sample;
$\hat{y}_i$: predicted value of the i-th sample;
(f) substitute $e_x$ and $e_y$ into the following formula to calculate the error:
$$err_t = e_{xj} + \beta e_y$$
where $err_t$ is the error of the t-th iteration;
(g) calculate the new weight of each variable; after the weights are updated, proceed to the next iteration;
(h) after the iterations stop, sort the variable weights in descending order and select the top z variables as the variables used in the final model.
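The loop structure of steps (a)-(h) can be sketched as below. The formulas for the initial weight, the sampling probability and the weight update are not reproduced in this text, so the sketch makes assumptions, all labelled in comments: uniform initial weights, sampling probability proportional to weight, and a hypothetical update that rewards variables in low-error submodels. The caller supplies `submodel_errors`, a hypothetical callback that builds the PLS submodel and returns the per-variable reconstruction errors and $e_y$. Distinct variables are drawn with Efraimidis-Spirakis weighted sampling.

```python
import random


def weighted_sample(probs, k, rng):
    """Draw k distinct indices, favouring high probs (Efraimidis-Spirakis keys u^(1/p))."""
    keys = [(rng.random() ** (1.0 / p), j) for j, p in enumerate(probs) if p > 0.0]
    keys.sort(reverse=True)
    return [j for _, j in keys[:k]]


def select_variables(m, k, z, iterations, submodel_errors, beta=1.0, seed=0):
    """Steps (a)-(h) as a loop. submodel_errors(selected) -> ({j: e_xj}, e_y).

    The weight update in (g) is HYPOTHETICAL; the patent's formula is not shown.
    """
    rng = random.Random(seed)
    weights = [1.0 / m] * m                      # (a) assumed uniform start
    for _ in range(iterations):                  # t = 1, ..., g
        total = sum(weights)
        probs = [w / total for w in weights]     # (b) assumed proportional to weight
        selected = weighted_sample(probs, k, rng)
        e_x, e_y = submodel_errors(selected)     # (c)-(e) via the caller's PLS submodel
        for j in selected:
            err_t = e_x[j] + beta * e_y          # (f) err_t = e_xj + beta * e_y
            weights[j] += 1.0 / (1.0 + err_t)    # (g) hypothetical: reward low error
    ranked = sorted(range(m), key=lambda j: weights[j], reverse=True)
    return ranked[:z]                            # (h) top-z variables
```

Keeping the submodel as a callback separates the sampling-and-reweighting logic from the PLS and reconstruction details. With the patent's actual update formula, only step (g) would change.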
10. A crude oil property prediction method based on near-infrared spectroscopy, characterized in that the method comprises:
(i) measuring the near-infrared spectrum of the crude oil to be tested;
(ii) predicting the crude oil property with the model for crude oil property prediction built by the method of any one of claims 1-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811168265.7A CN109324013B (en) | 2018-10-08 | 2018-10-08 | Near-infrared rapid analysis method for constructing crude oil property by using Gaussian process regression model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109324013A true CN109324013A (en) | 2019-02-12 |
CN109324013B CN109324013B (en) | 2021-09-24 |
Family
ID=65261570
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811168265.7A Active CN109324013B (en) | 2018-10-08 | 2018-10-08 | Near-infrared rapid analysis method for constructing crude oil property by using Gaussian process regression model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109324013B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102265227A (en) * | 2008-10-20 | 2011-11-30 | 西门子公司 | Method and apparatus for creating state estimation models in machine condition monitoring |
CN103345593A (en) * | 2013-07-31 | 2013-10-09 | 哈尔滨工业大学 | Collective anomaly detection method for a single-sensor data stream |
CN103425888A (en) * | 2013-08-22 | 2013-12-04 | 重庆大学 | Method for compacting a reagent in a metal tube based on compaction density prediction |
CN105447840A (en) * | 2015-12-09 | 2016-03-30 | 西安电子科技大学 | Image super-resolution method based on active sampling and Gaussian process regression |
CN105701572A (en) * | 2016-01-13 | 2016-06-22 | 国网甘肃省电力公司电力科学研究院 | Photovoltaic short-term output prediction method based on improved Gaussian process regression |
CN105699319A (en) * | 2016-01-28 | 2016-06-22 | 山西汾西矿业(集团)有限责任公司 | Near infrared spectrum quick detection method for total moisture of coal based on Gaussian process |
US9658104B2 (en) * | 2010-04-05 | 2017-05-23 | Chemimage Corporation | System and method for detecting unknown materials using short wave infrared hyperspectral imaging |
CN106951695A (en) * | 2017-03-09 | 2017-07-14 | 杭州安脉盛智能技术有限公司 | Method and system for calculating the remaining life of mechanical equipment under multiple operating conditions |
CN107918709A (en) * | 2017-11-17 | 2018-04-17 | 浙江工业大学 | Method for predicting the transient lift of a check valve in a multiphase mixed-transport pump |
Non-Patent Citations (4)
Title |
---|
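ARJAN GIJSBERTS ET AL.: "Real-time model learning using incremental sparse spectrum gaussian process regression", 《NEURAL NETWORKS》 * |
JAMES E. BARRETT ET AL.: "Covariate dimension reduction for survival data via the gaussian process latent variable model", 《STATISTICS IN MEDICINE》 * |
何志昆 (He Zhikun) et al.: "A review of Gaussian process regression methods" (高斯过程回归方法综述), 《控制与决策》 (Control and Decision) * |
高阳 (Gao Yang): "Research on dimensionality reduction algorithms for hyperspectral data" (高光谱数据降维算法研究), 《博士学位论文全文库》 (doctoral dissertation full-text database) * |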
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111238997A (en) * | 2020-02-12 | 2020-06-05 | 江南大学 | On-line measurement method for feed density in crude oil desalting and dewatering process |
CN111238997B (en) * | 2020-02-12 | 2021-07-27 | 江南大学 | On-line measurement method for feed density in crude oil desalting and dewatering process |
CN113125377A (en) * | 2021-03-30 | 2021-07-16 | 武汉理工大学 | Method and device for detecting diesel oil property based on near infrared spectrum |
CN113125377B (en) * | 2021-03-30 | 2024-02-23 | 武汉理工大学 | Method and device for detecting property of diesel based on near infrared spectrum |
CN113077006A (en) * | 2021-04-15 | 2021-07-06 | 天津大学 | Model training method and analysis method for analyzing quality of bio-oil |
CN113239621A (en) * | 2021-05-11 | 2021-08-10 | 西南石油大学 | PVT (pressure-volume-temperature) measurement method based on elastic net regression algorithm |
CN113239621B (en) * | 2021-05-11 | 2022-07-12 | 西南石油大学 | PVT (pressure-volume-temperature) measurement method based on elastic net regression algorithm |
CN113569951A (en) * | 2021-07-29 | 2021-10-29 | 山东科技大学 | Method for constructing near-infrared quantitative analysis model based on generative adversarial network |
CN113569951B (en) * | 2021-07-29 | 2023-11-07 | 山东科技大学 | Near infrared quantitative analysis model construction method based on generative adversarial network |
CN113702328A (en) * | 2021-08-20 | 2021-11-26 | 广东省惠州市石油产品质量监督检验中心 | Method, device, equipment and storage medium for analyzing properties of product oil |
CN117451691A (en) * | 2023-12-21 | 2024-01-26 | 浙江恒逸石化有限公司 | Method for pre-judging yarn dyeing property |
CN117451691B (en) * | 2023-12-21 | 2024-04-02 | 浙江恒逸石化有限公司 | Method for pre-judging yarn dyeing property |
Also Published As
Publication number | Publication date |
---|---|
CN109324013B (en) | 2021-09-24 |
Similar Documents
Publication | Title |
---|---|
CN109324013A (en) | Method for rapid near-infrared analysis of crude oil properties using a Gaussian process regression model |
CN105300923B (en) | Method for correcting a temperature compensation model without measurement points during on-site application of a near-infrared spectrometer |
CN104062257B (en) | Method for determining total flavonoid content in a solution based on near-infrared spectroscopy |
CN101995389B (en) | Method for rapidly identifying crude oil types by near-infrared spectroscopy |
CN109324014A (en) | Adaptive near-infrared rapid prediction method for crude oil properties |
CN104062256B (en) | Soft-sensing method based on near-infrared spectroscopy |
CN105388123B (en) | Method for predicting crude oil properties from near-infrared spectra |
CN105424641B (en) | Near-infrared spectral identification method for crude oil types |
CN107817223A (en) | Method for constructing a model for rapid, nondestructive, real-time prediction of crude oil properties, and its application |
CN107703097B (en) | Method for constructing model for rapidly predicting crude oil property by using near-infrared spectrometer |
CN109669023A (en) | Soil property prediction method based on multi-sensor fusion |
CN105466884B (en) | Method for identifying crude oil types and their properties by near-infrared spectroscopy |
CN107958267B (en) | Oil product property prediction method based on spectral linear representation |
CN102841069B (en) | Method for rapidly identifying types of crude oil by using mid-infrared spectrum |
CN108875118B (en) | Method and device for evaluating accuracy of prediction model of silicon content of blast furnace molten iron |
CN105334185A (en) | Spectrum projection discrimination-based near infrared model maintenance method |
CN104062259A (en) | Method for rapid determination of total saponin content in compound ass-hide glue pulp by near infrared spectroscopy |
CN104062258A (en) | Method for rapid determination of soluble solids in compound ass-hide glue pulp by near infrared spectroscopy |
CN107860743A (en) | Method for building a model for rapid prediction of crude oil properties with a reflective near-infrared fiber-optic probe, and its application |
CN115993344A (en) | Quality monitoring and analyzing system and method for near infrared spectrum analyzer |
CN108663334B (en) | Method for searching spectral characteristic wavelength of soil nutrient based on multi-classifier fusion |
CN109283153A (en) | Method for establishing a quantitative analysis model for soy sauce |
CN105954228A (en) | Method for measuring content of sodium metal in oil sand based on near infrared spectrum |
CN108693139A (en) | Method for establishing near-infrared prediction models for physicochemical indices of e-cigarette liquid, and its application |
CN106485049B (en) | NIRS abnormal sample detection method based on Monte Carlo cross-validation |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |