CN109324013A

CN109324013A - Method for building a rapid near-infrared analysis of crude oil properties using a Gaussian process regression model

Info

Publication number: CN109324013A
Application number: CN201811168265.7A
Authority: CN
Inventors: 钱锋; 钟伟民; 杨明磊; 杜文莉; 隆建
Original assignee: East China University of Science and Technology
Current assignee: East China University of Science and Technology
Priority date: 2018-10-08
Filing date: 2018-10-08
Publication date: 2019-02-12
Anticipated expiration: 2038-10-08
Also published as: CN109324013B

Abstract

The invention discloses a method for building a rapid near-infrared analysis of crude oil properties using a Gaussian process regression model. The method of building a model for crude oil property prediction comprises: (1) measuring the property data of crude oil samples; (2) measuring the near-infrared spectra of the crude oil samples; (3) preprocessing the near-infrared spectra obtained in step (2) by straight-line subtraction to eliminate background interference and baseline drift; (4) performing principal component analysis (PCA) on the spectra obtained in step (3) and saving the loading vectors whose eigenvalue contribution rate exceeds 95%, denoted P_pca; (5) using P_pca to reduce the dimension of training subsets made up of randomly drawn samples, then obtaining sub-models by Gaussian process regression (GPR); (6) using P_pca to reduce the dimension of the near-infrared spectrum of the sample to be tested, then selecting a training set by GPR; (7) determining one or more wavenumber intervals from the training set and building the prediction model of the crude oil property by partial least squares (PLS).

Description

Method for building a rapid near-infrared analysis of crude oil properties using a Gaussian process regression model

Technical field

The present invention relates to a method for building a rapid near-infrared analysis of crude oil properties using a Gaussian process regression model, and to its application.

Background art

Crude oil evaluation plays an important role in crude oil production, trade, and related areas. China has long carried out crude oil evaluation work and now has a complete set of standard evaluation methods. However, these methods often take a long time, need large sample volumes, and are costly, so they no longer meet the needs of practical application. Near-infrared (NIR) spectroscopy is currently one of the most promising and most widely used rapid analysis methods. In recent years, the use of optical fibers has moved NIR spectroscopy from the laboratory to the field. The technique is insensitive to electromagnetic interference, concentrates the transmitted signal energy, and offers high sensitivity and low cost, which allows NIR spectrometers to perform rapid, long-distance on-line analysis in harsh and dangerous environments. Crude oil has a complex composition and many properties to be measured, its NIR absorption bands are broad and overlap heavily, and the NIR analyzer is a secondary instrument. Therefore, building an NIR model with high accuracy and good robustness is the key to applying NIR technology effectively.

Conventional modeling methods generally fall within the scope of static models: key steps such as spectral preprocessing, variable selection, model building, and model updating and maintenance must be carried out offline and remain unchanged during application. In the process industry, however, continuous production generally requires the model to track field operating conditions in real time, and to be updated promptly and effectively when the model deviates substantially from the current operating conditions and the prediction accuracy no longer meets the requirements of on-line detection.

Summary of the invention

In view of the above problems, the invention proposes a method for building a rapid near-infrared analysis of crude oil properties using a Gaussian process regression model. The method acquires crude oil near-infrared spectra with a near-infrared analyzer, preprocesses the collected spectra by subtracting a straight line to eliminate interference, and screens the samples of the preprocessed spectral data. For each newly obtained sample to be tested, GPR is used to select a suitable training set from the sample library, the modeling wavenumber range is determined from that training set, and a local model is built with PLS to predict the property value of the spectrum to be tested.

The method provided by the invention for building a rapid near-infrared analysis of crude oil properties using a Gaussian process regression model comprises the following steps:

Step 1: Collect crude oil samples and measure the property values of all crude oil samples;

Step 2: Measure the near-infrared spectra of all samples with an on-line near-infrared analysis system;

Step 3: Preprocess the crude oil near-infrared spectra obtained in step 2, screen the samples of the preprocessed spectral data, and reject abnormal sample points;

Step 4: Perform principal component analysis (PCA) on all samples in the sample library, select the loading vectors whose eigenvalue contribution rate exceeds 95%, denote them P_pca, and store them;

Step 5: Randomly draw n samples from the sample library by Monte Carlo sampling to form a training subset A, reduce the dimension of this subset with P_pca, and build a Gaussian process regression model on the dimension-reduced subset. Repeat this step N times to obtain N training subsets and sub-models.

Step 6: When a new spectrum to be tested x_p is acquired, reduce its dimension with P_pca, substitute it into all GPR sub-models, and compute the model estimation variance σ. Select the training subset corresponding to the sub-model with the smallest σ as the local training set S.

Step 7: Determine the wavenumber range from the local training set S, build a local model on the local training set with PLS, and predict the property value of the spectrum to be tested x_p.

In one or more embodiments, the crude oils used in step 1 to build the calibration set have a density at 20 °C in the range 0.7-1.1 g/cm³, a sulfur content in the range 0.03%-5.50%, and an acid value in the range 0.01-12.00 mg KOH/g; and/or

the crude oil properties include one or more of density, carbon residue, acid value, sulfur content, nitrogen content, wax content, resin content, asphaltene content, and true boiling point (TBP) data.

In one or more embodiments, step 2 comprises holding the training-set samples at a certain temperature of 35 °C or below and, once the temperature of a crude oil sample has reached a steady state, measuring the near-infrared spectral data of that sample;

In one or more embodiments, in step 2 the scanning range is 4000-12500 cm^-1 and the number of scans is 10-100.

In one or more embodiments, in step 3 the near-infrared spectra are preprocessed by straight-line subtraction applied to the crude oil sample spectra obtained in step 2 over the wavenumber range 12500-4000 cm^-1, to eliminate background interference and baseline drift;

In one or more embodiments, step 3 comprises using principal component analysis combined with the Hotelling T² statistic to calculate the T² statistic of each sample in the initial training set and, according to a preset T² threshold, rejecting the abnormal sample points in the initial training set to form the final training set;

Preferably, abnormal sample points are rejected with principal component analysis combined with the Hotelling T² statistic as follows: principal component analysis is first performed on the sample spectra; the principal component scores are then used as feature variables to calculate the T² statistic of each sample; and, according to the preset T² threshold, the abnormal sample points in the initial training set are rejected to form the final training set.

Performing PCA on the spectral matrix X in step 4 is equivalent to an eigenvector decomposition of the covariance matrix X^T X of X; the loading vectors are the eigenvectors of X^T X. If λ_i denotes the i-th eigenvalue of X^T X in descending order, the cumulative contribution rate of the first k principal components can be calculated as follows:

$$\eta_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{m} \lambda_i}$$

where m is the number of wavelength points of the spectrum.
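
The selection of loading vectors by contribution rate can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function name `pca_loadings`, the mean-centring of X, and the synthetic data are all assumptions introduced here.

```python
import numpy as np

def pca_loadings(X, threshold=0.95):
    """Loading vectors whose cumulative eigenvalue contribution exceeds `threshold`."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # mean-centring is an assumption
    eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)  # eigen-decomposition of X^T X
    order = np.argsort(eigvals)[::-1]             # eigenvalues in descending order
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative values
    eigvecs = eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Smallest k whose cumulative contribution is strictly greater than the threshold.
    k = min(int(np.searchsorted(cumulative, threshold, side="right")) + 1, len(eigvals))
    return eigvecs[:, :k], cumulative[:k]

# Synthetic stand-in for a spectral matrix (50 samples, 8 wavelength points).
rng = np.random.default_rng(0)
X = np.outer(rng.normal(size=50), rng.normal(size=8)) + 1e-3 * rng.normal(size=(50, 8))
P_pca, contribution = pca_loadings(X)
scores = (X - X.mean(axis=0)) @ P_pca             # dimension-reduced spectra
```

Storing only `P_pca` and the low-dimensional scores is what reduces the size of the sample library and the cost of the later GPR calculations.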

In one or more embodiments, the Gaussian process regression model in step 5 is as follows:

A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. It is completely determined by its mean function and covariance function and can be written as:

$$f(x) \sim GP(m(x), k(x, x'))$$

Considering that noise is present, the actual output value y equals the sum of the function value and the noise, i.e.

$$y = f(x) + \varepsilon$$

where ε is Gaussian white noise, distributed as follows:

$$\varepsilon \sim N(0, \sigma_n^2)$$

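The covariance function may take the following form:

$$k(x, x') = \sigma_f^2 \exp\left(-\frac{\lVert x - x' \rVert^2}{2 l^2}\right) + \sigma_n^2 \delta(x, x')$$

where δ(x, x') equals 1 when x and x' are the same sample and 0 otherwise. The hyperparameters θ = {l, σ_f, σ_n} can be obtained by maximizing the likelihood function.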

In the formulas: x, x': any samples in the training set;

y: property value data;

m(x): mean function;

k(x, x'): covariance function;

In one or more embodiments, the model estimation variance σ in step 6 is determined as follows:

For the dimension-reduced spectrum to be tested x_p', the joint prior distribution of its predicted property value y_p and the property values y of the samples in the training set is

$$\begin{bmatrix} y \\ y_p \end{bmatrix} \sim N\left(0, \begin{bmatrix} K & K_p^T \\ K_p & K_{pp} \end{bmatrix}\right)$$

where K is the n×n covariance matrix of the training-set samples and

$$K_p = [\,k(x_p, x_1) \;\; k(x_p, x_2) \;\; \cdots \;\; k(x_p, x_n)\,]$$

$$K_{pp} = k(x_p, x_p)$$

From the above, the posterior distribution of the property value corresponding to the spectrum to be tested x_p' can be calculated; the estimated mean and variance of y_p are as follows:

$$\mu = K_p K^{-1} y$$

$$\sigma^2 = K_{pp} - K_p K^{-1} K_p^T$$
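
A minimal numerical sketch of the posterior mean and variance is given below. It assumes a squared-exponential covariance with the noise variance σ_n² on the diagonal of K, fixed hyperparameters instead of a maximum-likelihood fit, and synthetic one-dimensional data; the names `se_kernel` and `gpr_posterior` are hypothetical.

```python
import numpy as np

def se_kernel(A, B, length, sigma_f):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return sigma_f**2 * np.exp(-0.5 * np.maximum(d2, 0.0) / length**2)

def gpr_posterior(X_train, y_train, x_p, length=1.0, sigma_f=1.0, sigma_n=0.05):
    """Posterior mean K_p K^-1 y and variance K_pp - K_p K^-1 K_p^T (zero prior mean)."""
    n = len(X_train)
    # Assumption: the noise variance sits on the diagonal of K.
    K = se_kernel(X_train, X_train, length, sigma_f) + sigma_n**2 * np.eye(n)
    K_p = se_kernel(x_p[None, :], X_train, length, sigma_f)     # shape (1, n)
    K_pp = sigma_f**2 + sigma_n**2                              # k(x_p, x_p)
    L = np.linalg.cholesky(K)                                   # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^-1 y
    v = np.linalg.solve(L, K_p.T)                               # L^-1 K_p^T
    mu = (K_p @ alpha).item()
    var = K_pp - (v.T @ v).item()
    return mu, var

# Synthetic data (not from the source): a smooth function sampled at 20 points.
X_train = np.linspace(0.0, 5.0, 20)[:, None]
y_train = np.sin(X_train).ravel()
mu_near, var_near = gpr_posterior(X_train, y_train, np.array([2.5]))
mu_far, var_far = gpr_posterior(X_train, y_train, np.array([50.0]))
```

The variance is small for a query point surrounded by training samples and approaches the prior variance far from them, which is why the smallest σ identifies the training subset that best covers the spectrum to be tested.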

In one or more embodiments, the wavenumber range in step 7 is determined as follows:

(a) First, set an initial weight for each variable, where m is the total number of variables;

For iterations t = 1, ..., g, repeat the following steps:

(b) Compute the sampling probability of each variable, and draw k variables from all wavenumber points according to the sampling probability;

(c) Build a sub-model h_t with the PLS method from the k selected variables;

(d) Reconstruct the spectral matrix D' from the score matrix and loading matrix obtained by PLS, and compute the error e_x of each variable, where:

e_xj: mean error of the j-th variable;

K: total number of samples;

D_ij: original value of the j-th variable of the i-th sample;

D'_ij: reconstructed value of the j-th variable of the i-th sample;

(e) Compute the error e_y:

$$e_y = \sqrt{\frac{1}{K} \sum_{i=1}^{K} \left(y_i - \hat{y}_i\right)^2}$$

where:

e_y: root-mean-square error;

K: total number of samples;

y_i: true value of the i-th sample;

ŷ_i: predicted value of the i-th sample;

(f) Substitute e_x and e_y into the following formula to compute the error:

$$err_t = e_{xj} + \beta e_y$$

where err_t is the error of the t-th iteration;

(g) Compute the new weight of each variable;

After the weights are updated, proceed to the next iteration.

(h) After the iterations stop, sort the weights of the variables in descending order and select the first z variables as the variables used in the final modeling.
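
The loop in steps (a) to (h) can be sketched as follows. Several parts are placeholders, because the text does not reproduce the corresponding formulas: the uniform initial weights, a sampling probability proportional to the weight, the mean absolute reconstruction error used for e_x, and above all the weight-update rule are assumptions made only for illustration. The PLS1 routine `pls1_fit` (NIPALS) and all function names are likewise hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """PLS1 by NIPALS: returns centring terms, scores T, loadings P, regression vector b."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, T, q = [], [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w = w / np.linalg.norm(w)         # weight vector
        t = E @ w                         # score vector
        p = E.T @ t / (t @ t)             # X loading
        c = f @ t / (t @ t)               # y loading
        E = E - np.outer(t, p)            # deflate X
        f = f - c * t                     # deflate y
        W.append(w); P.append(p); T.append(t); q.append(c)
    W, P, T = np.array(W).T, np.array(P).T, np.array(T).T
    b = W @ np.linalg.solve(P.T @ W, np.array(q))
    return x_mean, y_mean, T, P, b

def select_wavenumbers(D, y, k, g, z, beta=1.0, n_comp=2, seed=0):
    rng = np.random.default_rng(seed)
    m = D.shape[1]
    weights = np.full(m, 1.0 / m)                   # (a) ASSUMED uniform initial weights
    baseline = None
    for _ in range(g):
        prob = weights / weights.sum()              # (b) ASSUMED: probability ~ weight
        idx = rng.choice(m, size=k, replace=False, p=prob)
        x_mean, y_mean, T, P, b = pls1_fit(D[:, idx], y, n_comp)  # (c) sub-model h_t
        D_rec = T @ P.T + x_mean                    # (d) reconstructed spectra D'
        e_x = np.mean(np.abs(D[:, idx] - D_rec), axis=0)          # ASSUMED error form
        y_hat = (D[:, idx] - x_mean) @ b + y_mean
        e_y = np.sqrt(np.mean((y - y_hat) ** 2))    # (e) root-mean-square error
        err = e_x + beta * e_y                      # (f) err_t = e_xj + beta * e_y
        baseline = err.mean() if baseline is None else 0.5 * (baseline + err.mean())
        weights[idx] *= np.exp((baseline - err) / baseline)       # (g) HYPOTHETICAL update
    return np.argsort(weights)[::-1][:z]            # (h) z variables with largest weights

# Synthetic data (not from the source): 40 samples, 12 variables.
rng = np.random.default_rng(0)
D = rng.normal(size=(40, 12))
y = D[:, 2] - 0.5 * D[:, 5] + 0.05 * rng.normal(size=40)
selected = select_wavenumbers(D, y, k=6, g=20, z=4)
```

In this sketch, variables that appear in sub-models with below-average error gain weight and are sampled more often in later iterations; any other monotone update would fit the same skeleton.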

In one or more embodiments, the mathematical model of step 7 is built with the PLS method.

In one or more embodiments, the model built in step 7 can adapt to the spectral characteristics of the sample to be tested; that is, the method of the invention can adaptively change the training set and the modeling wavenumber range according to the spectrum to be tested, so as to obtain a better modeling result.

Brief description of the drawings

Fig. 1: Schematic of the experiment in which the on-line near-infrared analyzer measures the near-infrared spectra of crude oil samples.

Fig. 2: Method for building the adaptive near-infrared prediction model of crude oil properties.

Fig. 3: Raw crude oil near-infrared spectra.

Fig. 4: Crude oil near-infrared spectra after preprocessing.

Fig. 5: Principal components from the PCA analysis.

Fig. 6: Hotelling T² plot of the abnormal samples.

Fig. 7: Prediction performance of the near-infrared regression model for crude oil sulfur content.

Detailed description of the embodiments

Fig. 1 shows the experimental procedure by which the on-line near-infrared analyzer of the invention measures the near-infrared spectral data of samples. Fig. 2 is the overall flow chart of the method of the invention, which comprises the following steps:

First step: collect crude oil samples and measure the property values of all crude oil samples;

Second step: measure the near-infrared spectra of all samples with an on-line near-infrared analysis system;

Third step: preprocess the crude oil near-infrared spectra obtained in the second step, screen the samples of the preprocessed spectral data, and reject abnormal sample points;

Fourth step: perform PCA on all samples in the sample library, select the loading vectors whose eigenvalue contribution rate exceeds 95%, denote them P_pca, and store them;

Fifth step: randomly draw n samples from the sample library by Monte Carlo sampling to form a training subset A, reduce the dimension of this subset with P_pca, and build a Gaussian process regression model on the dimension-reduced subset. Repeat this step N times to obtain N training subsets and sub-models;

Sixth step: when a new spectrum to be tested x_p is acquired, reduce its dimension with P_pca, substitute it into all GPR sub-models, and compute the model estimation variance σ. Select the training subset corresponding to the sub-model with the smallest σ as the local training set S;

Seventh step: determine the wavenumber range from the local training set S, build a local model on the local training set with PLS, and predict the property value of the spectrum to be tested x_p.

These steps are described in detail below. It should be understood that, within the scope of the invention, the technical features described above and the technical features specifically described below (e.g. in the embodiments) can be combined with each other to form preferred technical solutions.

I. Build the crude oil calibration set and measure the properties of the crude oils in it

Collect crude oil samples of different types, usually covering paraffinic, intermediate-base, and naphthenic crude oils. In general, no fewer than 200 crude oil samples are collected. Preferably, the near-infrared spectrum and property values of each crude oil are measured repeatedly to eliminate random error.

Preferably, the density (20 °C), sulfur content, and acid value of the collected crude oil samples fall within 0.7-1.1 g/cm³, 0.03%-5.50%, and 0.01-12.00 mg KOH/g, respectively. Conventional standard methods are then used to measure multiple properties of the collected crude oils, such as density, carbon residue, nitrogen content, sulfur content, acid value, salt content, wax content, resin content, asphaltene content, and true boiling point distillation data, and the data are recorded.

II. Acquire the crude oil near-infrared spectra

An off-line or on-line near-infrared instrument of a suitable type can be chosen for the near-infrared scan. The probe is inserted directly into a crude oil sample held at a steady temperature of 35 °C or below, and the crude oil is kept homogeneous during the measurement, giving the near-infrared spectrum of each sample. For example, a crude oil sample can be held at a constant 30 °C and, once its temperature has reached a steady state, its near-infrared spectral data are measured.

In general, each spectrum is scanned 10-100 times and the scans are averaged. The spectral scanning range is 4000-12500 cm^-1 and the resolution is 8-32 cm^-1. An exemplary crude oil spectrum before preprocessing is shown in Fig. 3.

III. Preprocess the crude oil near-infrared spectra obtained in step 2 by straight-line subtraction

The preprocessing applies straight-line subtraction to the 12500-4000 cm^-1 spectral region of every calibration-set sample, which eliminates baseline drift and background interference and improves resolution and sensitivity. After preprocessing, the initial training set can be established. An exemplary preprocessed crude oil near-infrared spectrum is shown in Fig. 4.
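
As a hedged illustration of this preprocessing, the following Python sketch fits a least-squares straight line to each spectrum over the full wavenumber range and subtracts it. The text does not state how the straight line is determined, so the least-squares fit, the function name `subtract_straight_line`, and the synthetic spectrum are assumptions.

```python
import numpy as np

def subtract_straight_line(wavenumbers, spectra):
    """Subtract a least-squares straight line from each spectrum (one spectrum per row)."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    # Design matrix [wavenumber, 1]; the same matrix serves every spectrum.
    A = np.column_stack([wavenumbers, np.ones_like(wavenumbers)])
    # Each column of `coef` holds the (slope, intercept) of one spectrum.
    coef, *_ = np.linalg.lstsq(A, spectra.T, rcond=None)
    return spectra - (A @ coef).T

# Synthetic example (not data from the source): one band on a sloping baseline.
wn = np.linspace(4000.0, 12500.0, 200)
band = np.exp(-((wn - 8000.0) / 150.0) ** 2)
drifted = band + 1e-4 * wn + 0.3
corrected = subtract_straight_line(wn, drifted)
```

Because the fit is linear, any offset or tilt shared by the whole spectrum is removed while the absorption bands keep their shape.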

IV. Reject abnormal sample points with principal component analysis combined with the Hotelling T² statistic

Abnormal sample points can be rejected with principal component analysis combined with the Hotelling T² statistic. The basic procedure is to perform principal component analysis (PCA) on the sample spectra first, then use the principal component scores as feature variables to calculate the T² statistic of each sample and, according to a preset T² threshold, reject the abnormal sample points in the initial training set to form the final training set.

The T² values of all samples in the sample library are compared with the threshold, and samples whose values exceed the threshold are rejected to establish the final training set. Exemplary principal components from the PCA analysis are shown in Fig. 5.
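
The rejection procedure can be sketched as below, computing T² as the sum over the retained components of the squared score divided by the variance of that score. The threshold is passed in as a preset number; the F-distribution calculation of the threshold is not reproduced. The function names, the mean-centring, and the synthetic data with one planted outlier are assumptions.

```python
import numpy as np

def hotelling_t2(X, n_components):
    """T2 statistic of every sample from its first `n_components` PCA scores."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                            # mean-centre (assumption)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # principal component scores t
    sigma = scores.std(axis=0, ddof=1)                 # standard deviation of each score
    return np.sum((scores / sigma) ** 2, axis=1)

def reject_outliers(X, n_components, threshold):
    """Return the retained samples and the indices of the rejected ones."""
    t2 = hotelling_t2(X, n_components)
    keep = t2 <= threshold
    return np.asarray(X)[keep], np.flatnonzero(~keep)

# Synthetic library of 40 samples with one planted abnormal sample (index 7).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
X[7] += 25.0
kept, rejected = reject_outliers(X, n_components=2, threshold=20.0)
```

The threshold of 20.0 is chosen only for this synthetic data; in practice the preset threshold described in the text would be supplied instead.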

V. Perform PCA on all samples in the sample library and select the loading vectors whose eigenvalue contribution rate exceeds 95%, denoted P_pca

In this step, PCA is performed on the spectral samples in the sample library, and the loading vectors whose eigenvalue contribution rate exceeds 95% are saved. This both reduces the storage space of the sample library and lowers the amount and time of computation when the similarity index is calculated in later steps.

VI. Select a suitable local training set from the sample library with the Gaussian process model

First, n samples are randomly drawn from the sample library by Monte Carlo sampling to form a training subset A; the dimension of this subset is reduced with P_pca, and a Gaussian process regression model is built on the dimension-reduced subset. This step is repeated N times to obtain N training subsets and sub-models. When a new spectrum to be tested x_p is acquired, its dimension is reduced with P_pca and it is substituted into all GPR sub-models to compute the model estimation variance σ. The training subset corresponding to the sub-model with the smallest σ is selected as the local training set S. The model estimation variance σ is calculated as follows:

$$\sigma^2 = K_{pp} - K_p K^{-1} K_p^T$$

where

$$K_p = [\,k(x_p, x_1) \;\; k(x_p, x_2) \;\; \cdots \;\; k(x_p, x_n)\,]$$

$$K_{pp} = k(x_p, x_p)$$

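where the covariance function may take the form

$$k(x_i, x_j) = \sigma_f^2 \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2 l^2}\right) + \sigma_n^2 \delta(x_i, x_j)$$

with δ(x_i, x_j) equal to 1 when x_i and x_j are the same sample and 0 otherwise.

In the formulas: x_i, x_j: any samples in the training set;

x_p: spectrum to be tested;

y: property value data;

m(x): mean function;

k(x_i, x_j): covariance function;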

VII. Determine the wavenumber range from the local training set S

In this step, wavenumbers are selected for the spectral samples in the training set. In-depth study of methods such as partial least squares has shown that a better quantitative model can be obtained by screening characteristic wavenumbers or intervals. Wavenumber selection simplifies the model and removes irrelevant variables, giving a model with stronger predictive ability and better robustness.

The wavenumber range in step 7 is determined as follows:

For iterations t = 1, ..., g, repeat the following steps:

(c) Build a sub-model h_t with the PLS method from the k selected variables;

(d) Reconstruct the spectral matrix D' from the score matrix and loading matrix obtained by PLS, and compute the error e_x of each variable, where:

e_xj: mean error of the j-th variable;

K: total number of samples;

D_ij: original value of the j-th variable of the i-th sample;

D'_ij: reconstructed value of the j-th variable of the i-th sample;

(e) Compute the error e_y:

$$e_y = \sqrt{\frac{1}{K} \sum_{i=1}^{K} \left(y_i - \hat{y}_i\right)^2}$$

where:

e_y: root-mean-square error;

K: total number of samples;

y_i: true value of the i-th sample;

ŷ_i: predicted value of the i-th sample;

(f) Substitute e_x and e_y into the following formula to compute the error:

$$err_t = e_{xj} + \beta e_y$$

where err_t is the error of the t-th iteration;

(g) Compute the new weight of each variable;

After the weights are updated, proceed to the next iteration;

VIII. Using the determined wavenumber range, build a local model on the local training set with PLS and predict the property value of the spectrum to be tested x_p

The model built in this step can adapt to the spectral characteristics of the sample to be tested; that is, the method of the invention can adaptively change the training set and the modeling wavenumber range according to the spectrum to be tested, so as to obtain a better modeling result.

When predicting the properties of a crude oil sample to be tested, the invention first measures the near-infrared spectrum of the sample by the method described in step 2, then preprocesses that spectrum by the method described in step 3, then reduces the dimension of the spectrum to be tested with the projection matrix determined in step 4, determines the local training set by the method of step 5, selects the modeling wavenumber range on this training set as in step 7, and builds a local PLS model to predict the sample to be tested.

The beneficial effects of the invention are as follows:

The method of the invention is simple, fast, and practical to run: crude oil properties are measured rapidly with a near-infrared spectrometer. Compared with traditional measurement methods, it greatly shortens detection time and saves labor and materials. No reagents are used during testing, the crude oil sample needs no treatment, and the sample is not damaged. The method can also adapt the model to changes in the sample to be tested, which enables real-time tracking of operating conditions, gives better prediction accuracy, and lowers the cost of model maintenance.

The invention is described in detail below by way of an embodiment. It must be pointed out that the following embodiment only serves to further illustrate the invention and should not be understood as limiting its scope of protection; non-essential modifications and adaptations made by those skilled in the art on the basis of the content of the invention still fall within the scope of protection of the invention.

Embodiment 1

The specific steps of the invention are illustrated below with an embodiment of sulfur content prediction:

Step 1: Collect 200 crude oil samples of different types to form the crude oil sample library.

Step 2: Control the sample temperature at 30 °C and measure with a Bruker near-infrared spectrometer. The near-infrared spectrum of each crude oil sample is measured by inserting the probe directly into the sample; the spectral scanning range is 4000-12500 cm^-1, the resolution is 16 cm^-1, and 32 scans are accumulated. The sulfur content of the crude oil samples is measured by the conventional standard method. Fig. 3 shows the raw crude oil near-infrared spectra; the raw spectra show severe baseline drift and severe peak overlap.

Step 3: Take the absorbance over the 4000-12500 cm^-1 range, preprocess it by straight-line subtraction, and build the near-infrared spectral matrix of the crude oil samples. Fig. 4 shows the spectra after preprocessing.

Step 4: Select training samples from the preprocessed crude oil samples by rejection. First perform principal component analysis on the preprocessed spectra; then, using the principal component scores (Fig. 5) as feature variables, calculate the T² statistic of each sample. According to the preset T² threshold of 3.911, reject the abnormal sample points in the initial training set, i.e. the samples whose T² statistic exceeds the threshold. In this example, samples 93, 95, 96, and 175 are rejected, and the remaining samples serve as training samples. Finally, 196 training samples make up the crude oil spectral training set (Fig. 6).

Step 5: Perform PCA on all samples in the sample library and select the loading vectors whose eigenvalue contribution rate exceeds 95%, denoted P_pca.

Step 6: Randomly draw 150 samples from the sample library by Monte Carlo sampling to form a training subset A, reduce the dimension of this subset with P_pca, and build a Gaussian process regression model on the dimension-reduced subset. Repeat this step 1000 times to obtain 1000 training subsets and sub-models. When a new spectrum to be tested is acquired, reduce its dimension with P_pca, substitute it into all GPR sub-models, and compute the model estimation variance σ. Select the training subset corresponding to the sub-model with the smallest σ as the local training set S.
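
The subset selection of this step can be sketched as follows. For brevity the sketch uses a squared-exponential covariance with fixed hyperparameters, whereas the method fits each sub-model by maximum likelihood; it also uses small synthetic sizes (15 samples per subset, 50 subsets) rather than the 150 and 1000 of the embodiment. All function names and the random stand-in data are assumptions.

```python
import numpy as np

def se_kernel(A, B, length=1.0, sigma_f=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return sigma_f**2 * np.exp(-0.5 * np.maximum(d2, 0.0) / length**2)

def estimation_variance(T_sub, t_p, sigma_f=1.0, sigma_n=0.1):
    """K_pp - K_p K^-1 K_p^T for one training subset (fixed hyperparameters)."""
    K = se_kernel(T_sub, T_sub, sigma_f=sigma_f) + sigma_n**2 * np.eye(len(T_sub))
    K_p = se_kernel(t_p[None, :], T_sub, sigma_f=sigma_f)
    K_pp = sigma_f**2 + sigma_n**2
    return K_pp - (K_p @ np.linalg.solve(K, K_p.T)).item()

def select_local_training_set(scores, t_p, n, N, rng):
    """Draw N random subsets of n samples; keep the one with the smallest variance."""
    best_idx, best_var = None, np.inf
    for _ in range(N):
        idx = rng.choice(len(scores), size=n, replace=False)   # Monte Carlo sampling
        var = estimation_variance(scores[idx], t_p)
        if var < best_var:
            best_idx, best_var = idx, var
    return best_idx, best_var

rng = np.random.default_rng(0)
scores = rng.normal(size=(60, 3))     # stand-in for the PCA scores of the sample library
t_p = scores[0] + 0.01                # stand-in for the projected spectrum to be tested
S_idx, sigma2 = select_local_training_set(
    scores, t_p, n=15, N=50, rng=np.random.default_rng(1)
)
```

In a deployment the N sub-models would be trained once offline, so that only the variance evaluation has to run when a new spectrum arrives.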

Step 7: Determine the wavenumber ranges from the local training set S (approximately 7496-8449 cm^-1 and 4431-4519 cm^-1), build a local model with PLS, and use it to predict the property value of the spectrum to be tested.

The prediction results of the resulting sulfur content regression model are shown in Fig. 7. The coefficient of determination reaches 0.9916 and the root-mean-square error is 0.1197. The comparison of predicted and actual values is shown in Table 1 below; the prediction process is fast and simple, and the prediction results are accurate.
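
For reference, the evaluation metrics quoted above can be computed as in the following sketch. The function names are hypothetical and the sulfur content values are made up for illustration; they are not the data of this embodiment.

```python
import numpy as np

def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def root_mean_square_error(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def relative_error_percent(y_true, y_pred):
    """Relative prediction error of each sample, in percent."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return 100.0 * np.abs(y_pred - y_true) / np.abs(y_true)

# Illustrative sulfur contents in percent (hypothetical values).
actual = [0.50, 1.20, 2.00, 3.10]
predicted = [0.52, 1.15, 2.06, 3.02]
r2 = coefficient_of_determination(actual, predicted)
rmse = root_mean_square_error(actual, predicted)
rel = relative_error_percent(actual, predicted)
```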

Table 1: Comparison of predicted and actual crude oil sulfur content (for sulfur content prediction, a relative prediction error below 10% is generally regarded in the field as good accuracy)

The other crude oil properties are predicted with the same modeling method: results are obtained by choosing the corresponding local training set, wavenumber range, and modeling parameters.

Claims

1. A method of building a model for crude oil property prediction, characterized in that the method comprises:

(1) measuring the property data of crude oil samples;

(2) measuring the near-infrared spectra of the crude oil samples;

(3) preprocessing the near-infrared spectra obtained in step (2) by straight-line subtraction to eliminate background interference and baseline drift;

(4) performing principal component analysis (PCA) on the spectra obtained in step (3) and saving the loading vectors whose eigenvalue contribution rate exceeds 95%, denoted P_pca;

(5) using P_pca to reduce the dimension of training subsets made up of randomly drawn samples, then obtaining sub-models by Gaussian process regression (GPR);

(6) using P_pca to reduce the dimension of the near-infrared spectrum of the sample to be tested, then selecting a training set by Gaussian process regression (GPR);

(7) determining one or more wavenumber intervals from the training set and building the prediction model of the crude oil property by partial least squares (PLS).

2. The method according to claim 1, characterized in that

the crude oil property is selected from one or more of density, carbon residue, acid value, sulfur content, nitrogen content, wax content, resin content, asphaltene content, and true boiling point (TBP) data;

the number of crude oil samples in step (1) is no fewer than 200; and the near-infrared spectral data of the crude oil samples are acquired with an off-line or on-line near-infrared analysis system.

3. The method according to claim 1 or 2, characterized in that, in the measurement of step (2), the spectral scanning range is 4000-12500 cm^-1, the resolution is 2-32 cm^-1, and scanning is repeated 10-100 times and averaged to give the near-infrared spectrum.

4. The method according to any one of claims 1-3, characterized in that step (3) comprises using principal component analysis combined with the Hotelling T² statistic to calculate the T² statistic of each sample in the initial sample library and, according to a preset T² threshold, rejecting the abnormal sample points in the initial sample library;

Preferably, abnormal sample points are rejected with principal component analysis combined with the Hotelling T² statistic as follows: principal component analysis is first performed on the sample spectra; the principal component scores are then used as feature variables to calculate the T² statistic of each sample; and, according to the preset T² threshold, the abnormal sample points in the sample library are rejected.

More preferably, the T² statistic is described by the following formula:

$$T^2 = \sum_{i=1}^{Iter} \frac{t_i^2}{\sigma_i^2}$$

where t_i is the i-th variable of the original spectral matrix X after PCA dimension reduction, σ_i is the standard deviation of t_i, and Iter is the number of principal components extracted. Because the T² value of an abnormal sample is far larger than that of a normal sample, the T² values of all spectral samples in the sample library are calculated, and the threshold is calculated from the F distribution, by consulting an F-distribution table, with the 99% confidence level as the upper limit.

The T² values of all samples in the sample library are compared with the threshold, and samples whose values exceed the threshold are rejected.

5. The method according to any one of claims 1-4, characterized in that step (5) randomly draws samples by Monte Carlo sampling to form a training subset and builds a Gaussian process regression model on the dimension-reduced subset;

Preferably, step (5) is repeated N times to obtain N training subsets and sub-models; more preferably, N is 200-5000.

6. The method according to claim 5, characterized in that the Gaussian process regression model in step (5) is as follows:

A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. It is completely determined by its mean function and covariance function and can be written as:

$$f(x) \sim GP(m(x), k(x, x'))$$

$$y = f(x) + \varepsilon$$

where ε is Gaussian white noise, distributed as follows:

$$\varepsilon \sim N(0, \sigma_n^2)$$

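The covariance function may take the following form:

$$k(x, x') = \sigma_f^2 \exp\left(-\frac{\lVert x - x' \rVert^2}{2 l^2}\right) + \sigma_n^2 \delta(x, x')$$

where δ(x, x') equals 1 when x and x' are the same sample and 0 otherwise.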

In the formulas: x, x': any samples in the training set;

y: property value data;

m(x): mean function;

k(x, x'): covariance function.

7. The method according to any one of claims 1-6, characterized in that step (6) uses P_pca to reduce the dimension of the near-infrared spectrum of the sample to be tested, substitutes it into all Gaussian process regression (GPR) sub-models, computes the model estimation variance σ, and selects the training subset corresponding to the sub-model with the smallest σ as the local training set.

8. The method according to claim 7, characterized in that the model estimation variance σ in step (6) is determined as follows:

For the dimension-reduced spectrum to be tested x_p', the joint prior distribution of its predicted property value y_p and the property values y of the samples in the training set is

$$\begin{bmatrix} y \\ y_p \end{bmatrix} \sim N\left(0, \begin{bmatrix} K & K_p^T \\ K_p & K_{pp} \end{bmatrix}\right)$$

where K is the n×n covariance matrix of the training-set samples and

$$K_p = [\,k(x_p, x_1) \;\; k(x_p, x_2) \;\; \cdots \;\; k(x_p, x_n)\,]$$

$$K_{pp} = k(x_p, x_p)$$

From the above, the posterior distribution of the property value corresponding to the spectrum to be tested x_p' can be calculated; the estimated mean and variance of y_p are as follows:

$$\mu = K_p K^{-1} y$$

$$\sigma^2 = K_{pp} - K_p K^{-1} K_p^T$$

9. The method according to any one of claims 1-8, characterized in that the wavenumber interval in step (7) is determined as follows:

For iterations t = 1, ..., g, repeat the following steps:

(b) Compute the sampling probability of each variable, and draw k variables from all wavenumber points according to the sampling probability;

(d) Reconstruct the spectral matrix D' from the score matrix and loading matrix obtained by PLS, and compute the error e_x of each variable, where:

e_xj: mean error of the j-th variable;

K: total number of samples;

D_ij: original value of the j-th variable of the i-th sample;

D'_ij: reconstructed value of the j-th variable of the i-th sample;

(e) Compute the error e_y:

$$e_y = \sqrt{\frac{1}{K} \sum_{i=1}^{K} \left(y_i - \hat{y}_i\right)^2}$$

where:

e_y: root-mean-square error;

K: total number of samples;

y_i: true value of the i-th sample;

ŷ_i: predicted value of the i-th sample;

(f) Substitute e_x and e_y into the following formula to compute the error:

$$err_t = e_{xj} + \beta e_y$$

where err_t is the error of the t-th iteration;

(g) Compute the new weight of each variable;

After the weights are updated, proceed to the next iteration;

(h) After the iterations stop, sort the weights of the variables in descending order and select the first z variables as the variables used in the final modeling.

10. A crude oil property prediction method based on near-infrared spectral detection, characterized in that the method comprises:

(i) measuring the near-infrared spectrum of the crude oil to be detected;

(ii) predicting the crude oil property with a model for crude oil property prediction built by the method according to any one of claims 1-9.