Method for establishing a DPLS-BS-UVE model for rapid discrimination of genuine and adulterated honey
Technical field
The present invention relates to a method for establishing a DPLS-BS-UVE model for the rapid discrimination of genuine and adulterated honey, and belongs to the field of food safety.
Background art
Honey is a natural sweet substance produced when honeybees collect plant nectar, secretions or honeydew, combine it with their own secretions and allow it to ripen fully; it is a nutritious natural tonic food. Its main components are carbohydrates, of which 60%-80% are glucose and fructose that are easily absorbed by the human body. In addition, honey contains a variety of inorganic salts at concentrations comparable to those in human serum, vitamins, various organic acids, trace elements and several amino acids beneficial to health, as well as highly bioactive invertase, amylase and other enzymes. It has high health-care and nutritional value and is well received by consumers. The Chinese national standard for honey stipulates that no starch, sugar or sugar substitute may be added to or mixed into honey. In recent years, demand for honey on domestic and international markets has grown continuously, but honey output has struggled to meet it. Moreover, because nectar sources differ, honey comes in many varieties with complex compositions, which greatly limits the techniques available for testing honey quality. Driven by large economic interests, many unscrupulous producers adulterate high-quality honey with lower-quality honey, or pass off honey mixed with sweet substances such as high-fructose corn syrup (HFCS) as genuine. This seriously harms the interests of beekeepers, consumers and legitimate honey producers, and has severely affected the market order for honey products and the export trade of China's honey products. Honey quality testing technology thus faces a major challenge, and all of these factors give adulterators their opportunity.
Existing identification techniques analyze a single substance or a single class of substances in honey. Counterfeiters, however, formulate their products according to the technical requirements of the national standard, so that every physicochemical test index of the adulterated honey complies fully with the national standard. Existing identification techniques therefore have difficulty distinguishing it from genuine honey.
The bootstrap method (Bootstrap, BS) is a resampling method suited to chemometric analysis. It can generate N resampled sample sets, and a sufficiently large number of resamplings ensures that the data structure is well simulated. Uninformative variable elimination (UVE) is a variable selection method based on the analysis of partial least squares regression coefficients; it can be used to identify and remove variables that are meaningless to the model (DPLS-UVE).
However, no report has been found of combining the bootstrap method with uninformative variable elimination to identify adulterated honey.
Summary of the invention
The purpose of the present invention is to overcome the drawback of current methods, which identify adulterated honey by analyzing a single substance or a single class of substances in honey. The bootstrap method is combined with uninformative variable elimination so that adulterated honey can be identified quickly and accurately.
First, the present invention provides a method for establishing a DPLS-BS-UVE model for the rapid discrimination of genuine and adulterated honey, comprising the following steps:
(1) Collect no fewer than 100 samples in total, including honey samples, syrup samples and honey samples adulterated with syrup at known mixing ratios;
(2) Analyze the samples with a nuclear magnetic resonance (NMR) spectrometer to obtain their respective NMR metabolic fingerprints;
(3) Convert the NMR metabolic fingerprints into a two-dimensional matrix, in which the columns represent the samples analyzed and the rows represent the peak areas of the corresponding sample in given chemical shift intervals;
(4) Pre-process the two-dimensional matrix obtained in step (3) to eliminate the data masking caused by differences in content, reject strong outlying observations, and then analyze the data by DPLS-BS-UVE to screen for markers of honey adulteration. The peak areas of the spectral peaks corresponding to the markers are taken as the X variables and the sample type as the Y variable: a value of "1" denotes a pure honey sample, a value of "0" denotes a syrup-adulterated honey sample, and the threshold is set to 0.5. Linear regression of the class variable Y on the data matrix X of the markers gives the multiple linear regression equation
Y=-0.06290X1-0.07438X2+0.08985X3-0.09160X4-0.07896X5+0.07828X6+0.8595;
where Y indicates genuine or adulterated honey, and X1, X2, X3, X4, X5 and X6 are the peak areas of the 130th, 152nd, 397th, 419th, 457th and 463rd spectral segments after segmental integration, respectively;
(5) Use the multiple linear regression equation obtained in step (4) to predict unknown samples: if the Y value falls within 1 ± 0.5, the sample is identified as genuine honey; if the Y value falls within 0 ± 0.5, it is identified as adulterated honey.
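The decision rule of step (5) can be illustrated with a minimal Python sketch. The coefficients, intercept and 0.5 threshold are copied from the equation above; the function names `predict_y` and `classify` are hypothetical, and assigning a Y value of exactly 0.5 to the genuine class is an assumption, since the text does not say how that boundary case is handled.

```python
# Coefficients and intercept copied from the regression equation in step (4).
COEFFS = (-0.06290, -0.07438, 0.08985, -0.09160, -0.07896, 0.07828)
INTERCEPT = 0.8595
THRESHOLD = 0.5  # decision threshold stated in step (4)


def predict_y(peak_areas):
    """Return Y for the six marker peak areas X1..X6 (pre-processed as in step (4))."""
    if len(peak_areas) != len(COEFFS):
        raise ValueError("expected six peak areas (X1..X6)")
    return INTERCEPT + sum(c * x for c, x in zip(COEFFS, peak_areas))


def classify(peak_areas):
    """Label a sample by comparing Y with the threshold.

    Assumption: Y == 0.5 is assigned to the genuine class.
    """
    return "genuine honey" if predict_y(peak_areas) >= THRESHOLD else "adulterated honey"
```

The inputs must be the scaled peak areas of the six marker segments, prepared in the same way as the training data; raw integrals would not be comparable with the fitted coefficients.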
To ensure the reliability and representativeness of the method, we collected honey covering as many varieties and places of origin as possible, together with as many types of syrup as possible. The samples comprise honey purchased on the market that passed laboratory adulteration testing, adulterated honey that failed laboratory testing, and honey and syrup samples provided by enterprises. The honey samples include both ripe and unripe honey. The varieties involved are acacia honey, rape honey, Mel, chaste tree honey, clover honey, lychee honey, longan honey, jujube honey, wolfberry honey, sunflower honey, astragalus honey, motherwort honey, loquat honey, codonopsis honey, fennel honey and multifloral honey. The processing techniques applied to the honey include dehydration, decolorization and antibiotic removal. The places of origin of the honey samples include Jiangsu, Henan, Xinjiang, Sichuan, Inner Mongolia, Hubei, Liaoning, Jilin, Shaanxi, Shandong and Gansu. The syrup samples include beet syrup, rice syrup, cassava syrup, wheat syrup, HFCS and mixtures thereof.
Before the samples are analyzed by NMR spectroscopy they must be pre-treated, specifically as follows:
1. Stir uncrystallized samples until uniform. Place crystallized samples, in a closed container, in a water bath at no more than 60 °C, warm and shake them, stir until uniform once the sample has fully melted, and cool rapidly to room temperature;
2. Filter the melted honey sample through nylon filter cloth with a pore size of 0.10 mm-0.14 mm to remove solid impurities, accurately weigh 0.25 g of sample into a centrifuge tube, add 1 mL of heavy water and dissolve completely;
3. To the solution from step 2, add 200 μL of 1.5 mol/L PBS (pH = 4.0) and 100 μL of internal standard solution containing sodium 3-(trimethylsilyl)-1-propanesulfonate at a volume fraction of 0.05%, vortex until uniformly mixed, centrifuge, and transfer the supernatant to an NMR tube;
4. Set up the NMR spectrometer, place the NMR tube in the instrument and spin the tube ready for NMR detection.
As a further preference, the NMR spectral analysis conditions for the samples in step (2) of the foregoing method are: measurement temperature (probe temperature): room temperature, 25 °C; ambient humidity: 35%; observation frequency: 400 MHz; pulse program: Jresgpprqf; spectral width: 6410 Hz; 90° pulse width: p1 = 6.45 μs; pulse delay time: d1 = 15 s; acquisition time: 4 s; number of scans: 240.
Based on the preferred NMR analysis conditions, the data processing in step (4) further comprises peak alignment, baseline correction and phase correction of the raw metabolic fingerprints using MestReNova software. In line with the foregoing spectral analysis conditions, sodium 3-(trimethylsilyl)-1-propanesulfonate is used as the internal standard and its chemical shift is set to 0.00 for peak alignment; baseline correction uses the polynomial fit method; phase correction is first optimized automatically with the global and metabonomics algorithms and then corrected manually in specific regions so that as many integral values as possible are positive.
Then the spectral intensity in the 1H chemical shift interval δ 4.69-5.20 is set to zero and removed from the data matrix, and the peaks of the internal standard in the chemical shift interval δ 0.00-0.10 are likewise removed from the data matrix.
DPLS-BS method: the bootstrap is a classical resampling method that obtains more data by sampling from the original data. It requires no a priori assumptions and has already been used in variable selection studies for PLS analysis. The method first requires a DPLS model to be built; the confidence intervals of its regression coefficients can then be estimated from all of the bootstrap samples. Finally, the variables whose PLS regression coefficient confidence intervals do not contain "0" are taken as biomarkers.
Because no assumption is made about the distribution of the parameters, the non-parametric bootstrap can be used to estimate the mean of the regression coefficients and their confidence interval (D). At a given significance level, if the confidence interval contains "0", the correlation is not significant and the variable is rejected as an outlier. The larger the ratio of the mean to D, the more significant the correlation; the variables are sorted in descending order of this ratio, and the higher a variable ranks, the greater its contribution and the more suitable it is as a biomarker.
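The bootstrap screening rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented procedure: a single-variable least-squares slope stands in for the DPLS regression coefficient to keep the example short, the 500 resamplings and the 95% percentile interval are illustrative choices, and the names `slope`, `bootstrap_ci` and `is_marker` are hypothetical.

```python
import random
import statistics


def slope(x, y):
    """Least-squares slope of y on x; returns 0.0 when x has no spread."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0.0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def bootstrap_ci(x, y, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap: resample (x, y) pairs with replacement.

    Returns (mean, lower bound, upper bound) of the resampled slopes.
    """
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(slope([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(stats), lo, hi


def is_marker(x, y, **kwargs):
    """Keep a variable only when its confidence interval excludes zero."""
    _, lo, hi = bootstrap_ci(x, y, **kwargs)
    return lo > 0 or hi < 0
```

In a full implementation the slope would be replaced by the coefficient vector of a DPLS model refitted on each bootstrap sample, and the retained variables would then be ranked by the ratio of the mean coefficient to the interval D.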
DPLS-UVE method: the linear relationship between the x and y variables in the data set can be expressed as
$$ y = X\beta + e $$
where X (n × p) is the matrix of sample data, y (n × 1) is the response variable, β (p × 1) is the vector of regression coefficients, and e (n × 1) is the error that cannot be explained by the model.
DPLS-UVE is a variable selection method based on analysis of the β coefficients. The coefficient matrix β = [β_1, …, β_p] is calculated by cross-validation. The reliability criterion c_j of the model is the ratio of the coefficient β_j to its standard deviation s(β_j), calculated as:
$$ c_j = \frac{\beta_j}{s(\beta_j)}, \qquad j = 1, 2, \ldots, p $$
To estimate the cutoff, a random matrix of the same size as the original X matrix is appended to the data set to simulate noise; these data carry no information for the model. The c_j values of the random matrix are calculated and used to construct the cutoff. If the c_j value of variable j is smaller than the maximum c value of the random matrix (c_artif), i.e. if abs(c_j) < abs(max(c_artif)), variable j is regarded as uninformative and is removed. In this work, the c values are calculated by a robust method: c_j = median(b_j) / interquartile(b_j).
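The robust criterion and the cutoff rule can be sketched as follows. The sketch assumes that, for each variable, a list of coefficient estimates b_j is already available (for example from cross-validation or bootstrap refits), both for the real variables and for the appended random variables; the names `robust_c` and `uve_select` are hypothetical. The cutoff is taken literally from the text as abs(max(c_artif)).

```python
import statistics


def robust_c(coeff_samples):
    """c = median(b) / interquartile range(b) for one variable's coefficient samples."""
    q1, _, q3 = statistics.quantiles(coeff_samples, n=4)
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero")
    return statistics.median(coeff_samples) / iqr


def uve_select(real_coeffs, artificial_coeffs):
    """Return the indices j of real variables that are NOT eliminated.

    A variable is eliminated when abs(c_j) < abs(max(c_artif)).
    """
    cutoff = abs(max(robust_c(b) for b in artificial_coeffs))
    return [j for j, b in enumerate(real_coeffs) if abs(robust_c(b)) >= cutoff]
```

Note that `statistics.quantiles` uses its default "exclusive" interpolation here; other quartile conventions would give slightly different c values.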
DPLS-BS-UVE method: DPLS-BS is combined with the UVE algorithm. In this method, BS resamples the samples to enlarge the originally small sample size, and UVE screens the informative variables to serve as biomarkers.
As a preferred technical scheme, the Hotelling's T2 Range algorithm is used in step (4) to identify strong outlying observations. For a multivariate normal distribution, the T2 values of all samples form the Hotelling's T2 Range plot. When T2 exceeds the critical value (0.01 is used as the significance level), the observation is regarded as a strong outlier and is rejected.
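A minimal sketch of the outlier check is given below. It assumes that the T2 statistic is computed from model scores whose components are mutually uncorrelated (as PCA or PLS scores are), so that each component can be scaled by its own variance; the critical value is passed in by the caller rather than derived from the F distribution. The names `hotelling_t2` and `flag_outliers` are hypothetical.

```python
import statistics


def hotelling_t2(scores):
    """T2 for each sample from a list of score rows (samples x components).

    Assumes uncorrelated score columns, so that
    T2_i = sum_a (t_ia - mean_a)^2 / var_a.
    """
    columns = list(zip(*scores))
    means = [statistics.fmean(col) for col in columns]
    variances = [statistics.variance(col) for col in columns]
    return [
        sum((t - m) ** 2 / v for t, m, v in zip(row, means, variances))
        for row in scores
    ]


def flag_outliers(scores, critical_value):
    """Indices of samples whose T2 exceeds the caller-supplied critical value."""
    return [i for i, t2 in enumerate(hotelling_t2(scores)) if t2 > critical_value]
```

In practice the critical value would be taken from the chosen significance level, as chemometrics software does when it draws the T2 Range plot.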
Preferably, to better eliminate the data masking caused by differences in content, the data in the data set are pre-processed by autoscaling. After transformation, all spectral peaks carry the same weight in classification; each variable j is scaled by its standard deviation, i.e. the square root of the variance of variable j.
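Autoscaling can be sketched as below. The sketch assumes the common definition in which each variable is mean-centered and then divided by its sample standard deviation; whether the original formula included mean-centering cannot be recovered from the text. For convenience the matrix here has samples as rows and variables as columns, and the name `autoscale` is hypothetical.

```python
import statistics


def autoscale(matrix):
    """Mean-center each column and divide it by its sample standard deviation.

    `matrix` is a list of rows (samples); each column is one variable.
    """
    columns = list(zip(*matrix))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stdevs)]
        for row in matrix
    ]
```

After this transformation a large peak and a small peak with the same relative variation contribute equally, which is the effect the text describes.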
As a preferred embodiment, the peak area in a given chemical shift interval in step (3) is the peak area obtained by segmental integration with a segment width Δδ of 0.01 ppm. Specifically, the integration range is chemical shift δ 0.10-10.0, and integration is carried out in segments of width Δδ = 0.01 ppm. The peak areas are normalized to give the integrated peak area data. In the data set, each row represents the peak area of one chemical shift interval and each column represents a sample, ready for further statistical analysis.
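Segmental integration can be illustrated with the sketch below. It assumes the spectrum is available as paired lists of chemical shifts and intensities on an evenly spaced grid, so that summing the intensities in each segment is proportional to the segment's area; the function name `bin_spectrum` and the half-open segment convention are assumptions.

```python
import math


def bin_spectrum(ppm, intensity, start=0.10, stop=10.0, width=0.01):
    """Sum intensities into consecutive segments of `width` ppm between start and stop.

    Segment k covers [start + k*width, start + (k+1)*width); points outside
    the integration range are ignored.
    """
    n_bins = round((stop - start) / width)
    areas = [0.0] * n_bins
    for shift, value in zip(ppm, intensity):
        k = math.floor((shift - start) / width)
        if 0 <= k < n_bins:
            areas[k] += value
    return areas
```

With the default range and width this produces 990 segments before any region is removed from the data matrix.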
The method established by the present invention can be used for the rapid identification of adulterated honey.
The present invention has the following advantages:
1. it is non-invasive, does not destroy the structure or properties of the sample, and faithfully reflects all of the information in the honey; 2. sample pre-treatment is simple, requiring only dissolution in heavy water, which avoids the loss of minor components that separation would cause; 3. the relative signal intensities in the spectrum directly reflect the relative contents of the components in the sample; 4. the multiple markers found by the DPLS-BS-UVE algorithm allow genuine and adulterated honey to be classified quickly and precisely, and compared with a single index they are more comprehensive, more sensitive and more accurate.
When the DPLS-BS-UVE algorithm is used to screen for honey adulteration markers, it requires, unlike parametric test methods, no a priori assumption about the distribution of the data. For data sets with few samples, BS resampling enlarges the sample size and avoids over-fitting. The method offers short analysis time, completeness, good specificity, good controllability and high accuracy, and can remedy the shortcomings of current methods for identifying honey adulteration.
Implementing this research is of great significance for promoting scientific and technological progress in the industry, regulating the bee product market, safeguarding the legitimate rights of consumers and promoting the healthy development of China's bee product industry.
Brief description of the drawings
Fig. 1 is the Hotelling's T2 Range plot of the honey group;
Fig. 2 is the Hotelling's T2 Range plot of the syrup samples.
Detailed description of the embodiments
For a better understanding of the present invention, it is further described below with reference to a specific embodiment.
Embodiment
The instruments, equipment and reagents used in this embodiment are as follows:
Experimental equipment: Avance 400 NMR spectrometer (Bruker, Switzerland) with a 14.1 T superconducting magnet, a 5 mm dual-nucleus z-gradient probe and Topspin 2.3 experiment control and data processing software; 5 mm NMR sample tubes; high-speed centrifuge (Sigma, Germany); vortex mixer (model XW-80A, Instrument Factory of Shanghai Medical University); LP403 analytical balance (Sartorius, Germany);
Test solvents and reagents: deuterated water (deuteration degree 99.8%), purchased from Cambridge Isotope Laboratories; sodium 3-(trimethylsilyl)-1-propanesulfonate (TSPSA), purchased from Sigma-Aldrich; dipotassium hydrogen phosphate and sodium dihydrogen phosphate (guaranteed reagent grade), purchased from Sigma-Aldrich.
Data processing software: SIMCA-P 12.0 (Umetrics AB, Umeå, Sweden) and MestReNova 5.3.1 (MestreLab Research SL, Spain).
Collection of honey samples
The honey samples were provided by Beijing Tongrentang Apiculture Co., Ltd. and include acacia honey, rape honey, Mel, chaste tree honey, clover honey, lychee honey, longan honey, jujube honey, wolfberry honey, sunflower honey, astragalus honey, motherwort honey, loquat honey, codonopsis honey, fennel honey and multifloral honey. The processing techniques applied to the honey include dehydration, decolorization and antibiotic removal. The places of origin of the honey samples include Jiangsu, Henan, Xinjiang, Sichuan, Inner Mongolia, Hubei, Liaoning, Jilin, Shaanxi, Shandong and Gansu. All honey samples passed laboratory testing of adulteration indices. There are 61 honey samples in total.
The syrup samples include beet syrup, rice syrup, cassava syrup, wheat syrup, HFCS and mixed syrups obtained by blending these syrups in arbitrary proportions.
The adulterated honey samples come mainly from market samples that failed laboratory testing of adulteration indices, and from samples obtained by mixing syrup samples with honey samples in arbitrary proportions. Syrup and adulterated honey samples total 82.
Sample preparation
Uncrystallized samples were stirred until uniform. Crystallized samples were placed, in a closed container, in a water bath at no more than 60 °C, warmed and shaken, stirred until uniform once fully melted, and cooled rapidly to room temperature. The melted honey sample was filtered through nylon filter cloth with a pore size of 0.10 mm-0.14 mm to remove solid impurities, 0.25 g of sample was accurately weighed into a centrifuge tube, and 1 mL of heavy water was added and the sample dissolved completely. Then 200 μL of 1.5 mol/L PBS (pH = 4.0) and 100 μL of internal standard solution containing sodium 3-(trimethylsilyl)-1-propanesulfonate at a volume fraction of 0.05% were added, the mixture was vortexed for 3 min until uniformly mixed and centrifuged for 10 min, and the supernatant was transferred to an NMR tube.
Establishment of the NMR fingerprint database
1H NMR experiments were carried out on the 400 MHz NMR spectrometer at an experimental temperature of 298 K, with D2O used for field locking and presaturation used to suppress the water peak. The parameters were set in Topspin as follows: PULPROG = Jresgpprqf, AQ_mod = DQD, TD = 32K, NS = 8*N, DS = 4, TD0 = 1, SW = 6000 Hz, D1 = 3 s, DE = 4 μs, D8 = 0.1 s; the signal was accumulated 240 times. NMR signals were obtained for the honey, adulterated honey and syrup samples.
Establishment of the DPLS-BS-UVE model for rapid discrimination of genuine and adulterated honey
The NMR signals were Fourier transformed with the data processing software supplied with the NMR spectrometer, using 131072 transform points and an exponential line-broadening factor of 1 Hz, to give the NMR spectra of the samples.
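The two processing parameters named here, the exponential line-broadening factor and the transform size, can be illustrated with the sketch below for a single one-dimensional free induction decay (FID). It shows only the weighting and zero-filling that precede the transform; the dwell time is a hypothetical input that depends on the acquisition settings, and the function names are illustrative. The weighting exp(-π·LB·t) is the usual form of exponential line broadening.

```python
import math


def exponential_window(n_points, dwell_time_s, lb_hz=1.0):
    """Weights exp(-pi * LB * t) for t = 0, dwell, 2*dwell, ..."""
    return [math.exp(-math.pi * lb_hz * i * dwell_time_s) for i in range(n_points)]


def apodize_and_zero_fill(fid, dwell_time_s, lb_hz=1.0, size=131072):
    """Apply the window to a (real or complex) FID and pad with zeros to `size` points."""
    if len(fid) > size:
        raise ValueError("FID is longer than the transform size")
    window = exponential_window(len(fid), dwell_time_s, lb_hz)
    weighted = [value * weight for value, weight in zip(fid, window)]
    return weighted + [0.0] * (size - len(fid))
```

The padded signal would then be passed to a Fourier transform routine, which the spectrometer software performs internally.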
1. Spectrum pre-processing: the raw fingerprints were pre-processed with MestReNova software, including peak alignment, baseline correction and phase correction. Sodium 3-(trimethylsilyl)-1-propanesulfonate was used as the internal standard and its chemical shift was set to 0.00 for peak alignment; baseline correction used the polynomial fit method; phase correction was first optimized automatically with the global and metabonomics algorithms and then corrected manually in specific regions so that as many integral values as possible were positive;
2. Data extraction: to eliminate the influence of the residual water signal on the results, the spectral intensity in the 1H chemical shift interval δ 4.69-5.20 was set to zero and removed from the data matrix; the peaks of the internal standard in the interval δ 0.00-0.10 were likewise removed from the data matrix. To reduce the influence of noise and chemical shift drift, segmental integration was applied over the chemical shift range δ 0.10-10.0 with a segment width Δδ of 0.01 ppm, giving 939 integration intervals. The peak areas were normalized to give the integrated peak area data. In the data set, each row represents the peak area of one chemical shift interval and each column represents a sample; 61 data sets were obtained for the honey samples and 82 for the syrup and adulterated honey samples, for further statistical analysis.
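The figure of 939 integration intervals is consistent with simple bookkeeping: 990 segments of 0.01 ppm lie between δ 0.10 and δ 10.0, and 51 of them fall in the excluded water region δ 4.69-5.20. The sketch below reproduces this count with integer arithmetic in hundredths of a ppm, and adds a normalization step that assumes "normalized using peak area" means division by the total area of the retained segments. The function names and the half-open segment convention are assumptions.

```python
def retained_bins(start=10, stop=1000, excluded=(469, 520)):
    """Segment k covers [k/100, (k+1)/100) ppm; all bounds are in hundredths of a ppm.

    Segments 469..519 (delta 4.69-5.20, residual water) are dropped.
    """
    return [k for k in range(start, stop) if not excluded[0] <= k < excluded[1]]


def normalize_to_total_area(areas):
    """Divide each segment area by the sum over all retained segments."""
    total = sum(areas)
    if total == 0:
        raise ValueError("spectrum has zero total area")
    return [a / total for a in areas]
```

The internal standard region δ 0.00-0.10 lies below the integration range, so it does not change the count.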
3. Data pre-processing and selection of observations
Data pre-processing: the matrix of relative integrated intensities for the pure honey, syrup and adulterated honey samples was pre-processed to eliminate systematic variation between samples. Autoscaling was used as the standardization method; after transformation all spectral peaks carry the same weight in classification, which eliminates the data masking caused by differences in content.
Selection of observations: strong outlying observations were identified with the Hotelling's T2 Range algorithm. For a multivariate normal distribution, the T2 values of all samples form the Hotelling's T2 Range plot. When T2 exceeds the critical value (0.01 is used as the significance level), the observation is regarded as a strong outlier and should be rejected beforehand, as shown in Fig. 1 and Fig. 2. Accordingly, 5 samples were rejected from the genuine honey group and 7 from the adulterated honey group; all remaining samples were used to build the model in the next step;
4. Establishment of the DPLS-BS-UVE model: the class variable Y represents the sample type. The value for pure honey samples was set to "1" and the value for syrup and adulterated honey samples to "0", with the threshold set to 0.5. The two-dimensional matrix from step 3 was subjected to DPLS-BS-UVE analysis at successively larger numbers of resamplings; once the samples had been resampled 2000 times, the markers found showed high recognition and prediction ability for unknown samples and became stable. The results are shown in Table 1;
Table 1: Adulteration markers obtained by the DPLS-BS-UVE method at different numbers of resamplings
Therefore, the NMR segments P130, P152, P397, P419, P457 and P463 were finally chosen as the variables with the greatest influence on the classification of genuine and adulterated honey. Regression analysis gave the coefficient of each variable, from which the linear regression equation of the discriminant function for genuine and adulterated honey was obtained:
Y=-0.06290X1-0.07438X2+0.08985X3-0.09160X4-0.07896X5+0.07828X6+0.8595;
where Y indicates genuine or adulterated honey, and X1, X2, X3, X4, X5 and X6 are the peak areas of the 130th, 152nd, 397th, 419th, 457th and 463rd segments after segmental integration, respectively. If the Y value falls within 1 ± 0.5, the sample is judged to be genuine honey; if it falls within 0 ± 0.5, the sample is judged to be adulterated honey.
Prediction of unknown honey samples with the established DPLS-BS-UVE model
Thirty-two honey samples to be identified, for which it was not known whether they were adulterated, were processed according to steps 1 to 3 of the model-building procedure above to obtain the corresponding relative integrated intensity data, which were substituted into the discriminant equation obtained in step 4. The Y value was calculated from the established linear regression equation, allowing each sample to be judged quickly as pure honey or syrup-adulterated honey. Details are given in Table 2;
Table 2: Y values of the unknown samples
Samples with calculated values within 1 ± 0.5 are pure honey samples, and those with predicted values within 0 ± 0.5 are syrup-adulterated samples. By this rule, all 11 pure honey samples and 21 syrup-adulterated honey samples were classified correctly, giving a discrimination accuracy of 100%.
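The accuracy figure follows directly from the stated counts. As a small illustration, the sketch below computes the discrimination accuracy from lists of true and predicted labels; the label lists are built only from the counts given above (11 pure, 21 adulterated, all predicted correctly), not from the individual Y values of Table 2, and the function name `accuracy` is hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Counts taken from the text: 11 pure and 21 syrup-adulterated samples.
true_labels = ["pure"] * 11 + ["adulterated"] * 21
predicted_labels = list(true_labels)  # every sample was classified correctly
print(f"accuracy = {accuracy(true_labels, predicted_labels):.0%}")
```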