CN103499610B

CN103499610B - An intelligent olfactory spectrum feature extraction method, based on independent component analysis, for characterizing differences between honeys

Info

Publication number: CN103499610B
Application number: CN201310323359.8A
Authority: CN
Inventors: 史波林; 刘宁晶; 赵镭; 汪厚银; 支瑞聪; 裴高璞; 解楠; 张璐璐
Original assignee: China National Institute of Standardization
Current assignee: China National Institute of Standardization
Priority date: 2013-07-30
Filing date: 2013-07-30
Publication date: 2016-01-27
Anticipated expiration: 2033-07-30
Also published as: CN103499610A

Abstract

An intelligent olfactory (electronic nose) spectrum feature extraction method, based on independent component analysis, for characterizing differences between honeys, characterized in that: principal component analysis (PCA) separates data according to the maximum variance, that is, the second-order moment of the data, and ignores the independence of the data in higher-order moments; independent component analysis instead uses the higher-order moments of the data to compute the transformation matrix, which further reduces the correlation between feature vectors and strengthens signal compression. With 14 independent components, the discrimination accuracy is 94.94% (75 of 79 honey samples): rape honey 22/23, Mel 16/17, acacia honey 37/39.

Description

An intelligent olfactory spectrum feature extraction method, based on independent component analysis, for characterizing differences between honeys

Technical field

The application relates to an intelligent olfactory spectrum feature extraction method, based on independent component analysis, for characterizing differences between honeys.

Background technology

China's honey output ranks first in the world and has grown rapidly in recent years, rising from 252,000 tons in 2001 to 402,000 tons in 2009, while its share of world output rose from nearly 20% to more than 30%. Driven by economic interests, however, the honey market is seriously affected by adulteration: adulterated honey accounts for 20% to 30% of the market, and in some regions adulterated or fake bee products account for about 50%. This seriously damages consumers' interests, hinders the healthy development of the honey industry, and hurts export trade and foreign-exchange earnings.

Because effective detection methods are lacking, adulteration is difficult to combat. The basic reasons are as follows: (1) the main constituents of honey are relatively simple, chiefly water and carbohydrates, which makes adulteration easy, and measuring the content of these few substances alone cannot determine whether a honey is adulterated; (2) the content of the main constituents varies widely with the nectar plant species, colony strength, length of the honey-making period, air temperature and humidity, and with the processing, storage and crystallization of the honey, which makes adulteration simple and convenient; (3) adulteration tests such as the C4 test are expensive and cannot be used for large-scale routine testing and law enforcement.

Aroma is one of the important attributes of product quality, and its characterization needs to be objective, authentic and comprehensive. Current methods such as gas chromatography (GC), gas chromatography-mass spectrometry (GC-MS) and gas chromatography-olfactometry (GC-O) can detect only a limited number of individual aroma compounds in a product; because these compounds interact through effects such as synergy and modulation, such methods can hardly reflect the aroma quality of a sample as a whole. An intelligent olfaction system (electronic nose) simulates human olfaction, characterizes the overall aroma information, and reflects the olfactory character and overall quality of the aroma, while being more objective and reliable than the human sense of smell. Related studies have been carried out on food freshness, edible oil spoilage, fruit and vegetable ripeness, tea origin and variety, and liquor brand identification.

Honey contains more than 300 aromatic substances, which makes it an important sample for studying intelligent olfactory characterization. Honeys from different nectar sources and different origins differ in their flavor substances, and adulteration or quality is reflected to some extent in the overall aroma, so aroma is one of the important indicators for honey quality testing and adulteration identification. This shows that characterizing honey quality by intelligent olfaction is feasible, and it offers a fast, economical and accurate detection method for honey quality testing and adulteration identification that is suited to real-time application. Choosing honey as the research object therefore has practical significance and lasting value for the healthy development of the industry.

Using an electronic nose for product quality discrimination or adulteration identification essentially means using the overall aroma information of the intelligent olfactory spectrum to find differences between samples. The core task is to find the spectrum information that represents these differences, namely the "differentiation information", also called the "differentiation spectrum information of intelligent olfaction". However, the sensor array of an electronic nose is cross-sensitive: every sensor responds to each aroma compound to a different degree. The aroma spectra collected are therefore broad and overlapping, and different samples can hardly be distinguished from the spectra by eye alone. "Signal mining" is required, particularly mining of the differentiation information between samples; the more difference information is mined, the more efficiently product characteristics and quality can be distinguished. Mining of differentiation information is still weak at present and is a bottleneck restricting the development of electronic noses.

Summary of the invention

An intelligent olfactory spectrum feature extraction method, based on independent component analysis, for characterizing differences between honeys. According to the geographic regions of China (west, South China, North China, East China, northeast), 5 honeys from different nectar sources are selected as research samples: 1) rape honey, collected from the Fuling and Yongchuan districts of Chongqing in the western region; 2) lychee honey, collected from Nanning in South China; 3) chaste honey, collected from Miyun, Beijing, and other places in North China; 4) acacia honey, collected from Laiyang, Shandong, in East China; 5) Mel. A gas sensor array detects the honey samples through the adsorption of their different volatile components. The method is characterized in that principal component analysis (PCA) separates the detection data according to the maximum variance, that is, the second-order moment of the data, and ignores the independence of the data in higher-order moments; independent component analysis instead uses the higher-order moments of the data to compute the transformation matrix, which further reduces the correlation between feature vectors and strengthens signal compression. With 14 independent components, the discrimination accuracy is 94.94% (75 of 79 honey samples): rape honey 22/23, Mel 16/17, acacia honey 37/39.
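The contrast between second-order and higher-order statistics can be illustrated with a minimal sketch. It is not the patented procedure: the two signals below are hypothetical, and excess kurtosis is used only as one example of a fourth-order statistic. The signals have the same variance, so a purely variance-based criterion cannot tell them apart, while the fourth-order statistic differs sharply.

```python
from statistics import fmean


def central_moment(values, order):
    """Return the central moment of the given order (population form)."""
    mu = fmean(values)
    return fmean((v - mu) ** order for v in values)


def excess_kurtosis(values):
    """Fourth-order statistic: zero for a Gaussian, non-zero otherwise."""
    m2 = central_moment(values, 2)
    m4 = central_moment(values, 4)
    return m4 / (m2 ** 2) - 3.0


# Two hypothetical signals with identical variance (second-order moment).
dense = [-1.0, 1.0] * 50                 # every point carries energy
spike = 50 ** 0.5
sparse = [0.0] * 98 + [spike, -spike]    # energy concentrated in two spikes

var_dense = central_moment(dense, 2)     # about 1.0
var_sparse = central_moment(sparse, 2)   # about 1.0
kurt_dense = excess_kurtosis(dense)      # about -2
kurt_sparse = excess_kurtosis(sparse)    # about 47
```

Independent component analysis algorithms typically exploit this kind of non-Gaussianity when estimating the unmixing matrix, which is why they can separate structure that a variance-only criterion treats as equivalent.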

Brief description of the drawings

Fig. 1: abnormal point rejection results: (a) Mahalanobis distance discrimination result; (b) leverage discrimination result

Fig. 2: feature extraction result based on variance ratio

Fig. 3: feature point extraction result based on single-variable discrimination

Fig. 4: flow chart of the ant colony algorithm

Fig. 5: feature extraction result based on the ant colony algorithm

Fig. 6: feature point extraction result based on kernel principal component analysis

Fig. 7: feature point extraction result based on independent component analysis

Fig. 8: support vector machine parameter optimization result based on grid search

Fig. 9: support vector machine parameter optimization result based on the genetic algorithm

Fig. 10: support vector machine parameter optimization result based on the particle swarm algorithm

Embodiment

1 Sample collection and preparation

To make the differences between the nectar sources studied representative, 5 honeys from different nectar sources are selected according to the geographic regions of China (west, South China, North China, East China, northeast): 1) rape honey, collected from the Fuling and Yongchuan districts of Chongqing in the western region; 2) lychee honey, collected from Nanning in South China; 3) chaste honey, collected from Miyun, Beijing, and other places in North China; 4) acacia honey, collected from Laiyang, Shandong, in East China; 5) Mel, collected from Dunhua, Jilin, and Harbin, Heilongjiang, and other places in the northeast. To ensure the authenticity and accuracy of the experimental samples and to avoid interference from commercial honey processing, the samples were purchased directly from beekeepers by the China Agriculture Institute Bee Research Center.

After collection, the samples are placed in separate reagent bottles by nectar source and origin. To keep differences in test conditions from interfering with the study, the samples are stored at -18 °C after collection and tested together once all samples have been collected. Before the experiment, the samples are taken out of -18 °C storage, and about 60 g of each of the 5 nectar-source samples is placed in a 40 °C constant-temperature water bath and heated for 15 min to melt the honey; the remaining sample is returned to -18 °C storage. To ensure that the sample melts completely and no crystals remain, it is shaken once every 3 min during the water bath. After the water bath, the sample is left at room temperature to cool for more than 1 h, until its temperature matches room temperature (20 °C).

2 Electronic nose detection method

The electronic nose detects a honey sample through the adsorption of its different volatile components on a gas sensor array. When the volatile components adsorb on the sensors (by both physisorption and chemisorption), the current in the surface layer of the semiconductor sensors changes. After digital conversion, a response curve is obtained for each sample, and the sample is analyzed from it. The present invention uses a Fox4000 electronic nose (AlphaMOS, France), which consists of 18 metal oxide semiconductor (MOS) gas sensors and an HS100 headspace autosampler.

The operating procedure of the instrument is as follows:

1) The honey sample, cooled to room temperature after the water bath, is added in the required amount to a 10 ml headspace vial. The filled vials are placed on a tray. The HS100 autosampler holds at most 2 trays, and each tray holds 32 headspace vials.

2) The instrument detection conditions, comprising the headspace sampling and electronic nose detection conditions, are set as required. Each headspace vial on the tray is coded according to nectar source and detection order.

3) The headspace vial is placed in the headspace chamber and heated under the set conditions; during heating the vial is shaken intermittently to keep the headspace gas homogeneous. After headspace preparation, the headspace gas is drawn off and injected into the detector, and the vial is removed from the headspace chamber. The Fox4000 uses continuous gas-flow injection: after entering the detector, the gas undergoes adsorption and desorption reactions with each sensor, and each sensor generates its own response curve.

A single sample yields an 18 (sensors) × t (detection time) signal matrix. The traditional method takes the maximum (or minimum) value of each sensor as the response of that sensor for analysis.
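The traditional reduction of the 18 × t signal matrix to one value per sensor can be sketched as follows. This is a minimal illustration, assuming that "maximum (or minimum) value" means the sample with the largest magnitude relative to a zero baseline; the function name and the demo values are hypothetical.

```python
def extreme_responses(signal_matrix):
    """For each sensor row, return the sample with the largest magnitude.

    signal_matrix: list of rows, one row per sensor, one column per
    acquisition time point (18 rows for the Fox4000 described above).
    """
    return [max(row, key=abs) for row in signal_matrix]


# Hypothetical 3-sensor, 5-time-point excerpt (values are made up).
demo = [
    [0.00, 0.12, 0.35, 0.30, 0.10],      # response rises, then desorbs
    [0.00, -0.05, -0.22, -0.18, -0.04],  # response falls below baseline
    [0.00, 0.01, 0.02, 0.02, 0.01],      # weak response
]
features = extreme_responses(demo)       # [0.35, -0.22, 0.02]
```

This keeps 18 numbers per sample and discards the rest of the curve, which is the information that the feature extraction methods in Section 7 try to exploit.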

3 Honey quality modeling method based on electronic nose information

The extracted electronic nose feature information is used to build a support vector machine discrimination model that classifies samples from different nectar sources. Traditional pattern recognition methods rest on asymptotic theory that assumes a large number of samples. In practical applications, however, constraints of various kinds often make large sample sizes hard to guarantee, and under small-sample conditions traditional statistics can hardly achieve satisfactory learning and generalization. The support vector machine is suited to modeling under small-sample conditions and is therefore used for pattern recognition of samples from different nectar sources.

Support vector machine (SVM) theory was proposed by Vapnik (1995) for finite samples, building on traditional statistical learning and incorporating the structural risk minimization principle. The method effectively reduces the arbitrariness of parameter setting in traditional pattern recognition models and overcomes the large gap between empirical risk and expected risk that arises during model building. The SVM theory is outlined below.

In pattern recognition, the goal is to find an optimal function f(x, w) that minimizes the expected risk R(w) when evaluating an unknown sample set (x_i, y_i) (i = 1, 2, ..., n; y is the class label):

$$R(w) = \int L\bigl(y, f(x, w)\bigr)\, dF(x, y)$$

where F(x, y) is the joint probability distribution and L(y, f(x, w)) is the loss incurred by predicting y with f(x, w), called the loss function. For a two-class pattern recognition problem, L can be defined as:

$$L\bigl(y, f(x, w)\bigr) = \begin{cases} 0, & y = f(x, w) \\ 1, & y \neq f(x, w) \end{cases}$$

Conventional learning algorithms adopt the empirical risk minimization principle, that is, they minimize the empirical risk R_emp(w):

$$R_{emp}(w) = \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f(x_i, w)\bigr)$$

In practice, however, minimizing the training error does not guarantee the best prediction and easily leads to over-fitting. Further research shows that the empirical risk R_emp(w) and the actual risk R(w) satisfy the following relation with probability at least 1 - η:

$$R(w) \le R_{emp}(w) + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}$$

which is abbreviated as

$$R(w) \le R_{emp}(w) + \Phi\!\left(\frac{h}{n}\right)$$

where h is the VC dimension of the function set, η is the confidence level, and n is the number of training samples.

As can be seen from the above relation, to minimize the actual risk of the designed classification function it is not enough to reduce the empirical risk as far as possible; the number of training samples must also be increased or the VC dimension of the function reduced. This idea is structural risk minimization.
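The effect of training set size and VC dimension on the bound can be checked numerically. The sketch below assumes the relation takes the standard Vapnik form R(w) ≤ R_emp(w) + sqrt((h(ln(2n/h) + 1) − ln(η/4)) / n); the function names and all numbers are hypothetical and serve only to show the trend.

```python
import math


def vc_confidence(h, n, eta):
    """Confidence term of the standard Vapnik bound (assumed form)."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)


def risk_bound(empirical_risk, h, n, eta=0.05):
    """Upper bound on the actual risk: empirical risk + confidence term."""
    return empirical_risk + vc_confidence(h, n, eta)


# Hypothetical numbers: same training error, different n and h.
small_sample = risk_bound(0.05, h=20, n=100)
large_sample = risk_bound(0.05, h=20, n=1000)   # more training samples
simpler_model = risk_bound(0.05, h=5, n=100)    # lower VC dimension
```

With the training error held fixed, both a larger n and a smaller h tighten the bound, which is the trade-off that structural risk minimization balances.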

Based on the above theory, to discriminate a sample set (x_i, y_i) (i = 1, 2, ..., n; x_i is the feature vector of sample i and y_i its class label, +1 or -1), a discriminant function $g(x) = w \cdot x + b$ is sought. After w and b are normalized and scaled proportionally, all samples satisfy $|g(x)| \ge 1$, and the classification margin between the two classes is $2 / \lVert w \rVert$. To obtain better classification and prediction, the two classes should be separated as far as possible, that is, $\lVert w \rVert$ should be minimized. The points that satisfy $|g(x)| = 1$ are those closest to the separating plane; they determine the optimal classification function and are called support vectors (SV).

Under this condition, finding the optimal classification function can be written as the optimization problem:

$$\min_{w, b} \; \frac{1}{2} \lVert w \rVert^{2}$$

$$\text{s.t.} \quad y_i (w \cdot x_i + b) - 1 \ge 0, \quad i = 1, 2, \ldots, n \qquad (7)$$

Converted into its dual problem, the optimization problem can be expressed as:

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \; \alpha_i \ge 0$$

where α_i is the Lagrange multiplier for constraint condition (7), i = 1, 2, ..., n, w is the slope of the classification function, and b is the intercept of the classification function.

For linearly inseparable problems, Vapnik introduced kernel function theory: data in the low-dimensional space are projected into a high-dimensional space by a nonlinear mapping. It can be proved that, with a suitable kernel function, data that are linearly inseparable in the low-dimensional space can be transformed into data that are linearly separable in the high-dimensional space. After the kernel function is introduced, the original problem becomes:

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

where K is the selected kernel function.

Solving this kernel problem finally determines the corresponding classification function:

$$f(x) = \sum_{i=1}^{n} \alpha_i^{*} y_i K(x_i, x) + b^{*}$$

The overall SVM pattern recognition procedure can be summarized in the following steps:

(1) select a suitable kernel function K;

(2) solve the corresponding optimization problem to obtain the support vectors;

(3) obtain the optimal classification function f(x);

(4) determine the class from the value of sgn f(x).
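Steps (1), (3) and (4) can be sketched as below. This is a minimal illustration under stated assumptions: the Gaussian (RBF) kernel is only one common choice for K, step (2) is skipped, and the support vectors, multipliers and bias are hypothetical values standing in for the output of a real solver.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel, one common choice for K."""
    squared = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared)


def decision_value(x, support_vectors, labels, alphas, bias, kernel=rbf_kernel):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors."""
    return bias + sum(
        a * y * kernel(sv, x)
        for sv, y, a in zip(support_vectors, labels, alphas)
    )


def classify(x, support_vectors, labels, alphas, bias):
    """Step (4): the predicted class is the sign of f(x)."""
    value = decision_value(x, support_vectors, labels, alphas, bias)
    return 1 if value >= 0 else -1


# Hypothetical solved model: two support vectors with made-up multipliers.
svs = [(0.0, 0.0), (2.0, 2.0)]
ys = [-1, 1]
alphas = [1.0, 1.0]
b = 0.0

near_negative = classify((0.1, 0.2), svs, ys, alphas, b)   # -1
near_positive = classify((1.9, 2.2), svs, ys, alphas, b)   # 1
```

Only the support vectors enter the sum, so prediction cost depends on their number rather than on the size of the training set.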

4 Parameter optimization for electronic nose detection of honey

4.1 Determining the parameters to be optimized and their levels

Electronic nose detection parameters can be divided into headspace parameters and detection parameters, and the detection parameters further into injection parameters and signal acquisition parameters. Detection parameters reflect the detection characteristics of the instrument and, once the instrument is stable, have little influence on the results. Headspace parameters, by contrast, govern the generation of the sample headspace gas, which is what the electronic nose directly measures, and so directly affect the final result. The present invention therefore concentrates on optimizing the headspace parameters. These mainly comprise headspace temperature and headspace time; because the amount of sample in the headspace vial also affects the final result, sample amount, headspace temperature and headspace time are chosen as the optimization targets. An orthogonal experiment is used to optimize the three factors and select the best combination. In choosing the levels of each factor, note that the headspace vial (10 ml) is shaken while heated in the headspace chamber; to prevent the injection needle from touching the liquid sample and impairing instrument performance, the sample may not exceed 1/2 of the vial volume. Given the density of honey (about 1.4 g/ml), the three levels of sample amount are set to 4 g, 5 g and 6 g. For headspace temperature, according to the literature the properties of honey change readily above 68 °C, so the three levels are set to 40, 50 and 60 °C. For headspace time, considering the speed required for testing large numbers of samples, the short-term stability of honey at elevated temperature and the volatility of honey, short headspace times are chosen, with three levels of 120 s, 180 s and 240 s. The final three-factor, three-level design is thus sample amount 4 g, 5 g, 6 g; headspace temperature 40, 50, 60 °C; headspace time 120 s, 180 s, 240 s. The factors and levels are listed in Table 1. The honeys studied come from 5 different nectar sources, namely rape honey, acacia honey, chaste honey, lychee honey and Mel, with 6 samples per nectar source and 30 samples in total.

The orthogonal table selected for the experiment is L₉(3⁴); the experimental design is shown in Table 2.

The remaining detection conditions are shown in Table 3.
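A 9-run, 4-column, 3-level orthogonal array can be generated and checked as in the sketch below. The array built here is one standard modular construction and is not necessarily row-for-row identical to Table 2; assigning the first three columns to factors A, B and C and leaving the fourth blank is an assumption, and the dictionary keys are hypothetical names.

```python
from itertools import combinations

# Factor levels taken from the text above; the fourth column is left blank.
LEVELS = {
    "A_sample_amount_g": [4, 5, 6],
    "B_headspace_temp_C": [40, 50, 60],
    "C_headspace_time_s": [120, 180, 240],
}


def l9_array():
    """Build a 9-run, 4-column, 3-level orthogonal array (levels 0..2)."""
    return [
        (i, j, (i + j) % 3, (i + 2 * j) % 3)
        for i in range(3)
        for j in range(3)
    ]


def is_orthogonal(array):
    """Every pair of columns must contain each level pair exactly once."""
    n_cols = len(array[0])
    for c1, c2 in combinations(range(n_cols), 2):
        pairs = {(row[c1], row[c2]) for row in array}
        if len(pairs) != len(array):
            return False
    return True


def run_plan(array):
    """Map the first three columns to factors A, B, C (assumed assignment)."""
    names = list(LEVELS)
    return [
        {name: LEVELS[name][row[k]] for k, name in enumerate(names)}
        for row in array
    ]


plan = run_plan(l9_array())   # 9 runs instead of the 27 of a full factorial
```

The balance property verified by `is_orthogonal` is what allows the effect of each factor to be estimated from 9 runs.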

4.2 Determining the evaluation index and method

The present invention aims to maximize the signal differences between different honey samples and selects the electronic nose detection conditions that discriminate best. Under the optimal conditions, signals should be stable among honey samples of the same class and differ strongly between classes, which maximizes the e-nose signal difference between honey samples.

(1) Stability index for samples of the same class

In the orthogonal experiment, 3 samples of each of the 5 honeys are measured in each run. The stability with which the electronic nose detects samples of the same class under a given condition is measured by the mean, over the 18 sensors, of the standard deviation of the signals of each class of samples under that condition. The calculation is given by formulas 15 and 16.

where p is the number of nectar source classes, C_k is the stability of the k-th class of samples, m is the number of electronic nose sensors, and n_k is the number of samples in nectar source class k; the remaining two symbols denote the response signal of the j-th sensor for the i-th sample of class k, and the mean response signal of the class-k samples on the j-th sensor.

(2) Difference between samples of different classes

The difference between classes is calculated as the variance of the class mean signals of the different nectar sources under identical conditions, as given by formula 17.

（17）

where the two symbols denote the mean signal of the class-k samples and the mean signal of all samples, respectively.

(3) Overall evaluation index

The optimal conditions sought are those under which signals are stable within a class and differ strongly between classes. The evaluation index q finally adopted is therefore given by formula 18.
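Formulas 15 to 18 are not legible in this text, so the following is only a sketch of an index with the described behaviour. It assumes that the within-class stability is the mean per-sensor standard deviation, that the between-class difference is the variance of the class mean signals, and that q is their ratio; population (not sample) statistics and all data values are likewise assumptions.

```python
from statistics import fmean, pstdev, pvariance


def class_stability(samples):
    """Mean over sensors of the per-sensor standard deviation in one class.

    samples: list of per-sample response vectors (one value per sensor).
    """
    per_sensor = list(zip(*samples))          # transpose to sensor-major
    return fmean(pstdev(col) for col in per_sensor)


def between_class_difference(classes):
    """Variance of the class mean signals around the overall mean."""
    class_means = [
        fmean(fmean(sample) for sample in samples)
        for samples in classes.values()
    ]
    return pvariance(class_means)


def evaluation_index(classes):
    """Assumed form of q: between-class difference / mean within-class spread."""
    stability = fmean(class_stability(s) for s in classes.values())
    return between_class_difference(classes) / stability


# Hypothetical two-sensor responses, 3 samples per class (made-up values).
tight = {
    "rape": [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]],
    "acacia": [[3.0, 4.0], [3.1, 4.1], [2.9, 3.9]],
}
loose = {
    "rape": [[0.5, 1.5], [1.5, 2.5], [1.0, 2.0]],
    "acacia": [[2.5, 3.5], [3.5, 4.5], [3.0, 4.0]],
}
q_tight = evaluation_index(tight)
q_loose = evaluation_index(loose)
```

Both data sets have the same class means, so the index is higher for the one whose repeated measurements are more stable, which is the behaviour the text asks of q.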

4.3 Electronic nose parameter optimization results based on maximizing signal differences between honey samples

With the evaluation index determined above as the observed value of the orthogonal experiment, the results are shown in Table 4.

As Table 4 shows, all three factors influence the discrimination achieved by the electronic nose. Factors A and B, sample amount and headspace temperature, are positively related to signal discrimination; factor C, headspace time, is negatively related to it; and the blank column changes relatively little. The reason is that as sample amount and headspace temperature increase, the concentration of volatiles in the headspace vial rises, so the content of the specific components useful for discrimination increases and discrimination improves. As headspace time increases, the properties of the volatile components change in the warmer environment, which affects class discrimination and lowers the discrimination achieved. The stability of the blank column indicates no significant difference between repeated measurements of similar samples, so the results are reliable. To analyze the orthogonal experiment further, analysis of variance was carried out on the results, as shown in Table 5.

The analysis of variance shows that all three factors significantly affect sample discrimination, with factors A and B being highly significant (P<0.01). The variance contributions of the three factors are 65.34%, 22.16% and 9.66%, respectively. Sample amount and headspace temperature therefore have the larger influence on sample discrimination, and headspace time the smaller.

Based on these results, the optimal combination is A3B3C1, that is, sample amount 6 g, headspace temperature 60 °C and headspace time 120 s, which gives the best discrimination between classes.

5 Rejection of abnormal sample points in electronic nose detection of honey

5.1 Principle of abnormal sample point rejection

Before formal analysis and honey quality modeling of the e-nose signals, abnormal samples in the overall honey sample set must be rejected to ensure the stability of the analysis results and the model, and the accuracy and reliability of the signals and sample information. Abnormal samples in electronic nose detection of honey include abnormal sample information (such as sample number or class) and abnormal detection results. Abnormal samples easily distort the trend of the overall signal and undermine the stability of the classification model, so rejecting them is essential.

The factors that cause abnormal samples mainly include the following: (a) sample collection errors, mainly misclassification and incorrect coding of collected samples; (b) changes in sample properties during storage: because there is an interval between collection and analysis, some unstable samples readily undergo chemical or physical changes during this interval; (c) operating errors, including sampling errors and container cleanliness; (d) instrument detection errors: because of changes in the detection environment and sensor properties, some detection signals cannot be fully corrected even after signal preprocessing, and some signals still differ significantly from the overall average signal.

To obtain a larger sample size, the present invention analyzes the three nectar-source honeys with the largest current market shares, rape honey, Mel and acacia honey, and studies the analysis and rejection of abnormal sample points in a large sample set: 76 rape honeys, 56 Mels and 112 acacia honeys, 244 samples in total. The same sample set is also used in the studies of Sections 7 and 8.

5.2 Methods of abnormal sample point rejection

(1) Mahalanobis distance discrimination

The Mahalanobis distance is a measure of distance between vectors in multidimensional space and an important method for outlier detection in multivariate data. It compares sample signals by the degree of deviation computed from the mean vector and covariance matrix of the sample data. The calculation is as follows:

where the mean vector is that of the samples and S is the sample covariance matrix; the mean and the standard deviation of the Mahalanobis distances of the samples, together with the threshold λ of the acceptance range, define the acceptable Mahalanobis distance range; x_t is the feature vector of the t-th sample and T is the mean of the feature vectors. The present invention sets the hard threshold λ = 3, which fixes the acceptable Mahalanobis distance range under this threshold.
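A minimal sketch of Mahalanobis distance screening is given below. It uses the standard definition D_t = sqrt((x_t − mean) S⁻¹ (x_t − mean)ᵀ) and assumes that a sample is rejected when its distance exceeds the mean distance plus λ standard deviations; the exact acceptance range used in the source is not legible, and the demo data are hypothetical.

```python
import math
from statistics import fmean, pstdev


def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mahalanobis_distances(samples):
    """Distance of every sample from the mean, scaled by the covariance S."""
    n, dim = len(samples), len(samples[0])
    mean = [fmean(col) for col in zip(*samples)]
    centred = [[v - m for v, m in zip(row, mean)] for row in samples]
    cov = [
        [sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(dim)]
        for a in range(dim)
    ]
    return [
        math.sqrt(sum(d * z for d, z in zip(row, solve(cov, row))))
        for row in centred
    ]


def flag_outliers(samples, lam=3.0):
    """Indices whose distance exceeds mean + lam * std (assumed rule)."""
    dist = mahalanobis_distances(samples)
    limit = fmean(dist) + lam * pstdev(dist)
    return [i for i, d in enumerate(dist) if d > limit]


# Hypothetical 2-D data: 50 points on a unit circle plus one distant point.
points = [(math.cos(k), math.sin(k)) for k in range(50)] + [(30.0, 30.0)]
outliers = flag_outliers(points)
```

Because the covariance is estimated from all samples, including the abnormal ones, a very small data set cannot produce a distance large enough to cross a 3-sigma limit; the rule is meaningful only for sample sets of reasonable size such as the 244 samples used here.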

(2) Leverage discrimination

The leverage of a sample reflects how strongly the model depends on that sample: the larger the leverage, the greater the dependence and the greater the influence on the model. Samples located at the two extremes of the property being analyzed usually have larger leverage. Excessive leverage has a considerable influence on the model and is detrimental to its stability. By analyzing sample leverage, special samples with a large influence on the model are rejected, which increases the stability of the model. The leverage discrimination method is as follows:

1. Compute the score matrix T of the samples under test by PCA;

2. Compute the test matrix H: $H = T (T^{T} T)^{-1} T^{T}$;

3. The leverage h_i of each sample is the i-th diagonal element of the test matrix H, $h_i = t_i (T^{T} T)^{-1} t_i^{T}$, where t_i is the i-th row of T;

As with Mahalanobis distance discrimination, leverage discrimination sets a hard threshold and removes the special sample points with large leverage, thereby ensuring the stability of the subsequent prediction model.
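The leverage computation can be sketched as follows, assuming the standard hat-matrix form h_i = t_i (TᵀT)⁻¹ t_iᵀ for a score matrix T that has already been obtained by PCA. The score values and the threshold are hypothetical; the source does not state the threshold it uses.

```python
def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def leverages(scores):
    """h_i = t_i (T^T T)^-1 t_i^T for each row t_i of the score matrix T."""
    k = len(scores[0])
    gram = [
        [sum(row[a] * row[b] for row in scores) for b in range(k)]
        for a in range(k)
    ]
    return [
        sum(t * z for t, z in zip(row, solve(gram, row)))
        for row in scores
    ]


def flag_high_leverage(scores, limit):
    """Indices of samples whose leverage exceeds a chosen hard threshold."""
    return [i for i, h in enumerate(leverages(scores)) if h > limit]


# Hypothetical two-component score matrix; the last row is an extreme sample.
scores = [
    (0.5, 0.2), (-0.4, 0.3), (0.3, -0.5), (-0.6, -0.1),
    (0.2, 0.4), (-0.1, -0.3), (4.0, 0.1),
]
h = leverages(scores)
flagged = flag_high_leverage(scores, limit=0.9)   # hypothetical threshold
```

The leverages always sum to the number of score columns, so a single sample with leverage close to 1 dominates one whole component of the model.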

5.3 Validity check of abnormal sample point rejection in electronic nose detection of honey

To verify the effect of sample rejection, the Bayes discrimination method is used to evaluate the accuracy of the discrimination model before and after abnormal point rejection. Compared with other pattern recognition methods, Bayes discrimination is simpler and requires no optimization of related parameters. The basic idea of Bayes discrimination is to assume some prior knowledge of the object under study (the population) before sampling, usually described by a prior probability distribution. This prior knowledge is then revised on the basis of the samples drawn to obtain the posterior probability distribution, on which all statistical inference is based. Bayesian decision-making differs from classical statistical methods in that it uses as much of the available information as possible while keeping the decision risk as small as possible.
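The prior-to-posterior update at the heart of Bayes discrimination can be sketched with a single feature. This is a simplified illustration, not the discriminant used in the study: taking the priors from the class sizes is an assumption, and the univariate Gaussian class models and the measured value are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Likelihood of x under a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def posteriors(x, class_models, priors):
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    joint = {
        name: priors[name] * gaussian_pdf(x, mean, std)
        for name, (mean, std) in class_models.items()
    }
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}


# Priors from the class sizes in the text (76, 56 and 112 of 244 samples).
priors = {"rape": 76 / 244, "Mel": 56 / 244, "acacia": 112 / 244}
# Hypothetical single-feature class models: (mean, standard deviation).
models = {"rape": (0.20, 0.05), "Mel": (0.35, 0.05), "acacia": (0.50, 0.05)}

post = posteriors(0.33, models, priors)
predicted = max(post, key=post.get)   # class with the highest posterior
```

No tuning parameter appears anywhere in the rule, which is the property that makes it convenient for comparing the two rejection methods.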

Fig. 1 shows the Mahalanobis distance discrimination result (a) and the leverage discrimination result (b) for the analyzed samples (76 rape honeys, 56 Mels, 112 acacia honeys). Mahalanobis distance discrimination rejects samples 36, 53, 76, 79, 85, 99, 117, 240 and 244, 9 abnormal points in total; leverage discrimination rejects samples 36, 79, 216, 240 and 244, 5 abnormal points in total.

Bayes discrimination is used for pattern recognition and prediction on the data processed by the two abnormal point methods; the predictions are shown in Table 6. Table 6 shows that Mahalanobis distance discrimination rejects more special sample points than leverage discrimination, but the accuracy does not differ much. The additional sample points rejected by the Mahalanobis distance deviate more from the overall distribution of the samples, yet they have no great effect on the discrimination accuracy. Therefore, to preserve the completeness of the sample characteristics, leverage discrimination is selected and 5 sample points are rejected, namely samples 36, 79, 216, 240 and 244: 1 rape honey, 1 Mel and 3 acacia honeys.

6 Analysis of characteristic honey aromas and construction of honey aroma simulation systems

Honey aroma compounds are extracted by dynamic headspace (ITEX) combined with a cyclic trapping technique. The aroma is split 1:1 at the end of the chromatographic column, and gas chromatography-mass spectrometry (GC-MS) and gas chromatography-olfactometry (GC-O) are used to determine the volatile aroma components and their sensory characteristics simultaneously. Crystallized honey is heated in a water bath and then cooled rapidly to room temperature, and the aroma compounds are collected while the room is kept at constant temperature.

In GC-MS, the volatile components of honey are identified by three methods, mass spectrum (library search), relative retention index (RI) and sniffing, and are quantified by the internal standard method. The GC-O technique combines detection frequency and detected intensity: a GC-O evaluation panel of 5 selected assessors determines the characteristic flavor-active aromas that represent the four volatilization stages of honey, namely the top note, front note, body note and base note.

Based on the proportions of the characteristic aroma contents in the four volatilization stages, a basic honey aroma simulation system A is formulated. On the basis of system A, four groups of systems that differ from it are constructed. Each group differs from the basic system A in one of two ways: in one volatilization stage, either the content of the characteristic aroma differs or the characteristic aroma component differs, while the aroma components and contents of the other three stages remain unchanged.

7 Extraction of intelligent olfactory fingerprint (TuPu) features characterizing differences among honey varieties

7.1 Feature extraction method based on variance ratio

For each signal point of each sensor, the ratio of the between-class variance to the within-class variance of the samples is calculated, and signal points are selected according to the size of this ratio. The variance is calculated in the same way as the evaluation index q in the condition optimization. Unlike embedded feature selection methods, variance-ratio selection picks information points directly by comparing the between-class variance with the within-class variance at each point and does not rely on any other pattern-recognition method, so its selection result does not change with the choice of pattern-recognition method. However, the variance-ratio method is exhaustive, and its computational load is large for large samples.
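The variance-ratio filter described above can be sketched as follows. This is a minimal illustration only: the exact variance definition (the evaluation index q) is not given in this section, so the size-weighted between-class and within-class variances used here are an assumption, and the function names and toy values are hypothetical.

```python
from statistics import mean, pvariance

def variance_ratio(values, labels):
    """Between-class variance divided by within-class variance for one signal point."""
    classes = sorted(set(labels))
    groups = [[v for v, l in zip(values, labels) if l == c] for c in classes]
    grand = mean(values)
    n = len(values)
    # Between-class variance: spread of class means around the grand mean (assumed form).
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / n
    # Within-class variance: size-weighted average of the per-class variances (assumed form).
    within = sum(len(g) * pvariance(g) for g in groups) / n
    return between / within if within > 0 else float("inf")

def select_by_variance_ratio(X, labels, threshold=1.0):
    """Return the indices of signal points whose variance ratio exceeds the threshold."""
    return [j for j in range(len(X[0]))
            if variance_ratio([row[j] for row in X], labels) > threshold]

# Toy data (made up): 6 samples, 3 signal points, 2 honey classes.
X = [[1.0, 5.0, 2.0],
     [1.2, 1.0, 2.1],
     [0.8, 3.0, 1.9],
     [3.0, 4.0, 2.0],
     [3.1, 2.0, 2.2],
     [2.9, 3.0, 1.8]]
labels = ["rape", "rape", "rape", "acacia", "acacia", "acacia"]
selected = select_by_variance_ratio(X, labels)
print(selected)
```

Only the first signal point separates the two toy classes, so it is the only one kept. Because no classifier is involved, the selection is the same whichever discrimination method is applied afterwards.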

Fig. 2 shows the variance ratio of each signal point. The figure shows that the information points with a larger variance ratio, i.e. a larger between-class difference, are concentrated at 900-1200 and 1800-2160, i.e. at sensors 8 to 10 and 15 to 18. Within a single sensor, the points with large variance differences are concentrated at detection times of 20 s to 35 s. This period mainly corresponds to the adsorption of the volatile gas on the sensor, whereas the signal points in the later detection stage of each sensor, i.e. the desorption time points, differ less. Signal points with a variance ratio greater than 1 were selected as feature points; the information carried by signal points meeting this condition reflects the differences between samples. In total 798 feature points were selected. Under this condition, an SVM discrimination model was validated with a modeling-set to prediction-set ratio of 2:1, and the final discrimination result is 89.8734% (71/79, of which rape honey 21/23, Mel 14/17, acacia honey 36/39).

7.2 Feature extraction method based on single-feature discrimination

Pattern recognition is carried out on all feature points one by one, and the discrimination accuracy obtained when each signal point is used as a single feature is compared. This method embeds pattern recognition in feature selection; combined with a discrimination method, it gives the ability of each signal point to predict the samples. Unlike the preceding filter method, this method depends on the selected pattern-recognition method, and the selection result can change considerably with the discrimination method. The discrimination method chosen in this study is Bayes discrimination.
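A minimal sketch of single-feature screening is given below. It assumes a one-dimensional Gaussian Bayes classifier scored on the same samples it was fitted to; the source does not state how the Bayes discriminator was fitted or validated, so both choices, as well as the function names and toy data, are assumptions.

```python
import math
from statistics import mean, pstdev

def gaussian_bayes_accuracy(values, labels):
    """Resubstitution accuracy of a one-feature Gaussian Bayes classifier."""
    classes = sorted(set(labels))
    stats = {}
    for c in classes:
        group = [v for v, l in zip(values, labels) if l == c]
        # A small floor on the standard deviation avoids division by zero.
        stats[c] = (mean(group), max(pstdev(group), 1e-9), len(group) / len(values))

    def log_posterior(v, c):
        mu, sd, prior = stats[c]
        return math.log(prior) - math.log(sd) - 0.5 * ((v - mu) / sd) ** 2

    hits = sum(max(classes, key=lambda c: log_posterior(v, c)) == l
               for v, l in zip(values, labels))
    return hits / len(values)

def select_single_features(X, labels, min_accuracy=0.6):
    """Keep the signal points whose single-feature accuracy exceeds min_accuracy."""
    return [j for j in range(len(X[0]))
            if gaussian_bayes_accuracy([row[j] for row in X], labels) > min_accuracy]

# Toy data (made up): point 0 separates the classes, point 1 does not.
X = [[1.0, 1.0], [1.2, 2.0], [0.8, 3.0],
     [3.0, 1.0], [3.1, 2.0], [2.9, 3.0]]
labels = ["rape", "rape", "rape", "acacia", "acacia", "acacia"]
kept = select_single_features(X, labels)
print(kept)
```

Swapping the classifier inside `gaussian_bayes_accuracy` would change which points pass the threshold, which is the dependence on the discrimination method noted above.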

Fig. 3 shows that, unlike the variance-based selection, in single-feature selection the accuracy differs little between sensors but differs greatly between information points at different detection times within the same sensor. However, the trend of the Bayes discrimination accuracy over the time points is roughly consistent with the trend of the variance ratio in the different sensors: signals from the early detection stage give a higher discrimination accuracy, concentrated in the first 30 s of the detection time of each sensor, while the later detection stage performs poorly. Signal points with a discrimination accuracy greater than 60% were selected as feature points, 598 in total, and verified with SVM. The SVM prediction accuracy is 84.8101% (67/79, of which rape honey 20/23, Mel 13/17, acacia honey 34/39).

7.3 Feature extraction method based on the ant colony algorithm

The first two algorithms are exhaustive selection methods that must evaluate every feature point one by one. When there are many feature points the amount of computation becomes very large, which fundamentally limits their use for feature extraction from large numbers of signal points. The ant colony algorithm is a heuristic feature selection method: it uses the iterative evolution of the algorithm to optimize the selection of feature points automatically until an optimal result is obtained.

The ant colony algorithm (Ant Colony Optimization, ACO) was first applied to the path selection of the traveling salesman problem, i.e. to optimizing the shortest path. In this experiment, the ant colony algorithm is applied to the selection of feature points. Following the genetic algorithm, the algorithm uses binary coding to encode the feature vector of each sensor: 1 means that the information point is selected, and 0 means that it is discarded. The Bayes discrimination accuracy after feature point selection and the number of selected features together form the fitness function used to seek the optimal vector combination. The main innovations of the algorithm are: (a) the number of selected feature points is added to the fitness function and a cost parameter is set, so that the trade-off between the number of features and the discrimination accuracy can be adjusted as required through this parameter; (b) to avoid a biased update direction caused by outlying points, an optimal set is used and replaces selection by a single optimal point; (c) the degree of pheromone update is proportional to the improvement in the fitness function, so the more effective the optimization, the larger the update; (d) to speed up computation, the evaporation of poorly performing vectors is accelerated, which lowers their pheromone concentration and reduces their interference with later computation. The algorithm flow is shown in Fig. 4.

The parameters of the ant colony algorithm were finally chosen as follows: colony size (m) = 20; pheromone evaporation rate (rho) = 0.003; elite ant set (n1) = 3; poor ant set (n2) = 3; feature-number penalty ratio (A) = 400. The figure below shows the pheromone concentration in the final generation.
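The following is a simplified sketch of binary-coded, ant-colony-style feature selection with a feature-count penalty, using the parameter values m, rho, n1, n2 and A listed above. The source does not give the fitness formula or the update rules, so everything else is an assumption: the fitness is taken as accuracy minus (number of selected points)/A, the elite deposit is a fixed amount rather than proportional to the fitness improvement, and the extra evaporation factor for the poor set, the pheromone limits, `n_iter` and the toy accuracy function are all hypothetical.

```python
import random

def aco_feature_selection(n_points, accuracy, m=20, rho=0.003, n1=3, n2=3,
                          A=400, n_iter=30, seed=0):
    rng = random.Random(seed)
    tau = [0.5] * n_points                 # pheromone, used as selection probability
    best_mask, best_fit = None, float("-inf")
    for _ in range(n_iter):
        ants = []
        for _ in range(m):
            # Binary coding: 1 keeps the information point, 0 discards it.
            mask = [1 if rng.random() < t else 0 for t in tau]
            # Assumed fitness: accuracy minus a cost for each selected point.
            fit = accuracy(mask) - sum(mask) / A
            ants.append((fit, mask))
        ants.sort(key=lambda ant: ant[0], reverse=True)
        if ants[0][0] > best_fit:
            best_fit, best_mask = ants[0]
        tau = [(1.0 - rho) * t for t in tau]       # global evaporation
        for _, mask in ants[:n1]:                  # elite set reinforces its points
            for j, bit in enumerate(mask):
                if bit:
                    tau[j] += 0.05
        for _, mask in ants[-n2:]:                 # poor set evaporates faster
            for j, bit in enumerate(mask):
                if bit:
                    tau[j] *= 0.9
        tau = [min(max(t, 0.05), 0.95) for t in tau]
    return best_mask, best_fit

# Hypothetical stand-in for the Bayes accuracy: only points 0, 1 and 2 are informative.
INFORMATIVE = {0, 1, 2}

def toy_accuracy(mask):
    hits = sum(mask[j] for j in INFORMATIVE)
    return 0.5 + 0.5 * hits / len(INFORMATIVE)

mask, fit = aco_feature_selection(10, toy_accuracy)
print(mask, round(fit, 4))
```

In a real run, `toy_accuracy` would be replaced by the Bayes discrimination accuracy computed on the selected signal points. Raising or lowering A shifts the balance between the number of features and the accuracy, as described in innovation (a).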

As can be seen from Fig. 5, the feature points selected by the ant colony algorithm are concentrated near 1000 and from 1500 to 2160, i.e. at the signal points of sensor 9 and sensors 15 to 18. The result shows that heuristic algorithms, being self-evolving, select the overall distribution of the characteristic signals well, although the algorithm has a certain randomness. Under this condition, 206 feature points were finally selected, and the discrimination accuracy is 94.94% (75/79, of which rape honey 22/23, Mel 16/17, acacia honey 37/39). Compared with the first two exhaustive algorithms, the ant colony algorithm improves the discrimination accuracy somewhat while using fewer, more representative characteristic signal points.

7.4 Feature extraction method based on kernel principal component analysis

The first three extraction methods only screen the signals themselves. The feature points selected by such methods have a certain chemical meaning and can be interpreted well in conjunction with chemical results. However, the discarded vectors still carry some information that the selected vectors do not possess, so the accuracy of the later discrimination is hard to guarantee. Besides direct extraction, dimension reduction methods can compress the data by matrix transformation and enrich the effective information, which significantly reduces the number of characteristic signals. Two dimension reduction methods, kernel principal component analysis and independent component analysis, were selected for feature extraction in this study.

Kernel principal component analysis (Kernel Principal Component Analysis, KPCA) introduces a kernel function into principal component analysis (Principal Component Analysis, PCA). KPCA uses the kernel function to project the data into a higher-dimensional space. Because the data are more dispersed after projection, some signals that are inseparable in the low-dimensional space can be distinguished, and more representative features can be extracted. In this experiment, KPCA uses a radial basis kernel function.
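As an illustration of the first stage of KPCA, the sketch below builds a radial basis kernel matrix and centers it in feature space. The kernel width `gamma` and the toy data are assumptions, since the source does not report the kernel parameter used for KPCA. The later eigen-decomposition of the centered matrix, which yields the principal components, is not shown.

```python
import math

def rbf_kernel_matrix(X, gamma):
    """K[i][j] = exp(-gamma * ||x_i - x_j||^2) for all sample pairs."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K

def center_kernel(K):
    """Center the kernel matrix, equivalent to mean-centering in feature space."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    # K is symmetric, so the column means equal the row means.
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

# Toy signals (made up): 4 samples with 2 signal points each.
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]]
K = rbf_kernel_matrix(X, gamma=0.5)
Kc = center_kernel(K)
print([round(v, 3) for v in K[0]])
```

The leading eigenvectors of `Kc` give the projections onto the kernel principal components; the number of eigenvectors kept corresponds to the dimension d discussed below.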

The SVM prediction accuracy under KPCA for different numbers of principal components is shown in Fig. 6, i.e. the accuracy on the prediction set when the original signal is reduced to different dimensions by KPCA. As can be seen from the figure, the SVM discrimination accuracy first increases with the dimension and then shows a brief plateau; between 25 and 30 dimensions the accuracy increases markedly with the dimension and is relatively stable afterwards; after 100 dimensions the accuracy decreases as the dimension increases. This pattern shows that, at low dimensions, each feature vector carries information for accurate sample classification and there is little redundancy between the pieces of information, so increasing the dimension helps to improve the discrimination accuracy. After 50 dimensions, the information carried by the individual features becomes redundant and overlapping, and the benefit of adding dimensions to the discrimination accuracy declines. At the later stage, the redundancy between features affects the discrimination, so the accuracy starts to fall. Finally, the prediction accuracy is highest at dimension d=81, at 93.67% (74/79, of which rape honey 22/23, Mel 15/17, acacia honey 37/39).

7.5 Feature extraction method based on independent component analysis

Principal component analysis (including kernel principal component analysis) separates the data according to the maximum variance, i.e. the second-order moment of the data, but ignores the independence of the data in higher-order moments. Independent component analysis (Independent Component Analysis, ICA) instead uses the higher-order moments of the data to compute the transformation matrix, which can further reduce the correlation between feature vectors and strengthen the signal compression effect.
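The distinction between second-order and higher-order moments can be made concrete with a small example. The two made-up signals below have the same mean and the same variance, so a variance-based criterion cannot tell them apart, but their fourth-order statistic (excess kurtosis) differs. Kurtosis is used here only as a common example of a higher-order statistic; the source does not state which contrast function its ICA implementation uses.

```python
from statistics import mean, pvariance

def excess_kurtosis(x):
    """Fourth central moment divided by the squared variance, minus 3."""
    mu = mean(x)
    var = pvariance(x)
    m4 = mean([(v - mu) ** 4 for v in x])
    return m4 / var ** 2 - 3.0

two_level = [-1.0, 1.0] * 4            # flat, two-valued signal
spiky = [0.0] * 6 + [-2.0, 2.0]        # mostly zero with two spikes

print(pvariance(two_level), pvariance(spiky))
print(excess_kurtosis(two_level), excess_kurtosis(spiky))
```

Both signals have unit variance, yet one is sub-Gaussian and the other super-Gaussian. ICA exploits this kind of higher-order structure when it computes the transformation matrix.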

The ICA method used in this experiment is FastICA. The data are whitened before the algorithm is run. The SVM discrimination accuracy for different numbers of independent components is shown in Fig. 7.

The overall trend of ICA is similar to that of KPCA, but the change is more gradual, and fewer feature dimensions are needed than with KPCA for the accuracy to stabilize. The results show that, with 14 independent components, the discrimination accuracy is 94.94% (75/79, of which rape honey 22/23, Mel 16/17, acacia honey 37/39).

8 Establishment of the support vector machine classification model optimization method

This study uses an SVM classifier as the classification model for samples from different nectar sources. The selected kernel function is the radial basis kernel function (RBF):

where r is the shape parameter of the RBF function, and x_i and x_j are two samples in the sample set.
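The kernel formula itself is not reproduced in this text. For orientation, a commonly used form of the radial basis kernel is shown below, under the assumption that r multiplies the squared distance directly (as the width parameter does in many SVM packages); the original formula may be parameterized differently.

```latex
K(x_i, x_j) = \exp\left(-r \,\lVert x_i - x_j \rVert^{2}\right), \qquad r > 0
```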

Compared with other kernel functions, RBF was chosen for two main reasons: 1) RBF can realize both linear and nonlinear mappings, and it can be shown by mathematical transformation that the linear kernel function is only a special case of RBF; 2) compared with the polynomial kernel function, RBF has fewer parameters and the model is relatively simple, which ensures the stability of the model.

Meanwhile, since it is difficult to require that all training points satisfy constraint function (7), a slack variable ξ is introduced for the training points, and constraint function (7) becomes

（22）

The slack variable ξ = (ξ_1, ξ_2, ξ_3, ..., ξ_n)' then reflects the misclassification of the whole training set. A penalty parameter c is therefore introduced as the weight that balances the margin between classes against the degree of misclassification, and the optimization function (6) becomes

（23）
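Equations (22) and (23) are not reproduced in this text. As a point of reference only, the textbook soft-margin SVM with slack variables ξ_i and penalty parameter c is written below; the symbols w, b, φ and y_i are assumptions taken from the standard formulation, and the patent's own equations may use different notation.

```latex
\begin{aligned}
& y_i \left( w \cdot \phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n \\
& \min_{w,\, b,\, \xi} \; \frac{1}{2} \lVert w \rVert^{2} + c \sum_{i=1}^{n} \xi_i
\end{aligned}
```

A larger c penalizes misclassified training points more heavily, while a smaller c favors a wider margin between classes.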

In an SVM classifier with the RBF kernel function, the classification performance differs considerably under different parameters (shape parameter r and penalty parameter c). This study therefore uses different optimization methods to optimize the shape parameter r of the RBF and the penalty parameter c. The data used in this study are the data after ICA dimension reduction in Section 7.

The specific optimization procedure is as follows:

1. Divide the samples into a training set and a validation set in the ratio 2:1. Using five-fold cross-validation, divide the training set into 5 disjoint subsets; in turn, use 4 of the subsets for parameter training and validate the selected parameters on the remaining subset, and calculate the classification accuracy of the training set under the different parameters.

2. According to the selected kernel function, set the kernel function parameter r and the penalty parameter c.

3. Train the model with the selected parameters using the five-fold cross-validation of step 1, and calculate the accuracy of the model under those parameters.

4. Judge whether the model accuracy meets the target; if not, change the parameter values.

5. Repeat steps 2, 3 and 4 until the best model discrimination rate is obtained or the iteration stopping criterion is reached.

In the present invention, different parameter search methods are used to optimize the shape parameter r and the penalty parameter c in step 2, and the optimal model is finally obtained.
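The data handling in step 1 can be sketched as follows: a 2:1 split into training and validation sets, followed by five disjoint folds of the training set. The sample count of 30, the random seed and the function names are hypothetical, and `fit_and_score` stands for whatever routine trains an SVM on one part and returns its accuracy on the other.

```python
import random

def split_train_validation(indices, ratio=(2, 1), seed=0):
    """Shuffle the sample indices and split them in the given ratio."""
    rng = random.Random(seed)
    idx = list(indices)
    rng.shuffle(idx)
    n_train = len(idx) * ratio[0] // sum(ratio)
    return idx[:n_train], idx[n_train:]

def k_folds(indices, k=5):
    """Yield (fit_part, held_out_part) pairs over k disjoint subsets."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit_part = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield fit_part, folds[i]

def cross_val_accuracy(train_idx, fit_and_score, k=5):
    """Average accuracy over the k folds for one parameter setting."""
    scores = [fit_and_score(fit, held) for fit, held in k_folds(train_idx, k)]
    return sum(scores) / len(scores)

train_idx, val_idx = split_train_validation(range(30))
folds = list(k_folds(train_idx))
print(len(train_idx), len(val_idx), [len(held) for _, held in folds])
```

Each candidate pair (r, c) is scored with `cross_val_accuracy` on the training set only; the validation set is kept aside for the final test of the selected parameters.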

8.1 SVM parameter selection based on grid optimization

Grid optimization is an exhaustive method: within a pre-estimated value range, every point is searched one by one at a given step size to determine the final optimal parameters. With 2 as the base, r and c were searched exhaustively between 2^-4 and 2^10. When c=5.2780 and r=0.1088, the discrimination accuracy on the training set is highest, at 96.25%, as shown in Fig. 8. A model was built under this condition and tested on the prediction set. The final discrimination accuracy is 96.20% (76/79, of which rape honey 23/23, Mel 16/17, acacia honey 37/39).
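A minimal sketch of the base-2 grid search is given below. The exponent step of 0.2 is an assumption: the source does not state the step, although the reported optimum is consistent with it (2^2.4 ≈ 5.2780 and 2^-3.2 ≈ 0.1088). The objective `toy_accuracy` is a hypothetical stand-in for the cross-validated SVM accuracy and is constructed to peak at those values.

```python
import math

def grid_search(objective, lo=-4, hi=10, steps_per_unit=5):
    """Evaluate objective(c, r) on a base-2 grid and return the best point."""
    exponents = [i / steps_per_unit
                 for i in range(lo * steps_per_unit, hi * steps_per_unit + 1)]
    best_c, best_r, best_score = None, None, float("-inf")
    for ec in exponents:
        for er in exponents:
            c, r = 2.0 ** ec, 2.0 ** er
            score = objective(c, r)
            if score > best_score:
                best_c, best_r, best_score = c, r, score
    return best_c, best_r, best_score

def toy_accuracy(c, r):
    # Hypothetical smooth surface with a single peak at c = 2^2.4, r = 2^-3.2.
    return 1.0 / (1.0 + (math.log2(c) - 2.4) ** 2 + (math.log2(r) + 3.2) ** 2)

best_c, best_r, best_score = grid_search(toy_accuracy)
print(round(best_c, 4), round(best_r, 4))
```

With this step the grid contains 71 x 71 = 5041 parameter pairs, each of which requires a full cross-validation, which illustrates why the cost grows quickly when the range widens or the step shrinks.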

8.2 SVM parameter selection based on the genetic algorithm

The genetic algorithm (Genetic Algorithm, GA) performs a heuristic search by drawing on the theory of biological evolution. It was first proposed in 1975 by Professor J. Holland of the United States. The basic procedure of the genetic algorithm is as follows:

1) Data initialization: set the maximum number of generations, and randomly generate the individuals that form the population. In this study the population size is 20 and the maximum number of iterations is 100 generations.

2) Individual evaluation: calculate the fitness of each individual in the population. In this study the fitness is the accuracy of sample classification.

3) Selection: apply the selection operator to the individuals in the population. This study uses roulette-wheel selection based on the accuracy obtained in the individual evaluation, so that the information of individuals with higher fitness can be passed on to the next generation.

4) Crossover: apply the crossover operator to recombine individuals in the population, producing new individuals that integrate the characteristic information of the parent individuals.

5) Mutation: apply the mutation operator to mutate individuals at random with a given probability, which ensures that new individuals are produced.

After selection, crossover and mutation, the population yields the next-generation population.

6) Termination: if the number of iterations reaches the maximum generation or the fitness meets the requirement, stop iterating.
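Steps 1) to 6) can be sketched as follows, with population size 20 and 100 generations as stated above. The source does not describe the encoding or the operators in detail, so the real-valued individuals (log2 c, log2 r), the arithmetic crossover, the mutation probability and the toy fitness function are all assumptions.

```python
import random

def roulette(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.uniform(0, sum(fitness))
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if running >= pick:
            return individual
    return population[-1]

def genetic_search(objective, bounds, pop_size=20, generations=100,
                   p_mut=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        fitness = [objective(ind) for ind in pop]          # individual evaluation
        for ind, fit in zip(pop, fitness):
            if fit > best_fit:
                best, best_fit = list(ind), fit
        children = []
        while len(children) < pop_size:
            a = roulette(pop, fitness, rng)                # selection
            b = roulette(pop, fitness, rng)
            w = rng.random()
            child = [w * x + (1 - w) * y for x, y in zip(a, b)]   # crossover
            for k, (lo, hi) in enumerate(bounds):          # mutation
                if rng.random() < p_mut:
                    child[k] = rng.uniform(lo, hi)
            children.append(child)
        pop = children
    return best, best_fit

def toy_accuracy(params):
    # Hypothetical stand-in for the cross-validated SVM accuracy.
    log2_c, log2_r = params
    return 1.0 / (1.0 + (log2_c - 2.0) ** 2 + (log2_r + 3.0) ** 2)

bounds = [(-4.0, 10.0), (-4.0, 10.0)]
best, best_fit = genetic_search(toy_accuracy, bounds)
print([round(v, 3) for v in best], round(best_fit, 4))
```

Roulette-wheel selection requires non-negative fitness values, which holds when the fitness is a classification accuracy.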

After optimization, the accuracy on the training set is 96.25%, with c=3.2277 and r=0.1354, as shown in Fig. 9. Under this condition, the discrimination accuracy is 97.4684% (77/79, of which rape honey 23/23, Mel 16/17, acacia honey 38/39).

8.3 SVM parameter selection based on particle swarm optimization

Particle swarm optimization (Particle Swarm Optimization, PSO) resembles the genetic algorithm in that it starts from randomly initialized individuals. In PSO, however, the initial population does not undergo crossover and mutation; instead, individuals are updated by calculating the gap between the fitness value of the current individual and the best fitness value of the population. Compared with GA, the search direction in PSO is guided only by the best individual, and not all individuals exchange information with one another. The convergence speed of PSO is therefore considerably higher than that of GA. In this experiment, 200 iterations and a population size of 20 were used. Fig. 10 shows the PSO optimization result.
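A minimal sketch of a particle swarm search with 20 particles and 200 iterations is shown below. It uses the standard PSO update, which keeps each particle's own best position in addition to the population best; the source mentions only the population best, so this, together with the inertia weight w, the coefficients c1 and c2, the (log2 c, log2 r) encoding and the toy fitness function, is an assumption.

```python
import random

def pso(objective, bounds, n_particles=20, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    pbest = [list(p) for p in pos]
    pbest_fit = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = list(pbest[g]), pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for k, (lo, hi) in enumerate(bounds):
                # Velocity is pulled toward the particle best and the population best.
                vel[i][k] = (w * vel[i][k]
                             + c1 * rng.random() * (pbest[i][k] - pos[i][k])
                             + c2 * rng.random() * (gbest[k] - pos[i][k]))
                pos[i][k] = min(max(pos[i][k] + vel[i][k], lo), hi)
            fit = objective(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = list(pos[i]), fit
                if fit > gbest_fit:
                    gbest, gbest_fit = list(pos[i]), fit
    return gbest, gbest_fit

def toy_accuracy(params):
    # Hypothetical stand-in for the cross-validated SVM accuracy.
    log2_c, log2_r = params
    return 1.0 / (1.0 + (log2_c - 2.0) ** 2 + (log2_r + 3.0) ** 2)

bounds = [(-4.0, 10.0), (-4.0, 10.0)]
gbest, gbest_fit = pso(toy_accuracy, bounds)
print([round(v, 3) for v in gbest], round(gbest_fit, 4))
```

Because every particle is drawn toward the same population best, the swarm contracts quickly. On a surface with several local optima this can end the search prematurely.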

The optimization result is as follows: the highest accuracy on the training set is 91.25%, with c=32.3362 and r=0.0100. Under this condition, the prediction accuracy is 88.61% (71/79, of which rape honey 21/23, Mel 14/17, acacia honey 36/39).

Compared with the genetic algorithm, particle swarm optimization converges faster and reaches its optimum in about 6 generations. Its optimization result is poorer, however, largely because the population best is not representative enough and the search becomes trapped in a local minimum. The grid search results show that the SVM discrimination accuracy, as a function of the parameters, is not a single convex function. Under these conditions, although GA converges more slowly, it takes the whole population into account and optimizes better than PSO.

Among the three optimization algorithms, grid search requires certain prior conditions, such as a rough value range for r and c and the search step size. When the range is large or the step size is small, the search efficiency falls. By comparison, GA optimizes better: it reaches the optimal solution in about 14 generations, and the prediction accuracy after optimization (77/79) improves on the initial value (75/79).

The study finally settled on the genetic algorithm combined with the support vector machine pattern-recognition method to classify the acquired electronic nose sensor signals; after optimization, the final discrimination accuracy is 97.46%.

Claims

1. A method for extracting intelligent olfactory fingerprint (TuPu) features characterizing differences among honey varieties based on independent component analysis, characterized in that, according to the division of China into western, southern, northern, eastern and northeastern geographic regions, 5 different nectar sources are selected as study samples: 1) rape honey, collected from the Fuling and Yongchuan districts of Chongqing in the western region; 2) lychee honey, collected from Nanning in South China; 3) chaste honey, collected from the Miyun area of Beijing in North China; 4) acacia honey, collected from Laiyang, Shandong, in East China; 5) Mel, collected from Dunhua, Jilin, and Harbin, Heilongjiang, in the northeast;

After collection, the samples are stored at -18 °C, and testing is performed together once all samples have been collected; 60 g of each of the 5 samples is placed in a 40 °C constant-temperature water bath and heated for 15 min to melt the sample, while the remaining sample continues to be stored at -18 °C; during heating, the sample is shaken once every 3 min; after the water bath, the sample is taken out and cooled at room temperature for more than 1 h until its temperature matches room temperature, and 6 g of sample is then loaded into a 10 mL headspace vial;

A gas sensor array is used to detect the sample under test through the adsorption of different volatile components; the gas sensor array is a Fox4000 electronic nose, which consists of 18 metal oxide semiconductor gas sensors and an HS100 headspace autosampler; a single sample yields a signal matrix of 18 sensors × detection time t; the electronic nose detection parameters are: sample volume 1.000 mL, extraction speed 1000 μL/s, syringe temperature 65 °C, acquisition time 120 s, acquisition delay 600 s;

The electronic nose uses the gas sensor array to detect the sample under test through the adsorption of different volatile components; principal component analysis separates the detection data according to the maximum variance, i.e. the second-order moment of the data, and ignores the independence of the data in higher-order moments; independent component analysis instead uses the higher-order moments of the data to compute the transformation matrix, which further reduces the correlation between feature vectors and strengthens the signal compression effect; based on the discrimination accuracy of the support vector machine model, the number of independent components is 14, and the corresponding discrimination accuracy is 94.94%, with 75/79 samples, of which rape honey 22/23, Mel 16/17, acacia honey 37/39.