CN104376325A - Method for building near-infrared qualitative analysis model - Google Patents

Method for building near-infrared qualitative analysis model

Info

Publication number
CN104376325A
Authority
CN
China
Prior art keywords
near infrared
data
modeling
sample
building
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410599223.4A
Other languages
Chinese (zh)
Inventor
董肖莉
李卫军
覃鸿
张丽萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Semiconductors of CAS
Original Assignee
Institute of Semiconductors of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Semiconductors of CAS
Priority to CN201410599223.4A
Publication of CN104376325A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G06F18/21322Rendering the within-class scatter matrix non-singular
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G06F18/21322Rendering the within-class scatter matrix non-singular
    • G06F18/21326Rendering the within-class scatter matrix non-singular involving optimisations, e.g. using regularisation techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention discloses a method for building a near-infrared qualitative analysis model. The method comprises the steps of (1) collecting near-infrared spectral data of samples and determining the modeling sample data; (2) preprocessing the modeling sample data; (3) performing partial least squares (PLS) feature extraction on the modeling sample data; (4) performing orthogonal linear discriminant analysis (OLDA) feature extraction on the modeling sample data; and (5) building the qualitative analysis model with a support vector machine (SVM). The method is accurate, fast and efficient, and can be operated without specialist training.

Description

A method for building a near-infrared qualitative analysis model
Technical field
The present invention relates to modeling methods in the field of near-infrared qualitative analysis, and in particular to a method for building a near-infrared qualitative analysis model.
Background art
Near-infrared spectroscopy (NIRS) analysis uses the material information contained in the near-infrared spectral region for the qualitative and quantitative analysis of organic substances. It is fast and pollution-free, requires no sample pretreatment, and can detect multiple components simultaneously. Near-infrared spectroscopy is an indirect analysis technique: qualitative or quantitative analysis of an unknown sample is achieved by building a calibration model. As a green and versatile analytical technique, it is widely used in agriculture, food, petrochemicals, medicine, forestry, textiles, mineralogy and cosmetics.
With the rapid development of analytical techniques, detection techniques based on near-infrared spectroscopy have also come into wide use. In food safety, for example, near-infrared spectral analysis can detect the storage time of fresh meat to judge its freshness, and can detect the content of particular substances in milk powder to ensure that it is safe to eat. In drug safety, it can measure the content of certain substances in a drug, identify Chinese herbal medicines and determine their active constituents. Applications of visible and near-infrared spectral analysis will become increasingly widespread, and qualitative near-infrared analysis in particular will play an increasingly important role in food hygiene.
Qualitative near-infrared analysis usually comprises several steps, including collecting sample spectra, training the model, and classification. The accuracy of the qualitative analysis model determines the quality of the classification. For example, in detecting harmful substances in milk powder, if the qualitative model cannot distinguish melamine from protein (the two contain the same constituents), very serious food safety consequences can result.
At present, near-infrared spectroscopy is little studied and applied in qualitative analysis, and few methods for building near-infrared qualitative analysis models exist. A model-building method that is simple to operate, fast, efficient and accurate is therefore urgently needed. To address this problem, the present invention provides a method for building a near-infrared qualitative analysis model.
Summary of the invention
(1) Technical problem to be solved
In view of this, the main purpose of the present invention is to provide a simple and feasible method for building a near-infrared qualitative analysis model.
(2) Technical solution
To achieve the above purpose, the invention provides a method for building a near-infrared qualitative analysis model, the method comprising:
Step 1: collect near-infrared spectral data of the samples and determine the modeling sample data;
Step 2: preprocess the modeling sample data;
Step 3: perform partial least squares (PLS) feature extraction on the modeling sample data;
Step 4: perform orthogonal linear discriminant analysis (OLDA) feature extraction on the modeling sample data;
Step 5: build the qualitative analysis model with a support vector machine (SVM).
In the above scheme, the collection of near-infrared spectral data in Step 1 uses a near-infrared spectrometer to collect spectral data of the samples at different times. The spectrometer is either a micro spectrometer that measures single kernels or a conventional spectrometer that measures a whole cup of sample, and the acquisition mode is diffuse reflectance or transmission. If several near-infrared spectrometers of the same model are available, they must be kept in the same external environment during collection; each sample must be measured on the different spectrometers at the same measurement time point, yielding multiple corresponding spectra.
In the above scheme, determining the modeling sample data in Step 1 means selecting data that cover a range of uncertain information as the modeling sample data, so as to reduce the effect of spectral variation on the model's discrimination accuracy. This uncertain information refers to differences in the samples' own attributes, in spectral acquisition time and/or in the acquisition instrument.
In the above scheme, preprocessing the modeling sample data in Step 2 removes or reduces the noise that uncertain background information introduces into the spectral data. The preprocessing methods used include data normalization, derivative processing, smoothing, or centering and standardization. The uncertain background information refers to information affected by the state of the near-infrared spectrometer, the measurement conditions and the environment.
In the above scheme, PLS feature extraction on the modeling sample data in Step 3 specifically comprises:
Step 31: perform PLS feature extraction on the modeling set data to obtain the PLS feature matrix;
Step 32: use the PLS feature matrix to transform the preprocessed modeling set data into PLS space.
In the above scheme, the modeling sample data in Step 3 are the preprocessed modeling sample data.
In the above scheme, the PLS feature extraction in Step 31 obtains the PLS feature matrix as follows:
Step 311: standardize the sample data so that each variable has mean 0 and variance 1. Let the sample matrix be X_0 and the class information matrix be Y_0, where X_0 is the original spectral matrix of n spectra with p data points each, and Y_0 is the corresponding class-membership matrix. In Y_0, y_ij = 1 means that the i-th spectrum belongs to class j, and y_ij = 0 means that it does not;
Step 312: compute the covariance matrix of X_0' Y_0, namely C = X_0' Y_0 Y_0' X_0, dropping the constant factor;
Step 313: compute the eigenvalues and corresponding eigenvectors of C, sort the eigenvectors by eigenvalue, and take the eigenvectors of the n largest eigenvalues (n here being the chosen feature dimension) to form the projection matrix W_PLS;
Step 314: the new feature vector is x_i' = x_i W_PLS'.
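Steps 311 to 314 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `pls_projection`, the integer label vector, and the choice to leave the indicator matrix Y_0 unstandardized are all assumptions made for the example.

```python
import numpy as np

def pls_projection(X, labels, n_components):
    """Sketch of Steps 311-314; returns (W_pls, features)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Step 311: standardize each variable to zero mean and unit variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant variables
    X0 = (X - mean) / std

    # Class-membership matrix Y0 (n x c): y_ij = 1 if spectrum i is in class j.
    classes = np.unique(labels)
    Y0 = (labels[:, None] == classes[None, :]).astype(float)

    # Step 312: C = X0' Y0 Y0' X0 (constant factor dropped).
    M = X0.T @ Y0
    C = M @ M.T

    # Step 313: eigenvectors of the symmetric matrix C, largest eigenvalues first.
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    W_pls = eigvecs[:, order].T              # rows are the selected eigenvectors

    # Step 314: x_i' = x_i W_pls'.
    features = X0 @ W_pls.T
    return W_pls, features
```

Note that with centered X_0 and an indicator matrix Y_0 over c classes, C as defined here has rank at most c - 1, so components beyond that correspond to zero eigenvalues. The source does not say how larger dimensions are obtained, so this sketch should not be read as reproducing the embodiment's settings.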
In the above scheme, OLDA feature extraction on the modeling sample data in Step 4 specifically comprises:
Step 41: perform OLDA feature extraction on the modeling set data after PLS feature extraction to obtain the OLDA feature matrix, which serves as the projection matrix for transforming the data into OLDA space;
Step 42: use the OLDA feature matrix to transform the modeling set data after PLS feature extraction into OLDA space;
Step 43: use the modeling set data transformed into OLDA space for modeling.
In the above scheme, the modeling sample data in Step 4 are the modeling sample data after PLS feature extraction. Compared with traditional linear discriminant analysis, the OLDA feature extraction method can solve the small-sample problem that the latter encounters in practical applications.
In the above scheme, the OLDA feature extraction in Step 41 obtains the OLDA feature matrix as follows:
Step 411: suppose there are C classes of samples and N samples in total, with N_i samples in class i. The within-class scatter matrix S_W and the between-class scatter matrix S_B are defined as:
$$S_W = \sum_{i=1}^{C} \sum_{j=1}^{N_i} (x_{ij} - m_i)(x_{ij} - m_i)', \qquad S_B = \sum_{i=1}^{C} (m_i - \bar{m})(m_i - \bar{m})'$$
where m_i is the mean of the class-i samples and \bar{m} is the mean of all samples;
Step 412: the OLDA optimization problem is converted into solving the following problem:
$$W_{OLDA\text{-}opt} = \arg\max_{W^T W = I} \frac{W^T S_B W}{W^T S_W W}$$
where the vectors w_i (i = 1, 2, ...) are the eigenvectors corresponding to the n largest eigenvalues, in descending order, of the following equation, and must satisfy W^T W = I:
$$S_B w = \lambda S_W w;$$
Step 413: after W_OLDA-opt is obtained, the eigenvectors of the n largest eigenvalues form the projection matrix W_OLDA, and the data can be transformed as Y' = Y W_OLDA'.
In the above scheme, the OLDA feature matrix obtained in Step 41 differs from the linear discriminant analysis feature matrix in that, when the transformation matrix is solved, the eigenvectors are mutually orthogonal, that is, W^T W = I.
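Steps 411 to 413 can be sketched as below. This is one possible reading, offered as an assumption rather than the patent's procedure: the generalized eigenproblem S_B w = λ S_W w is solved through a pseudo-inverse of S_W (so that a singular S_W does not stop the computation), and the orthogonality constraint W^T W = I is imposed afterwards by a QR factorization. The function names are hypothetical, and S_B is left unweighted by class size, following the formula in the text.

```python
import numpy as np

def scatter_matrices(Y, labels):
    """Within-class S_W and between-class S_B as defined in Step 411."""
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    overall_mean = Y.mean(axis=0)
    d = Y.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(labels):
        Yc = Y[labels == c]
        m_c = Yc.mean(axis=0)
        centered = Yc - m_c
        S_w += centered.T @ centered
        diff = (m_c - overall_mean)[:, None]
        S_b += diff @ diff.T
    return S_w, S_b

def olda_projection(Y, labels, n_components):
    """Return W_olda (n_components x d) with orthonormal rows."""
    S_w, S_b = scatter_matrices(Y, labels)
    # Step 412: eigenvectors of pinv(S_W) S_B, largest eigenvalues first.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    G = eigvecs[:, order].real
    # Orthonormalize the selected directions so that W^T W = I holds.
    Q, _ = np.linalg.qr(G)
    return Q.T

# Step 413: transformed data would be Y @ W_olda.T, matching Y' = Y W_OLDA'.
```

The QR step keeps the subspace spanned by the selected eigenvectors and only changes the basis, which is why it is a convenient way to satisfy the constraint in a sketch like this.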
In the above scheme, building the qualitative analysis model with a support vector machine in Step 5 specifically comprises:
Step 51: take the modeling sample data x_1, x_2, ..., x_n after OLDA feature extraction as the SVM modeling data;
Step 52: determine the class labels y_1, y_2, ..., y_n of the modeling sample data, with y_i ∈ {+1, -1};
Step 53: set the parameters of the SVM modeling process, including the classifier type and the kernel function type, and thereby determine the optimal separating hyperplane;
Step 54: use this optimal separating hyperplane to classify unknown sample data.
In the above scheme, the support vector machine in Step 5 is a method suited to two-class problems and can be applied to building a qualitative analysis model.
In the above scheme, the optimal separating hyperplane in Step 53 is determined for the linearly separable case, as follows:
Suppose the optimal separating hyperplane is: w'x + b = 0
Then the discriminant function is: f(x) = w'x + b,
Therefore:
$$f(x) > 0, \; x \in \omega_1; \qquad f(x) < 0, \; x \in \omega_2$$
Assume that the minimum distance from the samples of the two classes to the separating hyperplane is d, that is, there exist samples x_1 ∈ ω_1 and x_2 ∈ ω_2 such that:
f(x_1) = w'x_1 + b = d
f(x_2) = w'x_2 + b = -d
Normalizing the right-hand sides gives:
w_d'x_1 + b_d = 1
w_d'x_2 + b_d = -1
where:
$$w_d = \frac{w}{d}, \qquad b_d = \frac{b}{d}$$
Therefore:
$$w_d'(x_1 - x_2) = 2 \;\Rightarrow\; \delta = \frac{w_d'(x_1 - x_2)}{\|w_d\|} = \frac{2}{\|w_d\|}$$
Maximizing the class margin δ is therefore equivalent to minimizing \|w_d\|. Writing w and b for the normalized w_d and b_d, the problem becomes a typical optimization problem:
$$\min \; \frac{1}{2}\|w\|^2$$
$$\text{s.t.} \quad y_i(w'x_i + b) - 1 \ge 0$$
where the constraint states that all samples are correctly classified. This optimization problem can be solved with the Lagrangian method, which yields the optimal separating hyperplane.
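For reference, the Lagrangian route mentioned above is usually written as follows. This is the standard textbook derivation for the hard-margin problem, not text from the patent; the multipliers α_i are symbols introduced here for illustration.

```latex
% Lagrangian of  min (1/2)||w||^2  s.t.  y_i (w' x_i + b) - 1 >= 0
L(w, b, \alpha) = \frac{1}{2}\|w\|^2
  - \sum_{i=1}^{n} \alpha_i \left[ y_i (w' x_i + b) - 1 \right],
  \qquad \alpha_i \ge 0

% Stationarity conditions
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0

% Resulting dual problem
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i' x_j
\quad \text{s.t.} \quad \alpha_i \ge 0, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0
```

Only samples with α_i > 0 (the support vectors) contribute to w, which is why the hyperplane is determined by the samples lying on the margin.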
(3) beneficial effect
As can be seen from the above technical solution, the present invention has the following beneficial effects:
The present invention builds the qualitative analysis model from near-infrared spectral data. It is fast, efficient, non-destructive and pollution-free, does not damage the sample, and is simple enough to be operated without specialist personnel. By comparing the spectrum of an unknown sample with those of a set of known reference samples, the model can determine what an unknown substance is. It can identify active pharmaceutical ingredients, excipients, preparations, intermediates, chemical raw materials, packaging materials and the like, and can identify relevant substances in food or raw materials, so it can be applied widely in many fields. In addition,
Brief description of the drawings
Fig. 1 is a flowchart of the method for building a near-infrared qualitative analysis model provided by the invention.
Fig. 2 shows the feature distribution of the first two dimensions in PLS space when the PLS dimension is 5 in the embodiment of the invention.
Fig. 3 shows the feature distribution of the first two dimensions in OLDA space when PLS reduces the data to 5 dimensions and OLDA reduces them to 3 dimensions in the embodiment of the invention.
Fig. 4 shows the effect of different PLS dimensions on the recognition rate in the embodiment of the invention.
Detailed description of the embodiments
To make the purpose, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
The invention is implemented in five steps. Fig. 1 is a flowchart of the method for building a near-infrared qualitative analysis model provided by the invention. The method comprises:
Step 1: collect near-infrared spectral data of the samples and determine the modeling sample data;
In this step, near-infrared spectral data of the samples are collected with a near-infrared spectrometer at different times. The spectrometer is either a micro spectrometer that measures single kernels or a conventional spectrometer that measures a whole cup of sample, and the acquisition mode is diffuse reflectance or transmission. If several near-infrared spectrometers of the same model are available, they must be kept in the same external environment during collection; each sample must be measured on the different spectrometers at the same measurement time point, yielding multiple corresponding spectra. Determining the modeling sample data means selecting data that cover a range of uncertain information as the modeling sample data, so as to reduce the effect of spectral variation on the model's discrimination accuracy. This uncertain information refers to differences in the samples' own attributes, in spectral acquisition time and/or in the acquisition instrument.
Step 2: preprocess the modeling sample data;
In this step, preprocessing the modeling sample data removes or reduces the noise that uncertain background information introduces into the spectral data. The preprocessing methods used include data normalization, derivative processing, smoothing, or centering and standardization. The uncertain background information refers to information affected by the state of the near-infrared spectrometer, the measurement conditions and the environment.
Step 3: perform PLS feature extraction on the modeling sample data;
In this step, the modeling sample data are the preprocessed modeling sample data. PLS feature extraction on the modeling sample data specifically comprises:
Step 31: perform PLS feature extraction on the preprocessed modeling set data to obtain the PLS feature matrix, as follows:
Step 311: standardize the sample data so that each variable has mean 0 and variance 1. Let the sample matrix be X_0 and the class information matrix be Y_0, where X_0 is the original spectral matrix of n spectra with p data points each, and Y_0 is the corresponding class-membership matrix. In Y_0, y_ij = 1 means that the i-th spectrum belongs to class j, and y_ij = 0 means that it does not;
Step 312: compute the covariance matrix of X_0' Y_0, namely C = X_0' Y_0 Y_0' X_0, dropping the constant factor;
Step 313: compute the eigenvalues and corresponding eigenvectors of C, sort the eigenvectors by eigenvalue, and take the eigenvectors of the n largest eigenvalues (n here being the chosen feature dimension) to form the projection matrix W_PLS;
Step 314: the new feature vector is x_i' = x_i W_PLS'.
Step 32: use the PLS feature matrix to transform the preprocessed modeling set data into PLS space.
Step 4: perform OLDA feature extraction on the modeling sample data;
In this step, the modeling sample data are the modeling sample data after PLS feature extraction. Compared with traditional linear discriminant analysis, the OLDA feature extraction method can solve the small-sample problem that the latter encounters in practical applications.
OLDA feature extraction on the modeling sample data specifically comprises:
Step 41: perform OLDA feature extraction on the modeling set data after PLS feature extraction to obtain the OLDA feature matrix, which serves as the projection matrix for transforming the data into OLDA space. The OLDA feature matrix is obtained as follows:
Step 411: suppose there are C classes of samples and N samples in total, with N_i samples in class i. The within-class scatter matrix S_W and the between-class scatter matrix S_B are defined as:
$$S_W = \sum_{i=1}^{C} \sum_{j=1}^{N_i} (x_{ij} - m_i)(x_{ij} - m_i)', \qquad S_B = \sum_{i=1}^{C} (m_i - \bar{m})(m_i - \bar{m})'$$
where m_i is the mean of the class-i samples and \bar{m} is the mean of all samples;
Step 412: the OLDA optimization problem is converted into solving the following problem:
$$W_{OLDA\text{-}opt} = \arg\max_{W^T W = I} \frac{W^T S_B W}{W^T S_W W}$$
where the vectors w_i (i = 1, 2, ...) are the eigenvectors corresponding to the n largest eigenvalues, in descending order, of the following equation, and must satisfy W^T W = I:
$$S_B w = \lambda S_W w$$
Step 413: after W_OLDA-opt is obtained, the eigenvectors of the n largest eigenvalues form the projection matrix W_OLDA, and the data can be transformed as Y' = Y W_OLDA'.
The OLDA feature matrix obtained in Step 41 differs from the linear discriminant analysis feature matrix in that, when the transformation matrix is solved, the eigenvectors are mutually orthogonal, that is, W^T W = I.
Step 42: use the OLDA feature matrix to transform the modeling set data after PLS feature extraction into OLDA space;
Step 43: use the modeling set data transformed into OLDA space for modeling.
Step 5: build the qualitative analysis model with a support vector machine;
In this step, the support vector machine is a method suited to two-class problems and can be applied to building a qualitative analysis model. Building the qualitative analysis model with a support vector machine specifically comprises:
Step 51: take the modeling sample data x_1, x_2, ..., x_n after OLDA feature extraction as the SVM modeling data;
Step 52: determine the class labels y_1, y_2, ..., y_n of the modeling sample data, with y_i ∈ {+1, -1};
Step 53: set the parameters of the SVM modeling process, including the classifier type and the kernel function type, and thereby determine the optimal separating hyperplane. The optimal separating hyperplane is determined for the linearly separable case, as follows:
Suppose the optimal separating hyperplane is: w'x + b = 0
Then the discriminant function is: f(x) = w'x + b,
Therefore:
$$f(x) > 0, \; x \in \omega_1; \qquad f(x) < 0, \; x \in \omega_2$$
Assume that the minimum distance from the samples of the two classes to the separating hyperplane is d, that is, there exist samples x_1 ∈ ω_1 and x_2 ∈ ω_2 such that:
f(x_1) = w'x_1 + b = d
f(x_2) = w'x_2 + b = -d
Normalizing the right-hand sides gives:
w_d'x_1 + b_d = 1
w_d'x_2 + b_d = -1
where:
$$w_d = \frac{w}{d}, \qquad b_d = \frac{b}{d}$$
Therefore:
$$w_d'(x_1 - x_2) = 2 \;\Rightarrow\; \delta = \frac{w_d'(x_1 - x_2)}{\|w_d\|} = \frac{2}{\|w_d\|}$$
Maximizing the class margin δ is therefore equivalent to minimizing \|w_d\|. Writing w and b for the normalized w_d and b_d, the problem becomes a typical optimization problem:
$$\min \; \frac{1}{2}\|w\|^2$$
$$\text{s.t.} \quad y_i(w'x_i + b) - 1 \ge 0$$
where the constraint states that all samples are correctly classified. This optimization problem can be solved with the Lagrangian method, which yields the optimal separating hyperplane.
Step 54: use this optimal separating hyperplane to classify unknown sample data.
Embodiment
This experiment discriminates haploid from polyploid kernels of a maize variety. A haploid/polyploid discrimination model is built with the method provided by the invention, and experimental results are given to verify the effectiveness of the model.
The maize variety used is Zhengdan 958, with plump haploid and polyploid kernels of this variety. The experimental platform comprises a cast-iron stand, a JDSU micro spectrometer and a 12 V / 35 W gold-plated lamp cup (focal length 50 mm). The JDSU micro spectrometer is a small, lightweight near-infrared spectrometer supplied by JDSU, with a wavelength range of 908.1 to 1676.2 nm and a data length of 125 points. The lamp cup is fitted with a fan and an optical filter, which respectively cool the bulb and filter out visible light to protect the spectrometer.
The operating voltage is a regulated 6 V, the light source is 3 cm from the spectrometer, and illumination is vertical. The black cover on the spectrometer has a medium-sized aperture (5.2 mm).
Each time the JDSU micro spectrometer is switched on, a start-up calibration is performed so that the sample spectra are as accurate as possible. The micro spectrometer is set to 200 scans with an integration time of 5000 μs.
(1) Collecting near-infrared spectral data of the samples and determining the modeling sample data
Near-infrared spectra of haploid and polyploid kernels of Zhengdan 958 are collected. Collection is interleaved: single-kernel spectra are acquired in the order one haploid, one polyploid, one haploid, one polyploid, and so on, with no calibration in between. Spectra are collected this way because, in automated testing, it is not known in advance which kernels in a mixed pile are haploid and which are polyploid; interleaved collection therefore approximates the conditions of automated testing.
A total of 100 spectra are collected in this way, 50 haploid and 50 polyploid. The first 30 spectra of the haploid data set and the first 30 spectra of the polyploid data set form the modeling set, and the remainder form the test set.
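The split described above can be sketched as follows, assuming the spectra are stored in acquisition order (haploid first, then alternating) as rows of an array. The function name, the label coding (0 for haploid, 1 for polyploid) and the array layout are assumptions made for illustration.

```python
import numpy as np

def split_interleaved(spectra, n_model_per_class=30):
    """Split interleaved spectra into a modeling set and a test set."""
    spectra = np.asarray(spectra)
    # Assumed acquisition order: even rows haploid (0), odd rows polyploid (1).
    labels = np.arange(len(spectra)) % 2

    model_parts, test_parts = [], []
    for cls in (0, 1):
        idx = np.flatnonzero(labels == cls)
        model_parts.append(idx[:n_model_per_class])   # first 30 of this class
        test_parts.append(idx[n_model_per_class:])    # remaining spectra
    model_idx = np.sort(np.concatenate(model_parts))
    test_idx = np.sort(np.concatenate(test_parts))

    return (spectra[model_idx], labels[model_idx]), (spectra[test_idx], labels[test_idx])
```

With 100 spectra this gives 60 modeling spectra and 40 test spectra, balanced between the two classes.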
(2) Preprocessing the modeling sample data
The preprocessing applied to the modeling sample data is: smoothing (parameter 9), first derivative (parameter 9) and normalization.
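A rough sketch of such a preprocessing chain is given below. The source states only the parameter value 9, so the concrete choices here are assumptions: a 9-point moving average for smoothing, a simple finite-difference first derivative (which does not use the derivative parameter), and scaling each spectrum to unit Euclidean length for normalization. The function name is hypothetical.

```python
import numpy as np

def preprocess(spectra, window=9):
    """Smooth, differentiate and normalize each spectrum (one per row)."""
    X = np.asarray(spectra, dtype=float)

    # Smoothing: moving average over `window` points (assumed method).
    kernel = np.ones(window) / window
    smoothed = np.apply_along_axis(
        lambda s: np.convolve(s, kernel, mode="same"), 1, X
    )

    # First derivative along the wavelength axis (finite differences).
    deriv = np.gradient(smoothed, axis=1)

    # Normalization: scale each spectrum to unit Euclidean norm (assumed method).
    norms = np.linalg.norm(deriv, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return deriv / norms
```

A Savitzky-Golay filter is a common alternative that combines smoothing and differentiation in one step; whether the patent's authors used it is not stated.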
(3) PLS feature extraction
To determine the best dimension for PLS feature extraction, the experiment tests the discrimination performance with the PLS feature dimension set to different values. Performance is measured by the correct recognition rate.
(4) OLDA feature extraction
The OLDA dimension is set to 3.
(5) Building the qualitative analysis model with SVM
An SVM is used to build the haploid/polyploid qualitative analysis model.
(6) Testing the model experimentally
In PLS feature extraction, different dimensions affect the recognition result differently. The effect of the PLS dimension on the recognition rate in the experiment is shown in Fig. 4. As Fig. 4 shows, a higher dimension does not necessarily give better recognition. Comparison shows that recognition is best when the PLS dimension is 5, where the recognition rate for both haploid and polyploid can reach 100%. In practice, therefore, the PLS dimension should not be fixed but set according to its effect on recognition.
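The selection procedure described above amounts to scanning candidate dimensions and keeping the one with the highest recognition rate. A minimal sketch, assuming a caller-supplied function `recognition_rate(dim)` (hypothetical) that builds the model with the given PLS dimension and returns the correct recognition rate on the test set:

```python
def best_pls_dimension(candidate_dims, recognition_rate):
    """Return (best_dim, scores) for the candidate PLS dimensions.

    `recognition_rate` is a hypothetical callable mapping a PLS dimension
    to the recognition rate obtained with that dimension.
    """
    scores = {dim: recognition_rate(dim) for dim in candidate_dims}
    # On ties, max() keeps the first candidate in the order given.
    best_dim = max(scores, key=scores.get)
    return best_dim, scores
```

Listing candidates in ascending order makes ties resolve to the smallest dimension, which is a design choice of this sketch rather than something the source specifies.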
After PLS feature extraction (dimension 5) of the modeling set and test set spectra, the distribution of the first two feature dimensions in PLS space is shown in Fig. 2; the recognition rates for haploid and polyploid are then 100% and 90% respectively. After PLS (dimension 5) plus OLDA (dimension 3) feature extraction, the distribution of the first two dimensions in OLDA space is shown in Fig. 3; the recognition rates for haploid and polyploid are then both 100%. Comparing Fig. 2 and Fig. 3 shows that, after PLS feature extraction, haploid and polyploid each occupy a fairly distinct region of the space, but the modeling set and test set data are not tightly clustered, and in some cases the within-class distance even exceeds the between-class distance, which hampers subsequent discriminant analysis. After OLDA feature extraction, the modeling set and test set data fall in essentially the same regions, and the scatter is considerably reduced.
In building the near-infrared qualitative analysis model, the present invention applies OLDA feature extraction after PLS feature extraction and then models with SVM. This combination can build a qualitative analysis model with good performance and has practical value in real applications.
The specific embodiments described above further explain the purpose, technical solution and beneficial effects of the present invention. It should be understood that the foregoing are only specific embodiments of the invention and do not limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (17)

1. A method for building a near-infrared qualitative analysis model, characterized in that the method comprises:
Step 1: collecting near-infrared spectral data of samples and determining the modeling sample data;
Step 2: preprocessing the modeling sample data;
Step 3: performing partial least squares (PLS) feature extraction on the modeling sample data;
Step 4: performing orthogonal linear discriminant analysis (OLDA) feature extraction on the modeling sample data;
Step 5: building the qualitative analysis model with a support vector machine (SVM).
2. The method for building a near-infrared qualitative analysis model according to claim 1, characterized in that collecting the near-infrared spectral data of samples in step 1 means using a near-infrared spectrometer to collect the near-infrared spectral data of the samples at different times.
3. The method for building a near-infrared qualitative analysis model according to claim 2, characterized in that the near-infrared spectrometer is a micro spectrometer that measures single-kernel samples or a conventional spectrometer that measures a whole cup of sample, and the acquisition mode includes diffuse reflection or transmission.
4. The method for building a near-infrared qualitative analysis model according to claim 2, characterized in that, if there are multiple near-infrared spectrometers of the same model, the spectrometers are kept in the same external environment while the near-infrared spectral data are collected; the same sample is measured on the different near-infrared spectrometers at the same measurement time point, yielding multiple corresponding spectra.
5. The method for building a near-infrared qualitative analysis model according to claim 1, characterized in that determining the modeling sample data in step 1 means using data that contain certain uncertain information as the modeling sample data, so as to reduce the influence of spectral variation on the accuracy with which the model discriminates spectra; the uncertain information refers to differences in sample attributes, in spectral acquisition time, and/or in the spectral acquisition instrument.
6. The method for building a near-infrared qualitative analysis model according to claim 1, characterized in that preprocessing the modeling sample data in step 2 removes or reduces the noise that uncertain background information introduces into the spectral data; the preprocessing methods used include data normalization, derivative processing, smoothing, or centering and standardization.
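The preprocessing options named in claim 6 can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the patented implementation: the function names, the moving-average window of 5, and the use of a first-order difference as the derivative are all choices made for this sketch, and the random input stands in for real spectra.

```python
import numpy as np


def standardize(spectra: np.ndarray) -> np.ndarray:
    """Center each wavelength (column) to mean 0 and scale to variance 1."""
    mean = spectra.mean(axis=0)
    std = spectra.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (spectra - mean) / std


def moving_average(spectra: np.ndarray, window: int = 5) -> np.ndarray:
    """Smooth each spectrum (row) with a simple moving average."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, spectra)


def first_derivative(spectra: np.ndarray) -> np.ndarray:
    """First-order difference along the wavelength axis."""
    return np.diff(spectra, axis=1)


# Synthetic stand-in for 6 spectra with 50 data points each.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(6, 50))
processed = standardize(first_derivative(moving_average(spectra)))
```

When a test set is processed, the mean and standard deviation would normally be taken from the modeling set rather than recomputed, so that both sets share one transformation.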
7. The method for building a near-infrared qualitative analysis model according to claim 6, characterized in that the uncertain background information refers to information affected by the instrument state of the near-infrared spectrometer, the measurement conditions, and the environment.
8. The method for building a near-infrared qualitative analysis model according to claim 1, characterized in that performing partial least squares feature extraction on the modeling sample data in step 3 specifically comprises:
Step 31: performing partial least squares feature extraction on the modeling-set data to obtain the partial least squares feature matrix;
Step 32: using the obtained partial least squares feature matrix to transform the preprocessed modeling-set data into the partial least squares space.
9. The method for building a near-infrared qualitative analysis model according to claim 8, characterized in that the modeling sample data in step 3 are the modeling sample data after preprocessing.
10. The method for building a near-infrared qualitative analysis model according to claim 8, characterized in that the partial least squares feature extraction in step 31, which yields the partial least squares feature matrix, proceeds as follows:
Step 311: standardize the sample data so that each variable has mean 0 and variance 1; let the sample matrix be $X_0$ and the class information matrix be $Y_0$, where $X_0$ is the original spectral matrix of n spectra with p data points each and $Y_0$ is the corresponding class attribute matrix:
$$X_0 = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}, \qquad Y_0 = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}$$
In $Y_0$, $y_{ij} = 1$ indicates that the i-th spectrum belongs to class j, and $y_{ij} = 0$ indicates that the i-th spectrum does not belong to class j;
Step 312: compute the covariance matrix of $X_0' Y_0$, $C = X_0' Y_0 Y_0' X_0$, omitting the constant factor;
Step 313: compute the eigenvalues and corresponding eigenvectors of the covariance matrix C, sort the eigenvectors by eigenvalue magnitude, and take the eigenvectors corresponding to the n largest eigenvalues to form the projection matrix $W_{PLS}$;
Step 314: obtain the new feature vector: $x_i' = x_i W_{PLS}'$.
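Steps 311 to 314 can be read literally as the NumPy sketch below. The function name `pls_projection`, the parameter `n_components`, and the synthetic data are assumptions made for illustration. Note that under this literal reading the rank of C cannot exceed the number of classes, so at most that many directions carry class information; the claim does not say whether any deflation step is applied beyond this.

```python
import numpy as np


def pls_projection(X0: np.ndarray, Y0: np.ndarray,
                   n_components: int) -> np.ndarray:
    """Return W_PLS whose rows are the leading eigenvectors of C."""
    C = X0.T @ Y0 @ Y0.T @ X0                # step 312, constant dropped
    eigvals, eigvecs = np.linalg.eigh(C)     # C is symmetric
    order = np.argsort(eigvals)[::-1][:n_components]  # step 313
    return eigvecs[:, order].T               # shape (n_components, p)


# Synthetic stand-in: 9 spectra, p = 6 data points, 3 classes.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 3)
X = rng.normal(size=(9, 6)) + labels[:, None]
X0 = (X - X.mean(axis=0)) / X.std(axis=0)    # step 311: mean 0, variance 1
Y0 = np.eye(3)[labels]                       # one-hot class attribute matrix
W_pls = pls_projection(X0, Y0, n_components=2)
features = X0 @ W_pls.T                      # step 314: x' = x W_PLS'
```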
11. The method for building a near-infrared qualitative analysis model according to claim 1, characterized in that performing orthogonal linear discriminant analysis feature extraction on the modeling sample data in step 4 specifically comprises:
Step 41: performing orthogonal linear discriminant analysis feature extraction on the modeling-set data after partial least squares feature extraction to obtain the orthogonal linear discriminant analysis feature matrix, so that this projection matrix can be used to transform data into the orthogonal linear discriminant analysis space;
Step 42: using the obtained orthogonal linear discriminant analysis feature matrix to transform the modeling-set data after partial least squares feature extraction into the orthogonal linear discriminant analysis space;
Step 43: modeling with the modeling-set data transformed into the orthogonal linear discriminant analysis space.
12. The method for building a near-infrared qualitative analysis model according to claim 11, characterized in that the modeling sample data in step 4 are the modeling sample data after partial least squares feature extraction; compared with conventional linear discriminant analysis, the orthogonal linear discriminant analysis feature extraction method can solve the small-sample-size problem that the former encounters in practical applications.
13. The method for building a near-infrared qualitative analysis model according to claim 11, characterized in that the orthogonal linear discriminant analysis feature extraction in step 41, which yields the orthogonal linear discriminant analysis feature matrix, proceeds as follows:
Step 411: suppose there are c classes of samples, the total number of samples is N, and $N_i$ is the number of samples in class i; then the within-class scatter matrix $S_W$ and the between-class scatter matrix $S_B$ are defined as follows:
$$S_W = \sum_{i=1}^{c} \sum_{j=1}^{N_i} (x_{ij} - m_i)(x_{ij} - m_i)'$$
$$S_B = \sum_{i=1}^{c} (m_i - \bar{m})(m_i - \bar{m})'$$
where $m_i = \frac{1}{N_i} \sum_{j=1}^{N_i} x_{ij}$ is the mean of class i and $\bar{m} = \frac{1}{N} \sum_{i=1}^{c} \sum_{j=1}^{N_i} x_{ij}$ is the mean of all samples;
Step 412: the optimization problem of orthogonal linear discriminant analysis is converted into solving the following optimization problem:
$$W_{OLDA\text{-}opt} = \arg\max_{W^T W = I} \frac{W^T S_B W}{W^T S_W W}$$
where the vectors $w_i$ (i = 1, 2, ...) are the eigenvectors corresponding to the first n eigenvalues, sorted in descending order, of the following equation, and W must satisfy $W^T W = I$:
$$S_B w = \lambda S_W w;$$
Step 413: after obtaining $W_{OLDA\text{-}opt}$, take the eigenvectors corresponding to the n largest eigenvalues to form the projection matrix $W_{OLDA}$; the data can then be transformed as $Y' = Y W_{OLDA}'$.
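Steps 411 to 413 can be sketched in NumPy as below. The claim states the orthogonality constraint but not how it is enforced, so the QR orthonormalization used here is an assumption of this sketch, as are the use of a pseudo-inverse for $S_W$, the function name `olda_projection`, and the synthetic data.

```python
import numpy as np


def olda_projection(X: np.ndarray, labels: np.ndarray,
                    n_components: int) -> np.ndarray:
    """Return W_OLDA with orthonormal rows (W W' = I)."""
    dim = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_w = np.zeros((dim, dim))
    S_b = np.zeros((dim, dim))
    for cls in np.unique(labels):
        Xc = X[labels == cls]
        m_c = Xc.mean(axis=0)
        centered = Xc - m_c
        S_w += centered.T @ centered          # within-class scatter
        diff = (m_c - overall_mean)[:, None]
        S_b += diff @ diff.T                  # between-class scatter
    # Solve S_B w = lambda S_W w; pinv tolerates a singular S_W (assumption).
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    V = eigvecs[:, order].real
    Q, _ = np.linalg.qr(V)                    # orthonormalize the directions
    return Q.T


# Synthetic stand-in: 30 samples, 4 features, 3 classes.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 10)
centers = np.array([[0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0]], dtype=float)
X = centers[labels] + rng.normal(scale=0.5, size=(30, 4))
W_olda = olda_projection(X, labels, n_components=2)
Z = X @ W_olda.T                              # Y' = Y W_OLDA'
```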
14. The method for building a near-infrared qualitative analysis model according to claim 1, characterized in that the orthogonal linear discriminant analysis feature matrix obtained in step 41 differs from a linear discriminant analysis feature matrix in that, when the transformation matrix is solved, the eigenvectors are pairwise orthogonal, i.e. they satisfy $W^T W = I$.
15. The method for building a near-infrared qualitative analysis model according to claim 1, characterized in that building the qualitative analysis model with a support vector machine in step 5 specifically comprises:
Step 51: using the modeling sample data $x_1, x_2, \ldots, x_n$ after orthogonal linear discriminant analysis feature extraction as the support vector machine modeling data;
Step 52: determining the class labels $y_1, y_2, \ldots, y_n$ of the modeling sample data, with $y_i \in \{+1, -1\}$;
Step 53: setting the parameters of the support vector machine modeling process, including the classifier and the kernel function type, and thereby determining the optimal classification boundary;
Step 54: classifying unknown sample data with this optimal classification boundary.
16. The method for building a near-infrared qualitative analysis model according to claim 15, characterized in that the support vector machine in step 5 is a method suited to two-class problems and can be applied to building the qualitative analysis model.
17. The method for building a near-infrared qualitative analysis model according to claim 15, characterized in that determining the optimal classification boundary in step 53 assumes the linearly separable case and specifically comprises:
Suppose the optimal classification boundary is: $w'x + b = 0$
Then the discriminant function is: $f(x) = w'x + b$,
Therefore:
$$\begin{cases} f(x) > 0, & x \in \omega_1 \\ f(x) < 0, & x \in \omega_2 \end{cases}$$
Assume that the minimum distance from the samples of the two classes to the classification boundary is d, i.e. there exist samples $x_1 \in \omega_1$ and $x_2 \in \omega_2$ such that:
$$f(x_1) = w'x_1 + b = d$$
$$f(x_2) = w'x_2 + b = -d$$
Normalizing the right-hand side gives:
$$w_d' x_1 + b_d = 1$$
$$w_d' x_2 + b_d = -1$$
where:
$$w_d' = \frac{w'}{d}, \qquad b_d = \frac{b}{d}$$
It follows that:
$$w_d'(x_1 - x_2) = 2 \;\Rightarrow\; \delta = \frac{w_d'(x_1 - x_2)}{\|w_d\|} = \frac{2}{\|w_d\|}$$
Maximizing the class margin $\delta$ is therefore equivalent to minimizing $\|w_d\|$. Writing w and b for the normalized $w_d$ and $b_d$, the problem becomes a typical optimization problem:
$$\min \frac{1}{2}\|w\|^2$$
$$\text{s.t.}\quad y_i(w'x_i + b) - 1 \ge 0$$
where the constraint expresses that all samples are classified correctly; this optimization problem can be solved with the Lagrange multiplier method, which yields the optimal classification boundary.
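The claim names the Lagrangian method without showing the steps. As a sketch, the standard textbook derivation for the hard-margin problem above runs as follows; the multipliers $\alpha_i$ are notation introduced here and do not appear in the original text.

```latex
% Lagrangian of  min (1/2)||w||^2  s.t.  y_i (w' x_i + b) - 1 >= 0
L(w, b, \alpha) = \frac{1}{2}\|w\|^2
  - \sum_{i=1}^{n} \alpha_i \left[ y_i (w' x_i + b) - 1 \right],
  \qquad \alpha_i \ge 0

% Stationarity conditions
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0

% Dual problem obtained by substituting back
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i' x_j
\quad \text{s.t.} \quad \alpha_i \ge 0, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0
```

Only the samples with $\alpha_i > 0$ (the support vectors) contribute to w, and an unknown sample x is assigned to a class by the sign of $f(x) = \sum_i \alpha_i y_i x_i' x + b$.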
CN201410599223.4A 2014-10-30 2014-10-30 Method for building near-infrared qualitative analysis model Pending CN104376325A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410599223.4A CN104376325A (en) 2014-10-30 2014-10-30 Method for building near-infrared qualitative analysis model

Publications (1)

Publication Number Publication Date
CN104376325A true CN104376325A (en) 2015-02-25

Family

ID=52555221

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410599223.4A Pending CN104376325A (en) 2014-10-30 2014-10-30 Method for building near-infrared qualitative analysis model

Country Status (1)

Country Link
CN (1) CN104376325A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101819141A (en) * 2010-04-28 2010-09-01 中国科学院半导体研究所 Maize variety identification method based on near infrared spectrum and information processing
CN104062262A (en) * 2014-07-09 2014-09-24 中国科学院半导体研究所 Crop seed variety authenticity identification method based on near infrared spectrum

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
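曹吾 et al.: "Analysis of the robustness and adaptability of near-infrared qualitative analysis models", Spectroscopy and Spectral Analysis *
李贵滨: "Analysis of an SVM-based near-infrared spectroscopy detection algorithm for soybean oil color", China Master's Theses Full-text Database, Information Science and Technology *
武小红 et al.: "Discrimination of pork storage time based on Adaboost+OLDA and near-infrared spectroscopy", Spectroscopy and Spectral Analysis *
覃鸿 et al.: "Application of an LDA method based on DPLS feature extraction to qualitative near-infrared spectral analysis of maize", Spectroscopy and Spectral Analysis *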

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069462A (en) * 2015-07-15 2015-11-18 天津大学 Qualitative determination method for organic chemical based on spectral characteristic refinement and classifier cascading
CN105067558A (en) * 2015-07-22 2015-11-18 中国科学院半导体研究所 Infrared qualitative discrimination feature extraction method
CN105067558B (en) * 2015-07-22 2018-03-13 中国科学院半导体研究所 Near-infrared Qualitive test feature extracting method
CN106528668A (en) * 2016-10-23 2017-03-22 哈尔滨工业大学深圳研究生院 Second-order metabolic mass spectrometry compound detection method based on visual networks
CN106599819A (en) * 2016-12-06 2017-04-26 中国科学院半导体研究所 Vein identification method based on HOL characteristics and subspace learning
US10984334B2 (en) 2017-05-04 2021-04-20 Viavi Solutions Inc. Endpoint detection in manufacturing process by near infrared spectroscopy and machine learning techniques
EP3399472A1 (en) * 2017-05-04 2018-11-07 Viavi Solutions Inc. Endpoint detection in manufacturing process by near infrared spectroscopy and machine learning techniques
CN109374573A (en) * 2018-10-12 2019-02-22 乐山师范学院 Cucumber epidermis pesticide residue recognition methods based on near-infrared spectrum analysis
CN109374573B (en) * 2018-10-12 2021-07-16 乐山师范学院 Cucumber epidermis pesticide residue identification method based on near infrared spectrum analysis
CN109508440A (en) * 2018-11-28 2019-03-22 武汉轻工大学 Construction method, device, equipment and the storage medium of spectrum analysis model
CN109508440B (en) * 2018-11-28 2023-01-03 武汉轻工大学 Method, device and equipment for constructing spectral analysis model and storage medium
CN109657733A (en) * 2018-12-28 2019-04-19 中国农业科学院农业质量标准与检测技术研究所 Variety discriminating method and system based on constituent structure feature
CN110163276A (en) * 2019-05-15 2019-08-23 浙江中烟工业有限责任公司 A kind of screening technique of near infrared spectrum modeling sample
WO2021036546A1 (en) * 2019-08-29 2021-03-04 山东科技大学 Near-infrared quantitative analysis model construction method based on biased estimation
CN111125629A (en) * 2019-12-25 2020-05-08 温州大学 Domain-adaptive PLS regression model modeling method
CN111125629B (en) * 2019-12-25 2023-04-07 温州大学 Domain-adaptive PLS regression model modeling method

Similar Documents

Publication Publication Date Title
CN104376325A (en) Method for building near-infrared qualitative analysis model
Kiani et al. Integration of computer vision and electronic nose as non-destructive systems for saffron adulteration detection
CN109142317B (en) Raman spectrum substance identification method based on random forest model
CN104374738B (en) A kind of method for qualitative analysis improving identification result based on near-infrared
Pereira et al. Evaluation and identification of blood stains with handheld NIR spectrometer
Zhang et al. A simple identification model for subtle bruises on the fresh jujube based on NIR spectroscopy
CN104062262A (en) Crop seed variety authenticity identification method based on near infrared spectrum
CN103235095A (en) Water-injected meat detection method and device
de Lima et al. Methods of authentication of food grown in organic and conventional systems using chemometrics and data mining algorithms: A review
CN110378374B (en) Tea near infrared spectrum classification method for extracting fuzzy identification information
Jahani et al. Novel application of near-infrared spectroscopy and chemometrics approach for detection of lime juice adulteration
CN108844917A (en) A kind of Near Infrared Spectroscopy Data Analysis based on significance tests and Partial Least Squares
CN106124445A (en) A kind of quick, Undamaged determination genetically engineered soybean method
CN104374739A (en) Identification method for authenticity of varieties of seeds on basis of near-infrared quantitative analysis
CN104568824B (en) Shrimps grade of freshness detection method and device based on Vis/NIR
Yu et al. Identification of wine according to grape variety using near-infrared spectroscopy based on radial basis function neural networks and least-squares support vector machines
CN103411906A (en) Near infrared spectrum qualitative identification method of pearl powder and shell powder
CN103364359A (en) Application of SIMCA pattern recognition method to near infrared spectrum recognition of medicinal material, rhubarb
CN110749565A (en) Method for rapidly identifying storage years of Pu'er tea
CN108489929A (en) Ginseng, Radix Notoginseng and the legal base source Panax polysaccharide of three kinds of American Ginseng discrimination method
Liu et al. Method for identifying transgenic cottons based on terahertz spectra and WLDA
CN105181761A (en) Method for rapidly identifying irradiation absorbed dose of tea by using electronic nose
CN105528580A (en) Hyperspectral curve matching method based on absorption peak characteristic
CN104345045A (en) Chemical pattern recognition and near infrared spectrum-based similar medicinal material identification method
Zhang et al. Rapid authentication of the geographical origin of milk using portable near‐infrared spectrometer and fuzzy uncorrelated discriminant transformation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150225
