WO2011109922A1 - Near-infrared spectrum characteristic subinterval selection method based on simulated annealing-genetic algorithm - Google Patents

Near-infrared spectrum characteristic subinterval selection method based on simulated annealing-genetic algorithm

Publication number
WO2011109922A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
simulated annealing
interval
genetic algorithm
gene
Prior art date
Application number
PCT/CN2010/000530
Other languages
French (fr)
Chinese (zh)
Inventor
邹小波
石吉勇
赵杰文
殷晓平
陈正伟
黄星奕
蔡健荣
陈全胜
Original Assignee
江苏大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 江苏大学
Publication of WO2011109922A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light

Definitions

  • The invention relates to a method for selecting characteristic sub-intervals of near-infrared spectra for analyzing the quality of agricultural products and foods, and in particular to a near-infrared spectrum characteristic sub-interval selection method based on a simulated annealing-genetic algorithm.
  • Near-infrared spectroscopy is widely used in the quality analysis of agricultural products and foods because of its fast analysis speed and high efficiency.
  • Near-infrared spectra also have certain deficiencies, such as complex background, low information intensity and overlapping peaks, which make them difficult to interpret with conventional spectral analysis methods.
  • How to effectively extract characteristic information from large amounts of near-infrared spectral data has therefore become a focus of research in this field.
  • A sample shows characteristic absorption in one or several bands of the near-infrared spectrum, so wavenumber points adjacent to a high-information wavenumber point also carry a relatively large amount of information; that is, near-infrared spectral data have a certain continuous correlation. Given this property, and in order to reduce the computational load of the wavelength selection algorithm and improve its efficiency, the full near-infrared spectrum is usually divided into several sub-intervals and wavelength selection is performed interval by interval.
  • The classical spectral interval selection algorithm is the interval partial least squares method.
  • The algorithm divides the full spectrum into several sub-intervals and calculates the root mean square error of cross-validation (RMSECV) for each sub-interval.
  • The interval with the smallest RMSECV is used as the modeling interval.
  • Algorithms derived from interval partial least squares include joint interval partial least squares, forward/backward interval partial least squares and moving window partial least squares. Compared with the classical interval partial least squares algorithm, the derived algorithms examine not only single intervals but also combinations of several intervals. Although these algorithms can extract the characteristic information of the spectrum, the process of dividing the sub-intervals is somewhat subjective.
  • The genetic algorithm is a discipline that emerged in the 1970s. It solves practical problems by simulating the natural selection and natural genetic mechanisms of the biological world, and is a highly parallel, random and adaptive search algorithm. In recent years, some researchers have combined the genetic algorithm with the classical interval partial least squares algorithm to select characteristic sub-intervals of near-infrared spectra, simulating natural evolutionary processes such as heredity and mutation to solve for the optimal combination of characteristic sub-intervals. Some shortcomings remain, however: the division into sub-intervals often relies on experience and is somewhat subjective, and the genetic algorithm is prone to premature convergence and to falling into a local optimum, so an approximate global optimum cannot be guaranteed.
  • The simulated annealing algorithm is a stochastic optimization algorithm based on the Monte Carlo iterative solution strategy. Its starting point is the similarity between the physical annealing process and combinatorial optimization.
  • The simulated annealing algorithm starts from a relatively high initial temperature and uses the Metropolis sampling strategy, which has probabilistic jumping ability, to search randomly among the candidate solutions. The sampling process is repeated as the temperature falls, and the global optimal solution of the problem is finally obtained. The algorithm is suitable for large-scale combinatorial optimization problems.
  • To overcome the subjectivity of sub-interval division of near-infrared spectra in the prior art and to ensure that an approximate global optimum is obtained, the present invention proposes a near-infrared spectrum characteristic sub-interval selection method based on a simulated annealing-genetic algorithm. The core Metropolis acceptance criterion of the simulated annealing algorithm is introduced into the genetic algorithm, which prevents premature trapping in a local optimum while preserving the efficiency of the genetic algorithm, so that the optimal combination of characteristic sub-intervals of the near-infrared spectrum is obtained.
  • The technical scheme adopted by the invention is: preprocess the near-infrared spectrum; dynamically divide the preprocessed spectrum into sub-intervals; introduce the Metropolis criterion of the simulated annealing algorithm into the gene exchange and gene selection operators of the genetic algorithm; use the simulated annealing-genetic algorithm to select the optimal characteristic sub-intervals; finally, determine the best sub-interval division and the optimal combination of characteristic sub-intervals, and establish a PLS model for the selected optimal characteristic sub-intervals.
  • The near-infrared spectrum characteristic sub-interval selection method based on the simulated annealing-genetic algorithm lays a solid foundation for quickly obtaining spectral models with high precision and strong predictive ability.
  • Figure 1 is a flow chart of the present invention
  • Figure 2 is a schematic diagram of Metropolis acceptance criteria
  • Figure 3 is a schematic diagram of an exchange operator introducing the Metropolis criterion
  • Figure 4 is a schematic diagram of a mutation operator introducing the Metropolis criterion
  • Figure 5 is a graph showing the results of sub-interval selection of simulated annealing-genetic algorithm
  • Figure 6 is a comparison result of the simulated annealing-genetic algorithm and the traditional genetic algorithm modeling effect
  • Figure 7 is the near-infrared spectrum of cucumber leaf lutein after preprocessing by standard orthogonal transformation.
  • The invention first preprocesses the near-infrared spectrum: the raw near-infrared spectra of agricultural products and foods are treated with an appropriate denoising method.
  • Denoising methods include standard orthogonal transformation, multiplicative scatter correction, mean centering, and first-/second-order derivative preprocessing.
  • Spectral preprocessing also includes dividing the samples into a calibration set and a prediction set.
  • The preprocessed near-infrared spectrum is dynamically divided into sub-intervals: the number of sub-intervals varies within a range [m, n], and the subsequent steps of the algorithm select the optimal number of characteristic sub-intervals within [m, n].
  • When the number of sub-intervals is i, the full spectrum is divided equally into i sub-intervals. If the total number of wavenumber points divided by i gives quotient p with remainder q, then each of the first q sub-intervals contains p+1 wavenumber points and each of the remaining sub-intervals contains p wavenumber points.
  • The Metropolis criterion in the simulated annealing algorithm is a rule for judging the importance of a new solution relative to the old solution.
  • Based on the objective function values of the old and new solutions, the Metropolis criterion decides which of the two is the important solution. If the new solution is judged important, it replaces the old solution and enters the next iteration; otherwise the old solution is kept unchanged.
  • The invention introduces the above Metropolis criterion into the gene exchange and gene selection operators of the genetic algorithm, and the result is called the "simulated annealing-genetic algorithm". That is, when a parent chromosome produces an offspring chromosome through gene exchange or gene mutation in the traditional gene exchange and gene mutation operators, the Metropolis criterion is used to judge the importance of the parent chromosome (corresponding to the old solution x) and the offspring chromosome (corresponding to the new solution y). If the offspring chromosome is more important than the parent chromosome, the offspring chromosome is accepted; otherwise it is rejected.
  • The simulated annealing-genetic algorithm is used to select the optimal characteristic sub-intervals of the spectrum. Combining the gene exchange and gene mutation operators that incorporate the Metropolis criterion with the other operators of the traditional genetic algorithm, it selects the optimal characteristic sub-intervals from the sub-divided near-infrared spectrum and automatically determines the best sub-interval division and the optimal combination of characteristic sub-intervals. PLS models of the calibration set and the prediction set are then established for the selected sub-intervals, and modeling parameters such as the root mean square errors and correlation coefficients of the calibration set and the prediction set are calculated.
  • Dividing the sub-intervals and selecting the optimal characteristic sub-intervals requires the following parameters to be set:
  • Maximum number of sub-intervals I_f: the full spectrum is divided into at most I_f sub-intervals.
  • Objective function f(x): judges the quality of the current solution x. In general, the higher the value of f(x), the better the quality of the current solution x. The goal of the simulated annealing-genetic algorithm of the present invention is to select the preferred characteristic sub-intervals.
  • Population size: the number of chromosomes in the population and the number of genes in each chromosome.
  • The number of genes is generally determined by the parameters of the actual problem. For the characteristic sub-interval selection problem, the number of chromosomes is generally 30 to 100, and the number of genes equals the number of sub-intervals.
  • Gene exchange probability p: the proportion of chromosomes taking part in gene exchange relative to the total number of chromosomes. It is generally set to 0.65 to 0.9.
  • Gene mutation probability p_D: the proportion of chromosomes taking part in gene mutation relative to the total number of chromosomes. It is generally set to 0.001 to 0.1.
  • Initial temperature t_0: corresponds to the initial temperature in solid annealing. It is usually set to 200 to 1000 degrees.
  • Temperature decay function g(α): controls the cooling rate during annealing, with $t_{k+1} = t_k \, g(\alpha)$. The attenuation coefficient is usually in the range 0.5 to 0.99.
  • End temperature t_f: when the annealing temperature reaches the end temperature, the solid reaches a stable state and the annealing process is finished. Generally, the end temperature t_f is about 0 degrees.
  • The number of sub-intervals is i.
  • The near-infrared spectrum is divided into i sub-intervals and encoded as binary genes; the number of genes equals the number of sub-intervals i.
  • Chromosome initialization: an initial population of the given size is generated at random.
  • The temperature is lowered slowly according to the temperature decay function g(α). Each time the temperature t decreases, the fitness of each individual in the population is calculated, parent chromosomes are selected by the chromosome selection operator, and gene exchange is performed with the improved gene exchange operator.
  • The same procedure is used to compute the optimal solution for the new total number of sub-intervals, and the process is repeated until the total number of sub-intervals i exceeds the maximum number of sub-intervals I_f.
  • At this point the optimal solutions for every total number of sub-intervals i ∈ [I_0, I_f] have been obtained. From these, the solution with the largest objective function value is selected as X_i. X_i is the globally optimal set of characteristic sub-intervals of the near-infrared spectrum, and the subscript i is the total number of sub-intervals corresponding to that solution. Finally, the calibration set and prediction set models are established from the selected global optimal solution.
  • Figure 2 shows the process by which the Metropolis criterion judges the importance of the new solution.
  • The transition probability p_t of the new solution is compared with a random number r ∈ [0, 1] drawn from a uniform probability density. If p_t > r holds, the new solution is accepted; otherwise the old solution remains unchanged.
  • The specific judgment process is as follows: (1) the Metropolis criterion first calculates the objective function values f(x) and f(y) of the old solution x and the new solution y; (2) it generates the random value r; (3) it calculates the transition probability p_t of the new solution; (4) it compares p_t with r. If p_t is greater than or equal to r, the new solution replaces the old solution; otherwise the old solution remains unchanged.
  • The Metropolis criterion accepts improved solutions and also accepts worse solutions with a certain probability, which helps the algorithm avoid becoming trapped in a local optimum.
  • Figure 3 shows the flow chart of the improved gene exchange operator and gene mutation operator. Because the two improved operators are similar, the improved gene exchange operator is taken as the example to describe the workflow in detail. The Metropolis acceptance criterion of the simulated annealing algorithm is introduced on top of the traditional genetic operation, which raises the probability of beneficial mutations, lowers the probability of harmful mutations, and lets the algorithm escape local optima and move toward the global optimum. The exchange operator randomly selects parent individuals (denoted P_i) from the parent population, generates offspring individuals (denoted C_i) by gene exchange, calculates their fitness values f(P_i) and f(C_i) respectively, and decides by the Metropolis criterion whether to accept the new individuals. The judgment process is shown in Figure 3.
  • Figure 7 shows the near-infrared spectra of 100 cucumber leaves after preprocessing by standard orthogonal transformation: spectral range lOOOC ⁇ OO cm⁻¹; number of scans 32; wavenumber interval 7.712 cm⁻¹; resolution 16 cm⁻¹.
  • The spectra of 70 leaves were used as the calibration set, and the near-infrared spectra of the remaining 30 leaves were used as the prediction set.
  • The minimum and maximum numbers of sub-intervals are 30 and 60, the population size is 60, the gene exchange probability is 0.9, the gene mutation probability is 0.01, the initial temperature is 200, the end temperature is 0.1, and the temperature attenuation coefficient is 0.95.
  • The simulated annealing-genetic algorithm is used to select the characteristic sub-intervals. The specific process is as follows:
  • The number of chromosomes in the population is 60 and the number of genes per chromosome is 30; the population is initialized.
  • (5) Decrease the temperature according to the temperature decay function. If the temperature has not reached the end temperature, repeat steps (3) to (4); if it has, perform step (6).
  • (6) Increase the number of sub-intervals by 1. If the number of sub-intervals has not reached the maximum of 60, repeat steps (1) to (5); if it has, perform step (7).
  • Figure 5 shows the characteristic sub-intervals selected by the simulated annealing-genetic algorithm from the near-infrared spectra of cucumber leaf lutein.
  • Figure 6 compares the cucumber leaf lutein models built with the simulated annealing-genetic algorithm and with the traditional genetic algorithm.
  • The abscissa is the number of modeling runs.
  • The ordinate is the calibration set correlation coefficient of the spectral model.
  • The curve with the ⁇ marker is the calibration set correlation coefficient of the near-infrared model of cucumber leaf lutein obtained with the simulated annealing-genetic algorithm.
  • The curve with the □ marker is the calibration set correlation coefficient of the near-infrared model of cucumber leaf lutein obtained with the traditional genetic algorithm. Figure 6 shows that the spectral model obtained with the simulated annealing-genetic algorithm is superior to the one established with the traditional genetic algorithm.
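The temperature parameters and the outer loop over sub-interval counts described in this section can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: it assumes geometric cooling, that is g(α) = α so that t_{k+1} = α·t_k, which the text does not state explicitly. The names `cooling_schedule`, `best_over_interval_counts` and the `optimise` callback are hypothetical.

```python
from typing import Any, Callable, Iterator, Optional, Tuple


def cooling_schedule(t_start: float, t_end: float, alpha: float) -> Iterator[float]:
    """Yield t_0, t_1, ... with t_{k+1} = alpha * t_k while t_k > t_end.

    Assumption: geometric cooling (g(alpha) = alpha).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if t_start <= 0.0 or t_end <= 0.0:
        raise ValueError("temperatures must be positive")
    t = t_start
    while t > t_end:
        yield t
        t *= alpha


def best_over_interval_counts(
    i_min: int,
    i_max: int,
    optimise: Callable[[int], Tuple[Any, float]],
) -> Optional[Tuple[int, Any, float]]:
    """Run optimise(i) for every sub-interval count i in [i_min, i_max].

    optimise(i) is a hypothetical callback returning (solution, objective value);
    the result with the largest objective value is kept.
    """
    best: Optional[Tuple[int, Any, float]] = None
    for i in range(i_min, i_max + 1):
        solution, value = optimise(i)
        if best is None or value > best[2]:
            best = (i, solution, value)
    return best


# Parameters from the worked example: initial temperature 200,
# end temperature 0.1, attenuation coefficient 0.95.
temps = list(cooling_schedule(200.0, 0.1, 0.95))
print(len(temps), "temperature levels per sub-interval count")
```

Because a geometric schedule never reaches exactly zero, the end temperature has to be a small positive value, as in the worked example (0.1).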

Landscapes

  • Engineering & Computer Science
  • Physics & Mathematics
  • Health & Medical Sciences
  • Life Sciences & Earth Sciences
  • Biophysics
  • Bioinformatics & Cheminformatics
  • Bioinformatics & Computational Biology
  • Evolutionary Biology
  • Theoretical Computer Science
  • Computational Linguistics
  • General Engineering & Computer Science
  • Biomedical Technology
  • Genetics & Genomics
  • Data Mining & Analysis
  • Evolutionary Computation
  • General Health & Medical Sciences
  • Molecular Biology
  • Computing Systems
  • Artificial Intelligence
  • General Physics & Mathematics
  • Mathematical Physics
  • Software Systems
  • Physiology
  • Investigating Or Analysing Materials By Optical Means
  • Aiming, Guidance, Guns With A Light Source, Armor, Camouflage, And Targets
  • Radiation Pyrometers
  • Spectrometry And Color Measurement

Abstract

A near-infrared spectrum characteristic subinterval selection method based on a simulated annealing-genetic algorithm comprises: pre-treating a near-infrared spectrum; dividing the pretreated near-infrared spectrum into subintervals dynamically; introducing the Metropolis rule of the simulated annealing algorithm into the gene exchange operator and the gene selection operator of the genetic algorithm; selecting an optimal characteristic subinterval using the simulated annealing-genetic algorithm; and finally judging an optimal subinterval division manner and an optimal characteristic subinterval combination, and establishing a partial least squares (PLS) model for the selected optimal characteristic subinterval.

Description

Near-infrared spectrum characteristic sub-interval selection method based on a simulated annealing-genetic algorithm
Technical field
The invention relates to a method for selecting characteristic sub-intervals of near-infrared spectra for analyzing the quality of agricultural products and foods, and in particular to a near-infrared spectrum characteristic sub-interval selection method based on a simulated annealing-genetic algorithm.
Background art
Near-infrared spectroscopy is increasingly widely used in the quality analysis of agricultural products and foods because of its fast analysis speed and high efficiency. However, near-infrared spectra also have certain deficiencies, such as complex background, low information intensity and overlapping peaks, which make them difficult to interpret with conventional spectral analysis methods. How to effectively extract characteristic information from large amounts of near-infrared spectral data has therefore become a focus of research in this field.
A sample shows characteristic absorption in one or several bands of the near-infrared spectrum, so wavenumber points adjacent to a high-information wavenumber point also carry a relatively large amount of information; that is, near-infrared spectral data have a certain continuous correlation. Given this property, and in order to reduce the computational load of the wavelength selection algorithm and improve its efficiency, the full near-infrared spectrum is usually divided into several sub-intervals and wavelength selection is performed interval by interval. The classical spectral interval selection algorithm is interval partial least squares, which divides the full spectrum into several sub-intervals, calculates the root mean square error of cross-validation (RMSECV) for each sub-interval, and takes the interval with the smallest RMSECV as the modeling interval. Algorithms derived from interval partial least squares include joint interval partial least squares, forward/backward interval partial least squares and moving window partial least squares. Compared with the classical interval partial least squares algorithm, the derived algorithms examine not only single intervals but also combinations of several intervals. Although these algorithms can extract the characteristic information of the spectrum, the process of dividing the sub-intervals is somewhat subjective.
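The interval-wise selection described above can be illustrated with a minimal sketch. It is not the patent's implementation: to stay dependency-free, a one-variable least-squares line fitted to the mean absorbance of each sub-interval stands in for the PLS regression, and leave-one-out cross-validation is assumed. The names `loo_rmsecv` and `best_single_interval` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def loo_rmsecv(x: Sequence[float], y: Sequence[float]) -> float:
    """Leave-one-out RMSECV of the line y = a + b*x (a stand-in for PLS)."""
    n = len(x)
    squared_errors = []
    for held_out in range(n):
        xs = [v for k, v in enumerate(x) if k != held_out]
        ys = [v for k, v in enumerate(y) if k != held_out]
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        sxx = sum((v - mean_x) ** 2 for v in xs)
        sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        prediction = mean_y + slope * (x[held_out] - mean_x)
        squared_errors.append((y[held_out] - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / n)


def best_single_interval(
    spectra: Sequence[Sequence[float]],
    y: Sequence[float],
    intervals: Sequence[range],
) -> Tuple[int, List[float]]:
    """Score every sub-interval by RMSECV and return the index of the smallest."""
    scores = []
    for interval in intervals:
        # Assumption: each interval is summarised by its mean absorbance.
        feature = [sum(row[j] for j in interval) / len(interval) for row in spectra]
        scores.append(loo_rmsecv(feature, y))
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy data: 6 samples, 6 wavenumber points; only columns 2 and 3 track y.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noise_a = [0.5, 0.1, 0.4, 0.2, 0.6, 0.3]
noise_b = [0.3, 0.3, 0.1, 0.5, 0.2, 0.4]
spectra = [[a, a, 0.1 * v, 0.1 * v, b, b] for a, v, b in zip(noise_a, y, noise_b)]
intervals = [range(0, 2), range(2, 4), range(4, 6)]
best, scores = best_single_interval(spectra, y, intervals)
print("selected interval:", best)
```

A real implementation would replace the stand-in line fit with a PLS regression over all wavenumber points of the interval; the selection logic (smallest RMSECV wins) stays the same.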
The genetic algorithm is a discipline that emerged in the 1970s. It solves practical problems by simulating the natural selection and natural genetic mechanisms of the biological world, and is a highly parallel, random and adaptive search algorithm. In recent years, some researchers have combined the genetic algorithm with the classical interval partial least squares algorithm to select characteristic sub-intervals of near-infrared spectra, simulating natural evolutionary processes such as heredity and mutation to solve for the optimal combination of characteristic sub-intervals. Some shortcomings remain, however: the division into sub-intervals often relies on experience and is somewhat subjective, and the genetic algorithm is prone to premature convergence and to falling into a local optimum, so an approximate global optimum cannot be guaranteed.
The simulated annealing algorithm is a stochastic optimization algorithm based on the Monte Carlo iterative solution strategy; its starting point is the similarity between the physical annealing process and combinatorial optimization. Starting from a relatively high initial temperature, it uses the Metropolis sampling strategy, which has probabilistic jumping ability, to search randomly among the candidate solutions, repeating the sampling process as the temperature falls and finally obtaining the global optimal solution of the problem. It is suitable for large-scale combinatorial optimization problems.
发明内容 Summary of the invention
To overcome the subjectivity of sub-interval division of near-infrared spectra in the prior art and to ensure that an approximate global optimum is obtained, the present invention proposes a near-infrared spectrum characteristic sub-interval selection method based on a simulated annealing-genetic algorithm. The core Metropolis acceptance criterion of the simulated annealing algorithm is introduced into the genetic algorithm, which prevents premature trapping in a local optimum while preserving the execution efficiency of the genetic algorithm, so that the optimal combination of characteristic sub-intervals of the near-infrared spectrum is obtained.
The technical scheme adopted by the invention is: first preprocess the near-infrared spectrum; then dynamically divide the preprocessed spectrum into sub-intervals; introduce the Metropolis criterion of the simulated annealing algorithm into the gene exchange and gene selection operators of the genetic algorithm; use the simulated annealing-genetic algorithm to select the optimal characteristic sub-intervals; finally, determine the best sub-interval division and the optimal combination of characteristic sub-intervals, and establish a PLS model for the selected optimal characteristic sub-intervals.
With the above technical scheme, the present invention achieves the following effects:
1. The Metropolis criterion of the simulated annealing algorithm is introduced into the exchange operator and the mutation operator. The improved operators generate high-quality offspring, which both raises the overall fitness level of the population and provides sufficient driving force for population evolution.
2. The introduction of the Metropolis criterion effectively addresses the premature convergence and trapping in local optima of the traditional genetic algorithm. Dynamic division of the spectral sub-intervals effectively avoids the drawbacks of manually specifying the total number of spectral sub-intervals from experience during modeling.
3. The near-infrared spectrum characteristic sub-interval selection method based on the simulated annealing-genetic algorithm lays a solid foundation for quickly obtaining spectral models with high precision and strong predictive ability.
Brief description of the drawings
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Figure 1 is a flow chart of the present invention;
Figure 2 is a schematic diagram of the Metropolis acceptance criterion;
Figure 3 is a schematic diagram of the exchange operator incorporating the Metropolis criterion;
Figure 4 is a schematic diagram of the mutation operator incorporating the Metropolis criterion;
Figure 5 shows the characteristic sub-interval selection results of the simulated annealing-genetic algorithm;
Figure 6 compares the modeling performance of the simulated annealing-genetic algorithm and the traditional genetic algorithm;
Figure 7 is the near-infrared spectrum of cucumber leaf lutein after preprocessing by standard orthogonal transformation.
Detailed description of the embodiments
The invention first preprocesses the near-infrared spectrum: the raw near-infrared spectra of agricultural products and foods are treated with an appropriate denoising method. Denoising methods include standard orthogonal transformation, multiplicative scatter correction, mean centering, and first-/second-order derivative preprocessing. Spectral preprocessing also includes dividing the samples into a calibration set and a prediction set. The preprocessed near-infrared spectrum is then dynamically divided into sub-intervals: the number of sub-intervals varies within a range [m, n], and the subsequent steps of the algorithm select the optimal number of characteristic sub-intervals within [m, n]. When the number of spectral sub-intervals is i ∈ [m, n], the full spectrum is divided equally into i sub-intervals. If the total number of wavenumber points divided by i gives quotient p with remainder q, then each of the first q sub-intervals contains p+1 wavenumber points and each of the remaining sub-intervals contains p wavenumber points.
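The equal-division rule above (quotient p, remainder q, first q sub-intervals one point longer) can be sketched in a few lines. The function name `partition_spectrum` and the figure of 1000 wavenumber points are assumptions made for illustration only.

```python
from typing import List


def partition_spectrum(n_points: int, n_intervals: int) -> List[range]:
    """Split point indices 0..n_points-1 into contiguous sub-intervals.

    The first q sub-intervals hold p + 1 points and the rest hold p points,
    where p, q = divmod(n_points, n_intervals).
    """
    if not 1 <= n_intervals <= n_points:
        raise ValueError("need 1 <= n_intervals <= n_points")
    p, q = divmod(n_points, n_intervals)
    intervals = []
    start = 0
    for k in range(n_intervals):
        size = p + 1 if k < q else p
        intervals.append(range(start, start + size))
        start += size
    return intervals


# Hypothetical spectrum of 1000 wavenumber points split into 30 sub-intervals:
# p = 33, q = 10, so 10 sub-intervals of 34 points followed by 20 of 33 points.
parts = partition_spectrum(1000, 30)
sizes = [len(r) for r in parts]
print(sizes[:10], sizes[10:])
```

Running the partition for every i in [m, n] gives the candidate divisions among which the algorithm later chooses.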
In the simulated annealing algorithm, the Metropolis criterion is a rule for judging the importance of a new solution relative to the old solution. Based on the objective function values of the old and new solutions, the criterion decides which of the two is the important solution: if the new solution is judged important, it replaces the old solution and enters the next iteration; otherwise the old solution is kept unchanged. For the optimal characteristic sub-interval problem, let the objective function values of the old solution x and the new solution y be f(x) and f(y) respectively. The importance of x and y is then judged by the following Metropolis criterion: when f(y) > f(x), the new solution is the important solution; otherwise, test whether $p_t = \exp\left(\frac{f(y) - f(x)}{t}\right) > r \in [0, 1]$ holds, where $p_t$ is the transition probability of the new solution at the current temperature t and r is drawn at random from a uniform probability density on the range 0 to 1. If the inequality holds, the new solution y is considered the important solution; otherwise the old solution x is considered the important solution.
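A minimal sketch of the acceptance test follows. It assumes that the exponent takes the standard Metropolis form for a maximization problem, (f(y) − f(x)) / t, with t the current temperature; the function name `metropolis_accept` and its parameters are hypothetical.

```python
import math
import random
from typing import Optional


def metropolis_accept(
    f_old: float,
    f_new: float,
    temperature: float,
    rng: Optional[random.Random] = None,
) -> bool:
    """Return True if the new solution should replace the old one.

    Assumption: p_t = exp((f_new - f_old) / temperature), the standard
    Metropolis form when larger objective values are better.
    """
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    if f_new > f_old:
        return True  # a better solution is always accepted
    p_t = math.exp((f_new - f_old) / temperature)  # exponent <= 0, so p_t <= 1
    r = (rng or random).random()  # uniform draw from [0, 1)
    return p_t > r


rng = random.Random(0)
print(metropolis_accept(0.80, 0.90, 1.0, rng))   # improvement
print(metropolis_accept(0.90, 0.50, 1e-3, rng))  # large deterioration at low temperature
```

At high temperature p_t is close to 1 and worse solutions are accepted often; as the temperature falls, p_t shrinks and the test behaves more and more like a strict improvement check.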
The invention introduces the Metropolis criterion described above into the gene exchange and gene selection operators of the genetic algorithm; the result is called the "simulated annealing-genetic algorithm". When a conventional gene exchange or gene mutation operator produces a child chromosome from a parent chromosome, the Metropolis criterion is used to judge the importance of the parent chromosome (the old solution x) and the child chromosome (the new solution y). If the child chromosome is more important than the parent chromosome, the child is accepted; otherwise it is rejected.
The simulated annealing-genetic algorithm is used to select the optimal feature subintervals of the spectrum. Combining the gene exchange and gene mutation operators that incorporate the Metropolis criterion with the other operators of a conventional genetic algorithm, it selects optimal feature subintervals from the divided near-infrared spectrum and automatically determines both the best subinterval division and the best combination of feature subintervals. PLS models are then built on the selected subintervals for the calibration set and the prediction set, and modeling parameters are calculated, including the root mean square error of the calibration set, the root mean square error of the prediction set, and the correlation coefficients of the calibration set and the prediction set.
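The modeling parameters named here can be computed from reference values and model predictions as sketched below. The patent does not spell out the formulas, so this assumes the usual definitions (root mean square error and Pearson correlation coefficient); the function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def correlation(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mean_t) ** 2 for a in y_true)
    var_p = sum((b - mean_p) ** 2 for b in y_pred)
    return cov / math.sqrt(var_t * var_p)   # undefined if either series is constant
```

Applied to the calibration samples these give the calibration-set error and correlation coefficient; applied to the held-out samples they give the prediction-set values.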
Dividing the subintervals and selecting the optimal feature subintervals requires the following parameters:
(1) Minimum number of subintervals I_0: the full spectrum is divided into at least I_0 subintervals.
(2) Maximum number of subintervals I_f: the full spectrum is divided into at most I_f subintervals.
(3) Objective function f(x): the objective function measures the quality of the current solution x; in general, the higher the value of f(x), the better the solution. The goal of the simulated annealing-genetic algorithm here is to select feature subintervals, so the set of all currently selected feature subintervals is taken as the current solution x, and the objective function is defined as f(x) = 1/(1 + RMSECV), where RMSECV is the root mean square error of cross-validation of the PLS model built on all selected subintervals.
(4) Gene encoding: because the genetic algorithm cannot process the near-infrared spectral data directly, the data are represented by binary encoding as genotype strings in the genetic space. A 1 means the gene at that position is selected and a 0 means it is not. For example, 0011001101 denotes a chromosome with 10 genes in which the genes at positions 3, 4, 7, 8, and 10 are selected.
(5) Population size: the number of chromosomes in the population and the number of genes in each chromosome; the number of genes is generally determined by the parameters of the problem at hand. For feature subinterval selection, the number of chromosomes is generally 30 to 100 and the number of genes equals the number of subintervals.
(6) Gene exchange probability p_c: the proportion of chromosomes that take part in gene exchange relative to the total number of chromosomes; it is generally set to 0.65 to 0.9.
(7) Gene mutation probability p_m: the proportion of chromosomes that take part in gene mutation relative to the total number of chromosomes; it is generally set to 0.001 to 0.1.
(8) Initial temperature t_0: corresponds to the initial temperature in the annealing of a solid; it is usually set to 200 to 1000 degrees.
(9) Temperature decay function g(α): controls the cooling rate in the annealing of a solid. Usually t_{k+1} = t_k · g(α) with g(α) = α, where α is typically in the range 0.5 to 0.99.
(10) End temperature t_f: when the annealing temperature reaches the end temperature, the solid reaches a stable state and the annealing process ends. The end temperature t_f is generally set to about 0 degrees.
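Two of these parameters lend themselves to a short illustration: the binary gene encoding and the objective function f(x) = 1/(1 + RMSECV). The sketch below uses hypothetical helper names (`decode`, `fitness`) and takes the RMSECV value as an input instead of fitting a PLS model.

```python
def decode(chromosome):
    """Return the 1-based positions of the selected genes (subintervals)."""
    return [pos for pos, bit in enumerate(chromosome, start=1) if bit == "1"]


def fitness(rmsecv):
    """Objective function f(x) = 1 / (1 + RMSECV); larger is better."""
    return 1.0 / (1.0 + rmsecv)


selected = decode("0011001101")   # the 10-gene example from item (4)
```

Since RMSECV is non-negative, f(x) lies in (0, 1], and a lower cross-validation error always maps to a higher objective value.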
The optimal feature subintervals are selected from the divided near-infrared spectrum by the following steps:
(1) With the number of subintervals equal to i, divide the near-infrared spectrum into i subintervals and apply binary gene encoding; the number of genes equals the number of subintervals i.
(2) Chromosome initialization: randomly generate an initial population of the given size.
(3) At temperature t, calculate the objective function f(x) of each chromosome in the population; use the selection operator to keep individuals with high fitness and eliminate individuals with low fitness (survival of the fittest).
(4) Perform gene exchange and gene mutation with the improved gene exchange and gene mutation operators, which incorporate the Metropolis criterion of the simulated annealing algorithm.
(5) Lower the temperature t according to the temperature decay function g(α). If t has not reached the end temperature t_f, repeat steps (3) to (4); if it has, go to step (6).
(6) Increase the number of subintervals i by 1. If i is not equal to the maximum number of subintervals I_f, repeat steps (1) to (5); if i equals I_f, go to step (7).
(7) Determine the best total number of subintervals and the selected optimal feature subintervals.
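The seven steps can be sketched as a nested loop: an outer loop over the number of subintervals and an inner annealing loop. This is a simplified illustration under several assumptions that the patent does not fix: roulette-wheel selection, single-point gene exchange, single-bit mutation, geometric cooling with a `t > t_end` stopping test, and a transition probability of exp((f(y) − f(x)) / t). The callable `rmsecv` is a hypothetical stand-in for building the PLS model on the selected subintervals, and the parameter names are likewise illustrative.

```python
import math
import random


def sa_ga_select(rmsecv, i_min, i_max, pop_size=30, p_c=0.9, p_m=0.01,
                 t0=200.0, t_end=0.1, alpha=0.95, seed=0):
    """Return (best_fitness, k, chromosome) over subinterval counts i_min..i_max.

    rmsecv(chromosome) is a caller-supplied function returning the RMSECV
    of a model built on the subintervals marked 1 in the chromosome.
    """
    rng = random.Random(seed)

    def fit(c):
        # f(x) = 1 / (1 + RMSECV); an empty selection gets a tiny fitness.
        return 1.0 / (1.0 + rmsecv(c)) if any(c) else 1e-12

    def accept(parent, child, t):
        # Metropolis criterion (assumed form of the transition probability).
        f_p, f_c = fit(parent), fit(child)
        return f_c > f_p or math.exp((f_c - f_p) / t) > rng.random()

    best = (0.0, None, None)
    for k in range(i_min, i_max + 1):                      # steps (1) and (6)
        pop = [[rng.randint(0, 1) for _ in range(k)]       # step (2)
               for _ in range(pop_size)]
        t = t0
        while t > t_end:
            weights = [fit(c) for c in pop]                # step (3)
            pop = [list(c) for c in
                   rng.choices(pop, weights=weights, k=pop_size)]
            for a in range(0, pop_size - 1, 2):            # step (4): exchange
                if rng.random() < p_c:
                    cut = rng.randint(1, k - 1)
                    pa, pb = pop[a], pop[a + 1]
                    ca, cb = pa[:cut] + pb[cut:], pb[:cut] + pa[cut:]
                    if accept(pa, ca, t):
                        pop[a] = ca
                    if accept(pb, cb, t):
                        pop[a + 1] = cb
            for a in range(pop_size):                      # step (4): mutation
                if rng.random() < p_m:
                    child = list(pop[a])
                    child[rng.randrange(k)] ^= 1           # flip one gene
                    if accept(pop[a], child, t):
                        pop[a] = child
            for c in pop:                                  # track the best, step (7)
                if fit(c) > best[0]:
                    best = (fit(c), k, list(c))
            t *= alpha                                     # step (5)
    return best
```

For a quick check the built-in `sum` can serve as a toy `rmsecv`, for example `sa_ga_select(sum, 4, 6, pop_size=10, t0=1.0, t_end=0.5, alpha=0.8)`; in real use the callable would run PLS cross-validation on the selected wavenumber ranges, and the fitness values would be cached because each is evaluated several times here.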
The specific steps of the near-infrared spectrum feature subinterval selection method based on the simulated annealing-genetic algorithm are shown in Figure 1. After the near-infrared spectrum is preprocessed, the total number of subintervals is set to i = I_0, the full spectrum is divided into feature subintervals, binary gene encoding is performed, and the population is initialized at random. Once the initial population is fixed, the annealing temperature t starts from the initial temperature t_0 and is lowered gradually according to the temperature decay function g(α). Each time t is lowered, the fitness of every individual in the population is calculated, parent chromosomes are chosen by the chromosome selection operator, gene exchange is performed with the improved gene exchange operator, and gene mutation is performed with the improved gene mutation operator. This process is repeated until the annealing temperature reaches the end temperature, at which point the optimal solution for i = I_0 subintervals is saved. The total number of subintervals is then incremented (i = i + 1) and the optimal solution for the new total is calculated by the same steps. This continues until i exceeds the end value I_f (the maximum number of subintervals). Optimized solutions are then available for every total number of subintervals i ∈ [I_0, I_f]. Among them, the solution with the largest objective function value is denoted X_i; X_i is the globally optimal set of feature subintervals of the near-infrared spectrum, and the subscript i is the total number of subintervals at which this solution was obtained. Finally, the calibration set and prediction set models are built from the selected globally optimal solution.
Figure 2 shows how the Metropolis criterion judges the importance of the new and old solutions. The transition probability p_t of the new solution is compared with a random value r ∈ [0, 1] drawn from a probability density function; if p_t > r holds, the new solution is accepted, otherwise the old solution is kept. The procedure is as follows: (1) calculate the objective function values f(x) and f(y) of the old solution x and the new solution y; (2) generate the random value r; (3) calculate the transition probability p_t of the new solution; (4) compare p_t with r: if p_t is greater than or equal to r, replace the old solution with the new solution; otherwise keep the old solution. It follows that the Metropolis criterion accepts improved solutions and also accepts worse solutions with a certain probability, which helps the algorithm avoid being trapped in a local optimum.
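To see how the chance of accepting a worse solution depends on temperature, the transition probability can be evaluated at several temperatures. The sketch assumes the standard form p_t = exp((f(y) − f(x)) / t); the objective values 0.60 and 0.55 are made-up numbers used only for illustration.

```python
import math


def transition_probability(f_old, f_new, t):
    """p_t for a new solution that is no better than the old one."""
    return math.exp((f_new - f_old) / t)


# Same drop in the objective (0.60 -> 0.55) at falling temperatures.
for t in (200.0, 10.0, 1.0, 0.1):
    print(t, round(transition_probability(0.60, 0.55, t), 4))
```

Under this assumed form, a given deterioration is accepted almost always at high temperature and progressively less often as the temperature approaches the end temperature.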
Figures 3 and 4 show flowcharts of the improved gene exchange operator and the improved gene mutation operator. Because the two operators are similar, the improved gene exchange operator is taken as the example to describe the workflow in detail. The Metropolis acceptance criterion of the simulated annealing algorithm is introduced on top of the conventional genetic operations. This raises the probability of positive mutations and lowers the probability of negative mutations, so that the algorithm can escape local optima and converge toward the global optimum. The exchange operator randomly selects parent individuals (denoted P_i) from the parent population, produces new child individuals (denoted C_i) by gene exchange, calculates their fitness values f(P_i) and f(C_i), and decides by the Metropolis criterion whether to accept the newly produced individuals. The decision process is shown in Figure 3.
Example
Figure 7 shows the near-infrared spectra of 100 cucumber leaves after standard orthogonal transformation preprocessing. The spectral range is lOOOC^^OO cm⁻¹, the number of scans is 32, the wavenumber interval is 7.712 cm⁻¹, and the resolution is 16 cm⁻¹. The spectra of 70 leaves form the calibration set and the near-infrared spectra of the remaining 30 leaves form the prediction set. The minimum and maximum numbers of subintervals are set to 30 and 60, the population size to 60, the gene exchange probability to 0.9, the gene mutation probability to 0.01, the initial temperature to 200, the end temperature to 0.1, and the temperature decay coefficient to 0.95. The simulated annealing-genetic algorithm is used to select the feature subintervals as follows:
(1) With the number of subintervals equal to 30, divide the full spectrum into 30 subintervals and apply binary encoding.
(2) Initialize the population with 60 chromosomes of 30 genes each.
(3) At temperature 200, calculate the fitness of the chromosomes in the population; use the selection operator to keep individuals with high fitness and eliminate individuals with low fitness (survival of the fittest).
(4) Perform gene exchange and gene mutation with the improved gene exchange and gene mutation operators.
(5) Lower the temperature according to the temperature decay function. If the temperature has not reached the end temperature, repeat steps (3) to (4); if it has, go to step (6).
(6) Increase the number of subintervals by 1. If the number of subintervals is not equal to the maximum of 60, repeat steps (1) to (5); if it equals 60, go to step (7).
(7) The best model was obtained with 40 subintervals, of which 7 were selected: subintervals 3, 5, 14, 18, 21, 32, and 33.
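The settings of this example can be checked with a little arithmetic. The sketch below counts how many cooling steps geometric decay needs to go from 200 to below 0.1 with coefficient 0.95, and writes the reported result (subintervals 3, 5, 14, 18, 21, 32, 33 out of 40) as a chromosome string. Treating "reaching the end temperature" as `t <= t_end` is an assumption, and the helper names are illustrative.

```python
def cooling_steps(t0, t_end, alpha):
    """Number of multiplications by alpha until t drops to t_end or below."""
    steps, t = 0, t0
    while t > t_end:
        t *= alpha
        steps += 1
    return steps


def encode(selected, k):
    """Binary chromosome of length k with 1s at the 1-based selected positions."""
    chosen = set(selected)
    return "".join("1" if pos in chosen else "0" for pos in range(1, k + 1))


steps = cooling_steps(200.0, 0.1, 0.95)
chromosome = encode([3, 5, 14, 18, 21, 32, 33], 40)
subinterval_counts = len(range(30, 61))   # candidate counts 30..60 inclusive
```

Under these assumptions each candidate subinterval count runs through `steps` temperature levels, and the whole search covers `subinterval_counts` different divisions of the spectrum.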
Figure 5 shows the feature subintervals selected by the simulated annealing-genetic algorithm from the near-infrared spectra for cucumber leaf lutein. Figure 6 compares the cucumber leaf lutein models built with the simulated annealing-genetic algorithm and with a conventional genetic algorithm. In Figure 6 the abscissa is the number of modeling runs and the ordinate is the calibration set correlation coefficient of the spectral model. The curve marked with △ gives the calibration set correlation coefficients obtained by modeling the near-infrared spectra of cucumber leaf lutein with the simulated annealing-genetic algorithm, and the curve marked with □ gives the calibration set correlation coefficients of the near-infrared model obtained with the conventional genetic algorithm. Figure 6 shows that the spectral model obtained with the simulated annealing-genetic algorithm is better than the one built with the conventional genetic algorithm.

Claims

1. A near-infrared spectrum feature subinterval selection method based on a simulated annealing-genetic algorithm, characterized in that: the near-infrared spectrum is first preprocessed; the preprocessed near-infrared spectrum is then dynamically divided into subintervals; the Metropolis criterion of the simulated annealing algorithm is introduced into the gene exchange and gene selection operators of the genetic algorithm; the simulated annealing-genetic algorithm is used to select the optimal feature subintervals; finally, the best subinterval division and the best combination of feature subintervals are determined, and a PLS model is built on the selected optimal feature subintervals.
2. The near-infrared spectrum feature subinterval selection method based on a simulated annealing-genetic algorithm according to claim 1, characterized in that the parameters to be set for dividing the subintervals and for selecting the optimal feature subintervals with the simulated annealing-genetic algorithm are: minimum number of subintervals I_0, maximum number of subintervals I_f, objective function f(x), gene encoding, population size, gene exchange probability p_c, gene mutation probability p_m, initial temperature t_0, temperature decay function g(α), and end temperature t_f.
3. The near-infrared spectrum feature subinterval selection method based on a simulated annealing-genetic algorithm according to claim 2, characterized in that the optimal feature subintervals are selected by the following steps:
(1) with the number of subintervals equal to i, the near-infrared spectrum is divided into i subintervals and binary gene encoding is performed, the number of genes being the number of subintervals i;
(2) chromosome initialization: an initial population of the given size is generated at random;
(3) at temperature t, the objective function f(x) of the chromosomes in the population is calculated, and the selection operator is used to select individuals with high fitness and eliminate individuals with low fitness, achieving survival of the fittest in the population;
(4) gene exchange and gene mutation are performed with the improved gene exchange operator and gene mutation operator, which introduce the Metropolis criterion of the simulated annealing algorithm into the genetic algorithm;
(5) the temperature t is lowered according to the temperature decay function g(α); if t is not equal to the end temperature t_f, steps (3) to (4) are repeated; if t equals the end temperature t_f, step (6) is performed;
(6) the number of subintervals i is increased by 1; if i is not equal to the maximum number of subintervals I_f, steps (1) to (5) are repeated; if i equals the maximum number of subintervals I_f, step (7) is performed;
(7) the best total number of subintervals and the selected optimal feature subintervals are determined.
4. The near-infrared spectrum feature subinterval selection method based on a simulated annealing-genetic algorithm according to claim 3, characterized in that the improved gene exchange operator and gene mutation operator of step (4) perform gene exchange and gene mutation as follows: the probability of positive mutations is increased and the probability of negative mutations is reduced; the exchange operator randomly selects parent individuals from the parent population, produces new child individuals by gene exchange, calculates their fitness values, and decides by the Metropolis criterion whether to accept the newly produced individuals.
5. The near-infrared spectrum feature subinterval selection method based on a simulated annealing-genetic algorithm according to claim 1, characterized in that the Metropolis criterion of the simulated annealing algorithm judges, from the objective function values of the old solution and the new solution, which of the two is the important solution; if the new solution is judged to be the important solution, it replaces the old solution in the next iteration; otherwise the old solution is kept.
6. The near-infrared spectrum feature subinterval selection method based on a simulated annealing-genetic algorithm according to claim 1, characterized in that the number of subintervals varies dynamically within a range [m, n], and the simulated annealing-genetic algorithm selects the optimal feature subintervals within the range [m, n]; when the number of subintervals is k ∈ [m, n], the full spectrum is divided evenly into k subintervals; if dividing the total number of wavenumber points by k gives quotient p and remainder q, each of the first q subintervals contains p + 1 wavenumber points and each remaining subinterval contains p.
PCT/CN2010/000530 2010-03-12 2010-04-19 Near-infrared spectrum characteristic subinterval selection method based on simulated annealing-genetic algorithm WO2011109922A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010123931.2 2010-03-12
CN2010101239312A CN101832909B (en) 2010-03-12 2010-03-12 Selection method for subintervals of near infrared spectral characteristics based on simulated annealing-genetic algorithm

Publications (1)

Publication Number Publication Date
WO2011109922A1 true WO2011109922A1 (en) 2011-09-15

Family

ID=42717054

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/000530 WO2011109922A1 (en) 2010-03-12 2010-04-19 Near-infrared spectrum characteristic subinterval selection method based on simulated annealing-genetic algorithm

Country Status (2)

Country Link
CN (1) CN101832909B (en)
WO (1) WO2011109922A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102183467B (en) * 2011-01-24 2012-07-25 中国科学院长春光学精密机械与物理研究所 Modeling method for grading quality of Xinjiang red dates in near infrared range
CN102928382B (en) * 2012-11-12 2015-04-22 江苏大学 Near-infrared spectral characteristic wavelength selecting method based on improved simulated annealing algorithm
CN105046003B (en) * 2015-07-23 2018-06-29 王家俊 The spectral signature interval selection of simulated annealing-genetic algorithm and spectrum encryption method
CN105160184B (en) * 2015-09-08 2017-11-28 合肥工业大学 A kind of analysis method of multicomponent melt interdiffusion coefficient
CN105224961B (en) * 2015-11-04 2018-05-11 中国电子科技集团公司第四十一研究所 A kind of infrared spectrum feature extracting and matching method of high resolution
CN106021859B (en) * 2016-05-09 2018-10-16 吉林大学 The controlled-source audiomagnetotellurics method one-dimensional inversion method of improved adaptive GA-IAGA
CN106198446A (en) * 2016-06-22 2016-12-07 中国热带农业科学院热带作物品种资源研究所 The method of L-Borneol content near infrared spectrum quick test Herba Blumeae Balsamiferae leaf powder
CN106198433B (en) * 2016-06-27 2019-04-12 中国科学院合肥物质科学研究院 Infrared spectroscopy method for qualitative analysis based on LM-GA algorithm
CN108882256B (en) * 2018-08-21 2020-11-10 广东电网有限责任公司 Method and device for optimizing coverage of wireless sensor network node
CN109145873B (en) * 2018-09-27 2022-03-22 广东工业大学 Spectral Gaussian peak feature extraction algorithm based on genetic algorithm
CN109507143B (en) * 2018-10-29 2019-12-31 黑龙江八一农垦大学 Near infrared spectrum synchronous rapid detection method for physical and chemical indexes of biogas slurry
CN109540836A (en) * 2018-11-30 2019-03-29 济南大学 Near infrared spectrum pol detection method and system based on BP artificial neural network
CN111125629B (en) * 2019-12-25 2023-04-07 温州大学 Domain-adaptive PLS regression model modeling method
CN112881333B (en) * 2021-01-13 2022-03-04 江南大学 Near infrared spectrum wavelength screening method based on improved immune genetic algorithm
CN117494630B (en) * 2023-12-29 2024-04-26 珠海格力电器股份有限公司 Register time sequence optimization method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040033617A1 (en) * 2002-08-13 2004-02-19 Sonbul Yaser R. Topological near infrared analysis modeling of petroleum refinery products
CN1657907A (en) * 2005-03-23 2005-08-24 江苏大学 Agricultural products, food near-infrared spectral specragion selection method
JP2005291704A (en) * 2003-11-10 2005-10-20 New Industry Research Organization Visible light/near infrared spectral analysis method
CN101078685A (en) * 2007-05-17 2007-11-28 常熟雷允上制药有限公司 Method for quickly on-line detection of traditional Chinese medicine Kuhuang injection effective ingredient using near infra red spectrum

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101520412A (en) * 2009-03-23 2009-09-02 中国计量学院 Near infrared spectrum analyzing method based on isolated component analysis and genetic neural network


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHU XIAOLI ET AL.: "Variable selection for partial least squares modeling by genetic algorithm", CHINESE JOURNAL OF ANALYTICAL CHEMISTRY, vol. 29, no. 4, April 2001 (2001-04-01), pages 437 - 442 *
GU XIAOYU ET AL.: "Application of wavelength selection algorithm to measure the effective component of Chinese medicine based on near-infrared spectroscopy", SPECTROSCOPY AND SPECTRAL ANALYSIS, vol. 26, no. 9, September 2006 (2006-09-01), pages 1618 - 1620 *
ZHU SHIPING ET AL.: "Region selection method of near infrared spectrum based on genetic algorithm", TRANSACTIONS OF THE CHINESE SOCIETY FOR AGRICULTRAL MACHINERY, vol. 35, no. 5, September 2004 (2004-09-01), pages 152 - 156 *

Also Published As

Publication number Publication date
CN101832909B (en) 2012-01-18
CN101832909A (en) 2010-09-15

Similar Documents

Publication Publication Date Title
WO2011109922A1 (en) Near-infrared spectrum characteristic subinterval selection method based on simulated annealing-genetic algorithm
CN110363344B (en) Probability integral parameter prediction method for optimizing BP neural network based on MIV-GP algorithm
CN111007040B (en) Near infrared spectrum rapid evaluation method for rice taste quality
CN110046378B (en) Selective hierarchical integration Gaussian process regression soft measurement modeling method based on evolutionary multi-objective optimization
CN111191835B (en) IES incomplete data load prediction method and system based on C-GAN migration learning
CN102626557B (en) Molecular distillation process parameter optimizing method based on GA-BP (Genetic Algorithm-Back Propagation) algorithm
CN104573820A (en) Genetic algorithm for solving project optimization problem under constraint condition
CN113049507A (en) Multi-model fused spectral wavelength selection method
CN107894710B (en) Principal component analysis modeling method for temperature of cracking reaction furnace
CN107729988B (en) Blue algae bloom prediction method based on dynamic deep belief network
CN114004153A (en) Penetration depth prediction method based on multi-source data fusion
CN106296434A (en) A kind of Grain Crop Yield Prediction method based on PSO LSSVM algorithm
CN108107724A (en) Model Predictive Control parameter method for on-line optimization based on genetic algorithm
CN111126560A (en) Method for optimizing BP neural network based on cloud genetic algorithm
CN114841468A (en) Gasoline quality index prediction and cause analysis method
CN107577918A (en) The recognition methods of CpG islands, device based on genetic algorithm and hidden Markov model
CN101806728B (en) Method for selecting characteristic wavelength of near-infrared spectrum based on simulated annealing algorithm
CN110119842A (en) A kind of micro-capacitance sensor short-term load forecasting method
CN107688702B (en) Lane colony algorithm-based river channel flood flow evolution law simulation method
CN105136688A (en) Improved changeable size moving window partial least square method used for analyzing molecular spectrum
CN111709519B (en) Deep learning parallel computing architecture method and super-parameter automatic configuration optimization thereof
Ji et al. An improved simulated annealing genetic algorithm of EEG feature selection in sleep stage
CN106295667B (en) A kind of method and its application selecting optimal spectrum based on genetic algorithm
CN114117917B (en) Multi-objective optimization ship magnetic dipole array modeling method
CN115114838A (en) Spectral characteristic wavelength selection method based on particle swarm algorithm thought and simulated annealing strategy

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 10847181; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 10847181; Country of ref document: EP; Kind code of ref document: A1)
Kind code of ref document: A1