CN113657028B - Online aerosol optical thickness prediction method based on multi-source information - Google Patents
Online aerosol optical thickness prediction method based on multi-source information Download PDFInfo
- Publication number
- CN113657028B CN113657028B CN202110898847.6A CN202110898847A CN113657028B CN 113657028 B CN113657028 B CN 113657028B CN 202110898847 A CN202110898847 A CN 202110898847A CN 113657028 B CN113657028 B CN 113657028B
- Authority
- CN
- China
- Prior art keywords
- wind speed
- eigenvalues
- output
- humidity
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 230000003287 optical effect Effects 0.000 title claims abstract description 72
- 239000000443 aerosol Substances 0.000 title claims abstract description 56
- 238000000034 method Methods 0.000 title claims abstract description 34
- 238000013507 mapping Methods 0.000 claims abstract description 47
- 238000004364 calculation method Methods 0.000 claims abstract description 22
- 239000012634 fragment Substances 0.000 claims abstract description 18
- 238000013528 artificial neural network Methods 0.000 claims abstract description 15
- 230000006870 function Effects 0.000 claims abstract description 14
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 11
- 238000000605 extraction Methods 0.000 claims abstract description 10
- 238000000513 principal component analysis Methods 0.000 claims abstract description 7
- 239000013598 vector Substances 0.000 claims description 58
- 239000011159 matrix material Substances 0.000 claims description 51
- 230000001186 cumulative effect Effects 0.000 claims description 17
- 238000005457 optimization Methods 0.000 claims description 8
- 238000012545 processing Methods 0.000 claims description 6
- 238000005070 sampling Methods 0.000 claims description 5
- 230000004913 activation Effects 0.000 claims description 4
- 238000010276 construction Methods 0.000 claims description 4
- 238000010606 normalization Methods 0.000 claims description 4
- 230000007613 environmental effect Effects 0.000 claims description 3
- 230000004927 fusion Effects 0.000 abstract description 4
- 230000008054 signal transmission Effects 0.000 abstract description 2
- 241000257303 Hymenoptera Species 0.000 description 10
- 230000003993 interaction Effects 0.000 description 8
- 230000008569 process Effects 0.000 description 5
- 238000012937 correction Methods 0.000 description 3
- 238000013480 data collection Methods 0.000 description 3
- 230000008033 biological extinction Effects 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 238000013500 data storage Methods 0.000 description 2
- 101001121408 Homo sapiens L-amino-acid oxidase Proteins 0.000 description 1
- 102100026388 L-amino-acid oxidase Human genes 0.000 description 1
- 101100012902 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) FIG2 gene Proteins 0.000 description 1
- 230000002457 bidirectional effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000005315 distribution function Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000003062 neural network model Methods 0.000 description 1
- 238000000053 physical method Methods 0.000 description 1
- 238000007637 random forest analysis Methods 0.000 description 1
- 238000002310 reflectometry Methods 0.000 description 1
- 238000004088 simulation Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Computer Hardware Design (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Algebra (AREA)
- Geometry (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
技术领域Technical Field
本发明属于大气信道光信号传输技术领域,具体涉及一种基于多源信息的气溶胶光学厚度在线预测方法。The invention belongs to the technical field of atmospheric channel optical signal transmission, and in particular relates to an online prediction method for aerosol optical thickness based on multi-source information.
背景技术Background Art
光学参数估计是光信息传输的关键要素,在大气信道中,由于受到湍流的影响,往往造成光斑偏移、到达角起伏现象,给光学参数估计带来困难。气溶胶光学厚度是判断光学参数的重要支撑,对于确定光学参数数值和变化规律具有重要意义。通过物理方法进行模型建立和反演计算,会造成难以建模或无法求解的瓶颈,影响光学参数估计精度,如何通过数据驱动的方式进行气溶胶光学厚度(Aerosol Optical Depth,AOD)在线预测,对于准确估计光学参数,设计光学通信系统有重要意义。Optical parameter estimation is a key element in optical information transmission. In atmospheric channels, due to the influence of turbulence, the light spot is often offset and the arrival angle fluctuates, which makes it difficult to estimate the optical parameters. Aerosol optical depth is an important support for judging optical parameters, and is of great significance for determining the values and changing laws of optical parameters. Model establishment and inversion calculations using physical methods will cause bottlenecks that are difficult to model or cannot be solved, affecting the accuracy of optical parameter estimation. How to predict aerosol optical depth (AOD) online in a data-driven way is of great significance for accurately estimating optical parameters and designing optical communication systems.
中国专利《基于气溶胶地基数据的AOD垂直订正效果评价方法及系统》公开号106446307A,提出了一种气溶胶数据计算方法,包括估算近地面气溶胶消光系数和基于能见度仪观测数据反演气溶胶消光系数,利用模拟方法对AOD进行垂直修正,但是并没有考虑不同天气对于气溶胶厚度的影响,无法对AOD的变化趋势进行预测。Chinese patent "AOD vertical correction effect evaluation method and system based on aerosol ground-based data" publication number 106446307A proposes a method for calculating aerosol data, including estimating the near-ground aerosol extinction coefficient and inverting the aerosol extinction coefficient based on visibility meter observation data, and using a simulation method to vertically correct the AOD. However, it does not consider the impact of different weather conditions on aerosol thickness, and is unable to predict the changing trend of AOD.
中国专利《基于逆方差加权平均的卫星AOD数据的融合方法及系统》公开号106407634A,提供基于逆方差加权平均的卫星AOD数据的融合方法及系统。通过该发明能够提高卫星气溶胶厚度数据覆盖率和融合精度要求。中国专利《一种融合多源特征地理参数的卫星AOD产品校正方法》公开号109213964A,公开了一种气溶胶校正方法,通过随机森林和校正阈值,修正卫星测量的参数,为研究大气PM2.5有效浓度提供参考。但是这两个方法难以满足在线变化的条件,对于阈值的选取有很强的依赖性。Chinese patent "Method and system for fusion of satellite AOD data based on inverse variance weighted average" Publication No. 106407634A provides a method and system for fusion of satellite AOD data based on inverse variance weighted average. This invention can improve the coverage rate and fusion accuracy requirements of satellite aerosol thickness data. Chinese patent "A method for correcting satellite AOD products by integrating multi-source characteristic geographic parameters" Publication No. 109213964A discloses an aerosol correction method, which corrects the parameters measured by satellite through random forests and correction thresholds to provide a reference for studying the effective concentration of atmospheric PM2.5. However, these two methods are difficult to meet the conditions of online changes and have a strong dependence on the selection of thresholds.
中国专利《基于PM2.5和PM10的气溶胶光学厚度估计方法》公开号106096246A,提供了一种气溶胶估算方法,通过万有引力神经网络模型进行估算,解决反演精度不高,难以实时获取的问题,然而该方法并没有考虑测量数据变化和模型更新的关系,无法满足模型自动调整。中国专利《一种气溶胶光学厚度反演方法》公开号110186823A,通过历史数据建立双向反射率分布函数参数数据库,通过查表法,得到对应的气溶胶光学厚度值,但是这种方法无法计算模型的泛化性能,计算结果只在特定的假设条件下有效。China's patent publication number 106096246A, "Aerosol optical thickness estimation method based on PM2.5 and PM10", provides an aerosol estimation method that estimates through a gravitational neural network model to solve the problem of low inversion accuracy and difficulty in real-time acquisition. However, this method does not consider the relationship between measurement data changes and model updates, and cannot meet the automatic adjustment of the model. China's patent publication number 110186823A, "A method for inverting aerosol optical thickness", establishes a bidirectional reflectivity distribution function parameter database through historical data, and obtains the corresponding aerosol optical thickness value through a table lookup method, but this method cannot calculate the generalization performance of the model, and the calculation results are only valid under specific assumptions.
针对不同需求,极限学习机建模方式被广泛应用,例如中国专利《一种基于ELM-CHMM的电源故障预测方法》公开号109615003A,中国专利《一种融合KPCA和ELM的无线传感器网络入侵检测系统及方法》公开号109818798A,由于极限学习机由于收敛速度快和解析表达明确的特点,有效支撑了复杂系统的软件功能模块。To meet different needs, extreme learning machine modeling is widely used, such as the Chinese patent "A power fault prediction method based on ELM-CHMM" publication number 109615003A, and the Chinese patent "A wireless sensor network intrusion detection system and method integrating KPCA and ELM" publication number 109818798A. Due to the characteristics of fast convergence speed and clear analytical expression, the extreme learning machine effectively supports the software functional modules of complex systems.
研究一种基于多源信息的气溶胶光学厚度在线预测方法,不仅能够有效估计光学参数,也能促进大气光通信系统的设计,具有实际应用价值。Studying an online prediction method for aerosol optical depth based on multi-source information can not only effectively estimate optical parameters, but also promote the design of atmospheric optical communication systems, which has practical application value.
发明内容Summary of the invention
本发明提供一种基于多源信息的气溶胶光学厚度在线预测方法,充分考虑了多源数据信息和数据的变化趋势,为多种条件下的光学参数在线估计提供了解决方案。The present invention provides an online prediction method for aerosol optical thickness based on multi-source information, which fully considers the multi-source data information and the change trend of the data, and provides a solution for online estimation of optical parameters under various conditions.
本发明采取的技术方案是,包括下列步骤:The technical solution adopted by the present invention comprises the following steps:
(1)建立多源信息采集系统,用于实现大气环境参数和光学参数的采集及存储;(1) Establish a multi-source information acquisition system to collect and store atmospheric environment parameters and optical parameters;
(2)对大气环境参数进行特征提取,得到风速特征WSD、温度特征TED、湿度特征HUD、天空背景亮度特征LUD,组成输入参数x,气溶胶光学厚度为输出参数y;(2) Extract the characteristics of atmospheric environmental parameters to obtain wind speed characteristics WS D , temperature characteristics TE D , humidity characteristics HU D , and sky background brightness characteristics LU D , which constitute the input parameter x, and the aerosol optical thickness is the output parameter y;
(3)采用数据驱动思想,建立基于动态极限学习机(ELM)的非线性映射模型,确定了输入参数和输出参数的关系,利用人工蜂群算法进行优化,确定了非线性映射模型输入权重和偏置的最优解;(3) Using data-driven thinking, a nonlinear mapping model based on a dynamic extreme learning machine (ELM) was established, the relationship between input parameters and output parameters was determined, and the artificial bee colony algorithm was used for optimization to determine the optimal solution for the input weight and bias of the nonlinear mapping model;
(4)对非线性映射模型进行在线更新,计算实时到达数据片段的真实值和预测值的误差,根据误差定量更新非线性映射模型,解析计算非线性映射模型的输出权重,得到下一个数据片段气溶胶光学厚度预测值,实现气溶胶光学厚度在线预测,通过特征提取、非线性模型、模型更新策略,将多源特征进行融合,实现了气溶胶光学厚度在线预测。(4) The nonlinear mapping model is updated online, the error between the real value and the predicted value of the real-time data segment is calculated, the nonlinear mapping model is quantitatively updated according to the error, the output weight of the nonlinear mapping model is analytically calculated, and the predicted value of the aerosol optical thickness of the next data segment is obtained to realize the online prediction of the aerosol optical thickness. Through feature extraction, nonlinear model and model updating strategy, the multi-source features are integrated to realize the online prediction of the aerosol optical thickness.
本发明所述步骤(1)中多源信息采集系统包括数据采集设备、信息交互终端、数据库,数据采集设备包括激光雷达、地面气象站、背景辐射计,以及对应的数据接口,数据库采用SQL Server存储数据,信息交互终端与数据采集设备进行信息交互,并将采集到的数据信息保存到数据库中;其中地面气象站、背景辐射计和激光雷达用于实现大气环境参数和光学参数的采集,大气环境参数包括地面气象站提供的大气数据风速WS、温度TE、湿度HU,以及背景辐射计采集的天空背景亮度LU,光学参数为气溶胶光学厚度,由激光雷达采集。The multi-source information acquisition system in step (1) of the present invention includes a data acquisition device, an information exchange terminal, and a database. The data acquisition device includes a laser radar, a ground meteorological station, a background radiometer, and a corresponding data interface. The database uses SQL Server to store data. The information exchange terminal exchanges information with the data acquisition device and saves the collected data information in the database. The ground meteorological station, the background radiometer, and the laser radar are used to realize the collection of atmospheric environment parameters and optical parameters. The atmospheric environment parameters include the atmospheric data wind speed WS, temperature TE, humidity HU provided by the ground meteorological station, and the sky background brightness LU collected by the background radiometer. The optical parameter is the aerosol optical thickness, which is collected by the laser radar.
本发明所述步骤(2)中大气环境参数特征提取方法如下:The atmospheric environment parameter feature extraction method in step (2) of the present invention is as follows:
对数据采集系统中存储的大气数据进行主成分分析,包括风速WS、温度TE、湿度HU、天空背景亮度LU,考虑M天的累积效果,第K天风速样本片段WSK写成如下的矩阵形式:The atmospheric data stored in the data acquisition system are subjected to principal component analysis, including wind speed WS, temperature TE, humidity HU, and sky background brightness LU. Considering the cumulative effect of M days, the wind speed sample fragment WS K on the Kth day is written in the following matrix form:
其中,wsKM表示第K个样本片段中第M天的风速值,对风速样本矩阵片段WSK中的每一个风速值进行标准化处理:Among them, ws KM represents the wind speed value of the Mth day in the Kth sample segment, and each wind speed value in the wind speed sample matrix segment WS K is standardized:
其中,ws′km表示第k个样本片段中第m天的风速标准化值,μm、σm分别表示第m天风速样本片段的均值和方差,处理后得到风速标准化矩阵WS′K,并计算风速特征值λWS,1,λWS,2,…λWS,M对应的风速特征向量ηWS,1,ηWS,2,…ηWS,M,将风速特征值从大到小进行排序,计算风速特征值的贡献度,当所选定风速特征值贡献度大于阈值θ1时,确定主成分个数Num1;Wherein, ws′ km represents the normalized wind speed value of the mth day in the kth sample segment, μ m and σ m represent the mean and variance of the wind speed sample segment of the mth day, respectively. After processing, the wind speed normalization matrix WS′ K is obtained, and the wind speed eigenvalues λ WS,1 ,λ WS,2 ,…λ WS,M corresponding to the wind speed eigenvalues η WS,1 ,η WS,2 ,…η WS,M are calculated. The wind speed eigenvalues are sorted from large to small, and the contribution of the wind speed eigenvalues is calculated. When the contribution of the selected wind speed eigenvalue is greater than the threshold θ 1 , the number of principal components Num 1 is determined;
对应的风速特征向量集合即为风速特征 The corresponding wind speed feature vector set is the wind speed feature
考虑M天的累积效果,对第K天温度样本片段TEK进行标准化处理,得到温度标准化矩阵TE′K,计算温度特征值λTE,1,λTE,2,…λTE,M对应的温度特征向量ηTE,1,ηTE,2,…ηTE,M,将温度特征值从大到小进行排序,计算温度特征值的贡献度,当所选定温度特征值贡献度大于阈值θ2时,确定主成分个数Num2;Considering the cumulative effect of M days, the temperature sample fragment TE K of the Kth day is standardized to obtain the temperature standardized matrix TE′ K , and the temperature eigenvalues λ TE,1 ,λ TE,2 ,…λ TE,M corresponding to the temperature eigenvalues η TE,1 ,η TE,2 ,…η TE,M are calculated. The temperature eigenvalues are sorted from large to small, and the contribution of the temperature eigenvalues is calculated. When the contribution of the selected temperature eigenvalue is greater than the threshold θ 2 , the number of principal components Num 2 is determined;
对应的温度特征向量集合即为温度特征 The corresponding temperature feature vector set is the temperature feature
考虑M天的累积效果,对第K天湿度样本片段HUK进行标准化处理,得到湿度标准化矩阵HU′K,计算湿度特征值λHU,1,λHU,2,…λHU,M对应的湿度特征向量ηHU,1,ηHU,2,…ηHU,M,将湿度特征值从大到小进行排序,计算湿度特征值的贡献度,当所选定湿度特征值贡献度大于阈值θ3时,确定主成分个数Num3 Considering the cumulative effect of M days, the humidity sample segment HU K of the Kth day is standardized to obtain the humidity standardized matrix HU′ K , and the humidity eigenvalues λ HU,1 ,λ HU,2 ,…λ HU,M corresponding to the humidity eigenvalues η HU,1 ,η HU,2 ,…η HU,M are calculated. The humidity eigenvalues are sorted from large to small, and the contribution of the humidity eigenvalues is calculated. When the contribution of the selected humidity eigenvalue is greater than the threshold θ 3 , the number of principal components Num 3 is determined.
对应的湿度特征向量集合即为湿度特征 The corresponding humidity feature vector set is the humidity feature
考虑M天的累积效果,对第K天天空背景亮度样本片段LUK进行标准化处理,得到天空背景亮度标准化矩阵LU′K,计算天空背景亮度特征值λLU,1,λLU,2,…λLU,M对应的天空背景亮度特征向量ηLU,1,ηLU,2,…ηLU,M,将天空背景亮度特征值从大到小进行排序,计算天空背景亮度特征值的贡献度,当所选定天空背景亮度特征值贡献度大于阈值θ4时,确定主成分个数Num4;Considering the cumulative effect of M days, the sky background brightness sample fragment LU K of the Kth day is standardized to obtain the sky background brightness standardized matrix LU′ K , the sky background brightness eigenvalues λ LU,1 ,λ LU,2 ,…λ LU,M corresponding to the sky background brightness eigenvalues η LU,1 ,η LU,2 ,…η LU,M are calculated, the sky background brightness eigenvalues are sorted from large to small, the contribution of the sky background brightness eigenvalues is calculated, and when the contribution of the selected sky background brightness eigenvalue is greater than the threshold θ 4 , the number of principal components Num 4 is determined;
对应的天空背景亮度特征向量集合即为天空背景亮度特征 The corresponding sky background brightness feature vector set is the sky background brightness feature
输入参数x即为风速特征TED、温度特征TED、湿度特征HUD、天空背景亮度特征LUD的集合,即x=(WSD,TED,HUD,LUD)。The input parameter x is a set of wind speed feature TE D , temperature feature TE D , humidity feature HU D , and sky background brightness feature LU D , that is, x=(WS D , TE D , HU D , LU D ).
本发明所述步骤(3)中非线性映射模型的构建包含输入层、隐层、输出层的3层神经网络结构,考虑新到达的样本数据,数据采样长度为N,初始阶段,输入层为N个节点,隐层为L个节点,输出层为1个节点,对输入参数x和输出参数y进行采样与合并,得到样本数据xi代表x的第i个输入参数,yi代表y的第i个输出参数,非线性映射模型输入输出关系如下:The construction of the nonlinear mapping model in step (3) of the present invention includes a three-layer neural network structure of an input layer, a hidden layer, and an output layer. Considering the newly arrived sample data, the data sampling length is N. In the initial stage, the input layer has N nodes, the hidden layer has L nodes, and the output layer has 1 node. The input parameter x and the output parameter y are sampled and merged to obtain the sample data. Xi represents the i-th input parameter of x, and Yi represents the i-th output parameter of y. The input-output relationship of the nonlinear mapping model is as follows:
其中,a和b分别为输入权重和偏置,β为输出层权重,L为神经网络隐层节点数目,为激活函数,从整体上考虑节点间的映射关系,根据极限学习机理论,输入层节点和隐层之间的权重和偏置随机选取,输出权重β是整个网络中唯一需要求解的变量,通过Moore-Penrose广义逆计算,公式(7)可以表示成如下的矩阵形式:Among them, a and b are input weight and bias respectively, β is the output layer weight, L is the number of hidden layer nodes of the neural network, is the activation function. Considering the mapping relationship between nodes as a whole, according to the extreme learning machine theory, the weights and biases between the input layer nodes and the hidden layer are randomly selected. The output weight β is the only variable that needs to be solved in the entire network. Through Moore-Penrose generalized inverse calculation, formula (7) can be expressed in the following matrix form:
y=Hβ······························(8)y=Hβ······························(8)
其中,y=[y1,y2,...,yN]T代表了当前片段的气溶胶光学厚度,引入非线性模型中隐层矩阵H进行计算,H为N行L列的随机矩阵,具体表达式为:Among them, y = [y 1 ,y 2 ,...,y N ] T represents the aerosol optical thickness of the current segment. The hidden matrix H is introduced into the nonlinear model for calculation. H is a random matrix with N rows and L columns. The specific expression is:
其中,为激活函数,实现非线性映射,保证了函数在区间内存在导数,为了进一步计算输出权重β,计算H的Moore-Penrose广义逆 in, To activate the function, nonlinear mapping is realized, which ensures that the function has derivatives in the interval. In order to further calculate the output weight β, the Moore-Penrose generalized inverse of H is calculated.
H+=(HTH)-1HT·················(10)H + =(H T H) -1 H T ··················(10)
采用求解Moore-Penrose广义逆的形式,得到输出权重β的最优解β*:By solving the Moore-Penrose generalized inverse, we can obtain the optimal solution β * of the output weight β:
β*=H+y····························(11)β * =H + y·····························(11)
进一步优化输入权重和偏置,对权重和偏置进行字符串编码,得到候选解向量集合E=(a,b),具体形式如下:Further optimize the input weights and biases, encode the weights and biases into strings, and obtain the candidate solution vector set E = (a, b), which is in the following form:
候选向量集合的边界为最大值和最小值的解向量,在这里,表示为Emax={max(e)|e∈E}和Emin={min(e)|e∈E},其中,e表示每个元素都满足边界条件的一组集合,根据人工蜂群算法进行优化,在初始条件下,输入权重、偏置的解向量表示为:The boundaries of the candidate vector set are the maximum and minimum solution vectors, which are expressed here as E max = {max(e)|e∈E} and E min = {min(e)|e∈E}, where e represents a set of elements that satisfy the boundary conditions. According to the artificial bee colony algorithm, the solution vector of the input weight and bias under the initial conditions is expressed as:
Eh,j=Emin,j+ω×(Emax,j-Emin,j)···················(13)E h,j =E min,j +ω×(E max,j -E min,j )···················(13)
Emin,j和Emax,j分别表示第j个输入解向量的最大值和最小值,ω均匀分布于区间[-1,1],进行最优解向量搜索,得到如下的更新策略:E min,j and E max,j represent the maximum and minimum values of the jth input solution vector respectively. ω is evenly distributed in the interval [-1,1]. The optimal solution vector search is performed and the following update strategy is obtained:
其中,Eh,j表示当前的解向量,表示新的可行的解向量,h、j和r均为指示标识,uh,j在区间[-1,1]范围内随机变化,计算适应度,选择最优的解向量:Among them, E h,j represents the current solution vector, Represents a new feasible solution vector, h, j and r are all indicators, u h, j changes randomly in the interval [-1,1], calculates the fitness, and selects the optimal solution vector:
其中,num表示迭代次数,完成迭代后,最优的解向量将被更新,得到最优的输入权重a*、偏置b*,于是,非线性模型的表达式最优解为:Among them, num represents the number of iterations. After the iteration is completed, the optimal solution vector will be updated to obtain the optimal input weight a * and bias b * . Therefore, the optimal solution of the expression of the nonlinear model is:
本发明所述步骤(4)中非线性映射模型更新方法是:The nonlinear mapping model updating method in step (4) of the present invention is:
在初始阶段,得到输出权重新的样本片段到达后,计算实时到达数据片段的真实值和预测值的误差,表示为:In the initial stage, the output weights are obtained New sample snippet After arrival, the error between the true value and the predicted value of the real-time arriving data segment is calculated, expressed as:
其中,N0和N1分别表示当前样本片段的起始位置和截止位置,H1和y1分别表示新到达片段的隐层输出和标签集合,β1表示当前非线性模型输出权重,在当前输出权重的基础上,得到β1表达式为:Among them, N 0 and N 1 represent the starting position and end position of the current sample segment, H 1 and y 1 represent the hidden layer output and label set of the newly arrived segment, β 1 represents the current nonlinear model output weight. Based on the current output weight, the expression of β 1 is:
将上面的表达式分为两部分考虑,引入中间变量根据矩阵乘法结合律,前一部分矩阵的表达式为:Divide the above expression into two parts and introduce intermediate variables According to the associative law of matrix multiplication, the expression of the first part of the matrix is:
在当前的片段状态下,H1和K0均为已知条件,进一步代入到整体的输出层权重表达式中,可以得到模型更新阶段的输出权重β1的表达式:In the current fragment state, H 1 and K 0 are both known conditions. Substituting them into the overall output layer weight expression, we can get the expression of the output weight β 1 in the model update phase:
进一步进行迭代计算,就得到了第(k+1)片段对应的中间变量Kk+1表达式Further iterative calculations yield the intermediate variable K k+1 expression corresponding to the (k+1)th segment:
进一步进行迭代计算,就得到了第(k+1)片段对应的非线性映射模型输出权重βk+1的最优解:Further iterative calculations are performed to obtain the optimal solution for the output weight β k+1 of the nonlinear mapping model corresponding to the (k+1)th segment:
在线更新后的模型表达式为:The model expression after online update is:
将输入参数x代入到更新后的模型,就可以计算得到气溶胶光学厚度y,实现气溶胶光学厚度的在线预测。By substituting the input parameter x into the updated model, the aerosol optical depth y can be calculated, thus realizing the online prediction of aerosol optical depth.
本发明建立了多源数据采集系统,通过信息交互终端,实现了数据采集设备到数据库的信息交互。数据采集设备实现了风速、温度、湿度、天空背景亮度等数据的采集,并通过数据接口传递到信息交互终端,通过数据处理操作,实现数据存储。在多源数据采集系统基础上,构建了非线性映射模型,设计神经网络结构,并实现神经网络权重和偏置的优化,得到气溶胶光学厚度的映射模型。通过实时采集的数据进行非线性模型更新,实现对气溶胶光学厚度的在线预测方法。The present invention establishes a multi-source data acquisition system, and realizes information interaction from the data acquisition device to the database through the information interaction terminal. The data acquisition device realizes the collection of data such as wind speed, temperature, humidity, sky background brightness, etc., and transmits them to the information interaction terminal through the data interface, and realizes data storage through data processing operations. Based on the multi-source data acquisition system, a nonlinear mapping model is constructed, a neural network structure is designed, and the optimization of the neural network weights and biases is realized to obtain a mapping model of aerosol optical thickness. The nonlinear model is updated through the real-time collected data to realize an online prediction method for aerosol optical thickness.
本发明优点是:The advantages of the present invention are:
(1)本发明考虑了多源数据采集系统,存储了环境参数和光学参数,通过数据驱动的思想建立了非线性映射模型,实现了气溶胶光学参数在线预测,突破了物理模型驱动方式造成的精度不足和难以计算的瓶颈。(1) The present invention takes into account a multi-source data acquisition system, stores environmental parameters and optical parameters, establishes a nonlinear mapping model through the idea of data-driven, realizes the online prediction of aerosol optical parameters, and breaks through the bottleneck of insufficient accuracy and difficulty in calculation caused by the physical model-driven method.
(2)通过主成分分析进行特征提取,利用动态极限学习机构建神经网络,优化了权重和偏置,得到了可适应在线环境的输入、输出关系。(2) Feature extraction is performed through principal component analysis, and a neural network is constructed using a dynamic extreme learning machine. The weights and biases are optimized to obtain an input-output relationship that can adapt to the online environment.
(3)依托于多源信息处理方案,根据当前数据片段定量调整模型,避免了模型的重新学习,实现整个预测过程的自动化,为复杂系统智能化处理提供解决方案,对工程实践有一定的普适性。(3) Relying on a multi-source information processing solution, the model is quantitatively adjusted according to the current data fragment, avoiding the need to relearn the model and automating the entire prediction process, providing a solution for the intelligent processing of complex systems and having a certain degree of universality in engineering practice.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
图1是本发明多源数据采集系统;FIG1 is a multi-source data acquisition system of the present invention;
图2是本发明非线性映射模型;FIG2 is a nonlinear mapping model of the present invention;
图3是本发明的流程图。FIG. 3 is a flow chart of the present invention.
具体实施方式DETAILED DESCRIPTION
包括下列步骤:The following steps are involved:
(1)建立多源信息采集系统,用于实现大气环境参数和光学参数的采集及存储;(1) Establish a multi-source information acquisition system to collect and store atmospheric environment parameters and optical parameters;
(2)对大气环境参数进行特征提取,得到风速特征WSD、温度特征TED、湿度特征HUD、天空背景亮度特征LUD,组成输入参数x,气溶胶光学厚度为输出参数y;(2) Extract the characteristics of atmospheric environment parameters to obtain wind speed characteristics WS D , temperature characteristics TE D , humidity characteristics HU D , and sky background brightness characteristics LU D , which constitute the input parameter x, and the aerosol optical thickness is the output parameter y;
(3)采用数据驱动思想,建立基于动态极限学习机(ELM)的非线性映射模型,确定了输入参数和输出参数的关系,利用人工蜂群算法进行优化,确定了非线性映射模型输入权重和偏置的最优解;(3) Using data-driven thinking, a nonlinear mapping model based on a dynamic extreme learning machine (ELM) was established, the relationship between input parameters and output parameters was determined, and the artificial bee colony algorithm was used for optimization to determine the optimal solution for the input weight and bias of the nonlinear mapping model;
(4)对非线性映射模型进行在线更新,计算实时到达数据片段的真实值和预测值的误差,根据误差定量更新非线性映射模型,解析计算非线性映射模型的输出权重,得到下一个数据片段气溶胶光学厚度预测值,实现气溶胶光学厚度在线预测,通过特征提取、非线性模型、模型更新策略,将多源特征进行融合,实现了气溶胶光学厚度在线预测。(4) The nonlinear mapping model is updated online, the error between the real value and the predicted value of the real-time data segment is calculated, the nonlinear mapping model is quantitatively updated according to the error, the output weight of the nonlinear mapping model is analytically calculated, and the predicted value of the aerosol optical thickness of the next data segment is obtained to realize the online prediction of aerosol optical thickness. Through feature extraction, nonlinear model and model updating strategy, multi-source features are integrated to realize the online prediction of aerosol optical thickness.
本发明所述步骤(1)中多源信息采集系统包括数据采集设备、信息交互终端、数据库,数据采集设备包括激光雷达、地面气象站、背景辐射计,以及对应的数据接口,数据库采用SQL Server存储数据,信息交互终端与数据采集设备进行信息交互,并将采集到的数据信息保存到数据库中;其中地面气象站、背景辐射计和激光雷达用于实现大气环境参数和光学参数的采集,大气环境参数包括地面气象站提供的大气数据风速WS、温度TE、湿度HU,以及背景辐射计采集的天空背景亮度LU,光学参数为气溶胶光学厚度,由激光雷达采集;The multi-source information acquisition system in step (1) of the present invention includes a data acquisition device, an information exchange terminal, and a database. The data acquisition device includes a laser radar, a ground meteorological station, a background radiometer, and a corresponding data interface. The database uses SQL Server to store data. The information exchange terminal exchanges information with the data acquisition device and saves the collected data information into the database. The ground meteorological station, the background radiometer, and the laser radar are used to collect atmospheric environment parameters and optical parameters. The atmospheric environment parameters include atmospheric data wind speed WS, temperature TE, humidity HU provided by the ground meteorological station, and sky background brightness LU collected by the background radiometer. The optical parameter is aerosol optical thickness, which is collected by the laser radar.
本发明所述步骤(2)中大气环境参数特征提取方法如下:The atmospheric environment parameter feature extraction method in step (2) of the present invention is as follows:
对数据采集系统中存储的大气数据进行主成分分析,包括风速WS、温度TE、湿度HU、天空背景亮度LU,考虑M天的累积效果,第K天风速样本片段WSK写成如下的矩阵形式:The atmospheric data stored in the data acquisition system are subjected to principal component analysis, including wind speed WS, temperature TE, humidity HU, and sky background brightness LU. Considering the cumulative effect of M days, the wind speed sample fragment WS K on the Kth day is written in the following matrix form:
其中,wsKM表示第K个样本片段中第M天的风速值,对风速样本矩阵片段WSK中的每一个风速值进行标准化处理:Among them, ws KM represents the wind speed value of the Mth day in the Kth sample segment, and each wind speed value in the wind speed sample matrix segment WS K is standardized:
其中,ws′km表示第k个样本片段中第m天的风速标准化值,μm、σm分别表示第m天风速样本片段的均值和方差,处理后得到风速标准化矩阵WS′K,并计算风速特征值λWS,1,λWS,2,…λWS,M对应的风速特征向量ηWS,1,ηWS,2,…ηWS,M,将风速特征值从大到小进行排序,计算风速特征值的贡献度,当所选定风速特征值贡献度大于阈值θ1时,确定主成分个数Num1;Wherein, ws′ km represents the normalized wind speed value of the mth day in the kth sample segment, μ m and σ m represent the mean and variance of the wind speed sample segment of the mth day, respectively. After processing, the wind speed normalization matrix WS′ K is obtained, and the wind speed eigenvalues λ WS,1 ,λ WS,2 ,…λ WS,M corresponding to the wind speed eigenvalues η WS,1 ,η WS,2 ,…η WS,M are calculated. The wind speed eigenvalues are sorted from large to small, and the contribution of the wind speed eigenvalues is calculated. When the contribution of the selected wind speed eigenvalue is greater than the threshold θ 1 , the number of principal components Num 1 is determined;
对应的风速特征向量集合即为风速特征 The corresponding wind speed feature vector set is the wind speed feature
考虑M天的累积效果,对第K天温度样本片段TEK进行标准化处理,得到温度标准化矩阵TE′K,计算温度特征值λTE,1,λTE,2,…λTE,M对应的温度特征向量ηTE,1,ηTE,2,…ηTE,M,将温度特征值从大到小进行排序,计算温度特征值的贡献度,当所选定温度特征值贡献度大于阈值θ2时,确定主成分个数Num2;Considering the cumulative effect of M days, the temperature sample fragment TE K of the Kth day is standardized to obtain the temperature standardized matrix TE′ K , and the temperature eigenvalues λ TE,1 ,λ TE,2 ,…λ TE,M corresponding to the temperature eigenvalues η TE,1 ,η TE,2 ,…η TE,M are calculated. The temperature eigenvalues are sorted from large to small, and the contribution of the temperature eigenvalues is calculated. When the contribution of the selected temperature eigenvalue is greater than the threshold θ 2 , the number of principal components Num 2 is determined;
对应的温度特征向量集合即为温度特征 The corresponding temperature feature vector set is the temperature feature
考虑M天的累积效果,对第K天湿度样本片段HUK进行标准化处理,得到湿度标准化矩阵HU′K,计算湿度特征值λHU,1,λHU,2,…λHU,M对应的湿度特征向量ηHU,1,ηHU,2,…ηHU,M,将湿度特征值从大到小进行排序,计算湿度特征值的贡献度,当所选定湿度特征值贡献度大于阈值θ3时,确定主成分个数Num3 Considering the cumulative effect of M days, the humidity sample segment HU K of the Kth day is standardized to obtain the humidity standardized matrix HU′ K , and the humidity eigenvalues λ HU,1 ,λ HU,2 ,…λ HU,M corresponding to the humidity eigenvalues η HU,1 ,η HU,2 ,…η HU,M are calculated. The humidity eigenvalues are sorted from large to small, and the contribution of the humidity eigenvalues is calculated. When the contribution of the selected humidity eigenvalue is greater than the threshold θ 3 , the number of principal components Num 3 is determined.
对应的湿度特征向量集合即为湿度特征 The corresponding humidity feature vector set is the humidity feature
考虑M天的累积效果,对第K天天空背景亮度样本片段LUK进行标准化处理,得到天空背景亮度标准化矩阵LU′K,计算天空背景亮度特征值λLU,1,λLU,2,…λLU,M对应的天空背景亮度特征向量ηLU,1,ηLU,2,…ηLU,M,将天空背景亮度特征值从大到小进行排序,计算天空背景亮度特征值的贡献度,当所选定天空背景亮度特征值贡献度大于阈值θ4时,确定主成分个数Num4;Considering the cumulative effect of M days, the sky background brightness sample fragment LU K of the Kth day is standardized to obtain the sky background brightness standardized matrix LU′ K , the sky background brightness eigenvalues λ LU,1 ,λ LU,2 ,…λ LU,M corresponding to the sky background brightness eigenvalues η LU,1 ,η LU,2 ,…η LU,M are calculated, the sky background brightness eigenvalues are sorted from large to small, the contribution of the sky background brightness eigenvalues is calculated, and when the contribution of the selected sky background brightness eigenvalue is greater than the threshold θ 4 , the number of principal components Num 4 is determined;
对应的天空背景亮度特征向量集合即为天空背景亮度特征 The corresponding sky background brightness feature vector set is the sky background brightness feature
输入参数x即为风速特征TED、温度特征TED、湿度特征HUD、天空背景亮度特征LUD的集合,即x=(WSD,TED,HUD,LUD);The input parameter x is the set of wind speed feature TE D , temperature feature TE D , humidity feature HU D , and sky background brightness feature LU D , that is, x=(WS D , TE D , HU D , LU D );
本发明所述步骤(3)中非线性映射模型的构建包含输入层、隐层、输出层的3层神经网络结构,考虑新到达的样本数据,数据采样长度为N,初始阶段,输入层为N个节点,隐层为L个节点,输出层为1个节点,对输入参数x和输出参数y进行采样与合并,得到样本数据xi代表x的第i个输入参数,yi代表y的第i个输出参数,非线性映射模型输入输出关系如下:The construction of the nonlinear mapping model in step (3) of the present invention includes a three-layer neural network structure of an input layer, a hidden layer, and an output layer. Considering the newly arrived sample data, the data sampling length is N. In the initial stage, the input layer has N nodes, the hidden layer has L nodes, and the output layer has 1 node. The input parameter x and the output parameter y are sampled and merged to obtain the sample data. Xi represents the i-th input parameter of x, and Yi represents the i-th output parameter of y. The input-output relationship of the nonlinear mapping model is as follows:
其中,a和b分别为输入权重和偏置,β为输出层权重,L为神经网络隐层节点数目,为激活函数,从整体上考虑节点间的映射关系,根据极限学习机理论,输入层节点和隐层之间的权重和偏置随机选取,输出权重β是整个网络中唯一需要求解的变量,通过Moore-Penrose广义逆计算,公式(7)可以表示成如下的矩阵形式:Among them, a and b are input weight and bias respectively, β is the output layer weight, L is the number of hidden layer nodes of the neural network, is the activation function. Considering the mapping relationship between nodes as a whole, according to the extreme learning machine theory, the weights and biases between the input layer nodes and the hidden layer are randomly selected. The output weight β is the only variable that needs to be solved in the entire network. Through Moore-Penrose generalized inverse calculation, formula (7) can be expressed in the following matrix form:
y=Hβ·····························(8)y=Hβ·····························(8)
其中,y=[y1,y2,...,yN]T代表了当前片段的气溶胶光学厚度,引入非线性模型中隐层矩阵H进行计算,H为N行L列的随机矩阵,具体表达式为:Among them, y = [y 1 ,y 2 ,...,y N ] T represents the aerosol optical thickness of the current segment. The hidden matrix H is introduced into the nonlinear model for calculation. H is a random matrix with N rows and L columns. The specific expression is:
其中,为激活函数,实现非线性映射,保证了函数在区间内存在导数,为了进一步计算输出权重β,计算H的Moore-Penrose广义逆 in, To activate the function, nonlinear mapping is realized, which ensures that the function has derivatives in the interval. In order to further calculate the output weight β, the Moore-Penrose generalized inverse of H is calculated.
H+=(HTH)-1HT·················(10)H + =(H T H) -1 H T ··················(10)
采用求解Moore-Penrose广义逆的形式,得到输出权重β的最优解β*:By solving the Moore-Penrose generalized inverse, we can obtain the optimal solution β * of the output weight β:
β*=H+y····························(11)β * =H + y·····························(11)
进一步优化输入权重和偏置,对权重和偏置进行字符串编码,得到候选解向量集合E=(a,b),具体形式如下:Further optimize the input weights and biases, encode the weights and biases into strings, and obtain the candidate solution vector set E = (a, b), which is in the following form:
候选向量集合的边界为最大值和最小值的解向量,在这里,表示为Emax={max(e)|e∈E}和Emin={min(e)|e∈E},其中,e表示每个元素都满足边界条件的一组集合,根据人工蜂群算法进行优化,在初始条件下,输入权重、偏置的解向量表示为:The boundaries of the candidate vector set are the maximum and minimum solution vectors, which are expressed here as E max = {max(e)|e∈E} and E min = {min(e)|e∈E}, where e represents a set of elements that satisfy the boundary conditions. According to the artificial bee colony algorithm, the solution vector of the input weight and bias under the initial conditions is expressed as:
Eh,j=Emin,j+ω×(Emax,j-Emin,j)···················(13)E h,j =E min,j +ω×(E max,j -E min,j )···················(13)
Emin,j和Emax,j分别表示第j个输入解向量的最大值和最小值,ω均匀分布于区间[-1,1],进行最优解向量搜索,得到如下的更新策略:E min,j and E max,j represent the maximum and minimum values of the jth input solution vector respectively. ω is evenly distributed in the interval [-1,1]. The optimal solution vector search is performed and the following update strategy is obtained:
其中,Eh,j表示当前的解向量,表示新的可行的解向量,h、j和r均为指示标识。uh,j在区间[-1,1]范围内随机变化。计算适应度,选择最优的解向量:Among them, E h,j represents the current solution vector, represents a new feasible solution vector, h, j and r are all indicators. u h,j changes randomly in the interval [-1,1]. Calculate the fitness and select the optimal solution vector:
其中,num表示迭代次数,完成迭代后,最优的解向量将被更新,得到最优的输入权重a*、偏置b*,于是,非线性模型的表达式最优解为:Among them, num represents the number of iterations. After the iteration is completed, the optimal solution vector will be updated to obtain the optimal input weight a * and bias b * . Therefore, the optimal solution of the expression of the nonlinear model is:
本发明所述步骤(4)中非线性映射模型更新方法是:The nonlinear mapping model updating method in step (4) of the present invention is:
在初始阶段,得到输出权重新的样本片段到达后,计算实时到达数据片段的真实值和预测值的误差,表示为:In the initial stage, the output weights are obtained New sample snippet After arrival, the error between the true value and the predicted value of the real-time arriving data segment is calculated, expressed as:
其中,N0和N1分别表示当前样本片段的起始位置和截止位置,H1和y1分别表示新到达片段的隐层输出和标签集合,β1表示当前非线性模型输出权重,在当前输出权重的基础上,得到β1表达式为:Among them, N 0 and N 1 represent the starting position and end position of the current sample segment, H 1 and y 1 represent the hidden layer output and label set of the newly arrived segment, β 1 represents the current nonlinear model output weight. Based on the current output weight, the expression of β 1 is:
将上面的表达式分为两部分考虑,引入中间变量根据矩阵乘法结合律,前一部分矩阵的表达式为:Divide the above expression into two parts and introduce intermediate variables According to the associative law of matrix multiplication, the expression of the first part of the matrix is:
在当前的片段状态下,H1和K0均为已知条件,进一步代入到整体的输出层权重表达式中,可以得到模型更新阶段的输出权重β1的表达式:In the current fragment state, H 1 and K 0 are both known conditions. Substituting them into the overall output layer weight expression, we can get the expression of the output weight β 1 in the model update phase:
进一步进行迭代计算,就得到了第(k+1)片段对应的中间变量Kk+1表达式Further iterative calculations yield the intermediate variable K k+1 expression corresponding to the (k+1)th segment:
进一步进行迭代计算,就得到了第(k+1)片段对应的非线性映射模型输出权重βk+1的最优解:Further iterative calculations are performed to obtain the optimal solution for the output weight β k+1 of the nonlinear mapping model corresponding to the (k+1)th segment:
在线更新后的模型表达式为:The model expression after online update is:
将输入参数x代入到更新后的模型,就可以计算得到气溶胶光学厚度y,实现气溶胶光学厚度的在线预测。在整个求解过程中,非线性模型不需要重新学习,根据到达的数据就能对模型权重进行更新,实现气溶胶光学厚度在线预测功能。Substituting the input parameter x into the updated model, the aerosol optical thickness y can be calculated, realizing the online prediction of aerosol optical thickness. During the entire solution process, the nonlinear model does not need to be relearned, and the model weights can be updated according to the received data to realize the online prediction function of aerosol optical thickness.
下面结合附图对本发明进行具体说明。The present invention will be described in detail below with reference to the accompanying drawings.
(1)、搭建多源数据采集系统,具体形式如图1所示,激光雷达采集气溶胶光学厚度信息,地面气象站采集风速、温度、湿度信息,背景辐射计采集天空背景亮度信息,信息交互终端为普通计算机,可以与数据采集设备实现信息交互。数据库中加载了SQLServer软件,通过维护数据表实现数据存储功能;(1) Build a multi-source data acquisition system. The specific form is shown in Figure 1. The laser radar collects aerosol optical thickness information, the ground meteorological station collects wind speed, temperature, and humidity information, and the background radiometer collects sky background brightness information. The information exchange terminal is an ordinary computer that can realize information exchange with the data acquisition equipment. SQLServer software is loaded in the database, and the data storage function is realized by maintaining the data table;
激光雷达通过RJ45网络接口与信息交互终端相连,背景辐射计和地面气象站采用RS232接口与信息交互终端进行串口通信,在信息采集的过程中,信息交互终端向数据采集设备发送采集命令,数据采集设备通过数据接口将数据信息传递到信息交互终端,实现信息交互,The laser radar is connected to the information interaction terminal through the RJ45 network interface. The background radiometer and the ground meteorological station use the RS232 interface to communicate with the information interaction terminal. During the information collection process, the information interaction terminal sends a collection command to the data collection device, and the data collection device transmits the data information to the information interaction terminal through the data interface to realize information interaction.
(2)、实现特征提取。将采集到的数据进行整理,通过主成分分析分别考虑风速、温度、湿度、天空背景亮度的累积效果,以1天为数据采集周期,每间隔半小时为1个采样点,每天为48个样本向量,考虑当前时间节点前7天的数据,形成48行7列的样本片段矩阵;(2) Feature extraction. The collected data is sorted and the cumulative effects of wind speed, temperature, humidity, and sky background brightness are considered through principal component analysis. The data collection period is one day, and each half-hour interval is one sampling point. There are 48 sample vectors per day. Considering the data of the 7 days before the current time node, a sample segment matrix with 48 rows and 7 columns is formed;
对数据采集系统中存储的大气数据进行主成分分析,包括风速WS、温度TE、湿度HU、天空背景亮度LU,考虑M天的累积效果,第K天风速样本片段WSK可以写成如下的矩阵形式:The principal component analysis of the atmospheric data stored in the data acquisition system, including wind speed WS, temperature TE, humidity HU, and sky background brightness LU, is performed. Considering the cumulative effect of M days, the wind speed sample fragment WS K on the Kth day can be written in the following matrix form:
其中,wsKM表示第K个样本片段中第M天的风速值,对风速样本矩阵片段WSK中的每一个风速值进行标准化处理:Among them, ws KM represents the wind speed value of the Mth day in the Kth sample segment, and each wind speed value in the wind speed sample matrix segment WS K is standardized:
其中,ws′km表示第k个样本片段中第m天的风速标准化值,μm、σm分别表示第m天风速样本片段的均值和方差,处理后得到风速标准化矩阵WS′K,并计算风速特征值λWS,1,λWS,2,…λWS,M对应的风速特征向量ηWS,1,ηWS,2,…ηWS,M,将风速特征值从大到小进行排序,计算风速特征值的贡献度,当所选定风速特征值贡献度大于阈值θ1=0.90时,确定主成分个数Num1;Wherein, ws′ km represents the normalized wind speed value of the mth day in the kth sample segment, μ m and σ m represent the mean and variance of the wind speed sample segment of the mth day, respectively. After processing, the wind speed normalized matrix WS′ K is obtained, and the wind speed characteristic vectors η WS,1 ,η WS,2 ,…η WS,M corresponding to the wind speed characteristic values λ WS,1 ,λ WS,2 ,…λ WS,M are calculated. The wind speed characteristic values are sorted from large to small, and the contribution of the wind speed characteristic values is calculated. When the contribution of the selected wind speed characteristic value is greater than the threshold value θ 1 =0.90, the number of principal components Num 1 is determined;
对应的风速特征向量集合即为风速特征 The corresponding wind speed feature vector set is the wind speed feature
考虑M天的累积效果,对第K天温度样本片段TEK进行标准化处理,得到温度标准化矩阵TE′K,计算温度特征值λTE,1,λTE,2,…λTE,M对应的温度特征向量ηTE,1,ηTE,2,…ηTE,M,将温度特征值从大到小进行排序,计算温度特征值的贡献度,当所选定温度特征值贡献度大于阈值θ2=0.85时,确定主成分个数Num2 Considering the cumulative effect of M days, the temperature sample fragment TE K of the Kth day is standardized to obtain the temperature standardized matrix TE′ K , and the temperature eigenvalues λ TE,1 ,λ TE,2 ,…λ TE,M corresponding to the temperature eigenvalues η TE,1 ,η TE,2 ,…η TE,M are calculated. The temperature eigenvalues are sorted from large to small, and the contribution of the temperature eigenvalues is calculated. When the contribution of the selected temperature eigenvalue is greater than the threshold value θ 2 =0.85, the number of principal components Num 2 is determined.
对应的温度特征向量集合即为温度特征 The corresponding temperature feature vector set is the temperature feature
考虑M天的累积效果,对第K天湿度样本片段HUK进行标准化处理,得到湿度标准化矩阵HU′K,计算湿度特征值λHU,1,λHU,2,…λHU,M对应的湿度特征向量ηHU,1,ηHU,2,…ηHU,M,将湿度特征值从大到小进行排序,计算湿度特征值的贡献度,当所选定湿度特征值贡献度大于阈值θ3=0.85时,确定主成分个数Num3 Considering the cumulative effect of M days, the humidity sample segment HU K of the Kth day is standardized to obtain the humidity standardized matrix HU′ K , and the humidity eigenvalues λ HU,1 ,λ HU,2 ,…λ HU,M corresponding to the humidity eigenvalues η HU,1 ,η HU,2 ,…η HU,M are calculated. The humidity eigenvalues are sorted from large to small, and the contribution of the humidity eigenvalues is calculated. When the contribution of the selected humidity eigenvalue is greater than the threshold value θ 3 =0.85, the number of principal components Num 3 is determined.
对应的湿度特征向量集合即为湿度特征 The corresponding humidity feature vector set is the humidity feature
考虑M天的累积效果,对第K天天空背景亮度样本片段LUK进行标准化处理,得到天空背景亮度标准化矩阵LU′K,计算天空背景亮度特征值λLU,1,λLU,2,…λLU,M对应的天空背景亮度特征向量ηLU,1,ηLU,2,…ηLU,M,将天空背景亮度特征值从大到小进行排序,计算天空背景亮度特征值的贡献度,当所选定天空背景亮度特征值贡献度大于阈值θ4=0.80时,确定主成分个数Num4;Considering the cumulative effect of M days, the sky background brightness sample fragment LU K of the Kth day is standardized to obtain the sky background brightness standardized matrix LU′ K , and the sky background brightness eigenvalues λ LU,1 ,λ LU,2 ,…λ LU,M corresponding to the sky background brightness eigenvalues η LU,1 ,η LU,2 ,…η LU,M are calculated. The sky background brightness eigenvalues are sorted from large to small, and the contribution of the sky background brightness eigenvalues is calculated. When the contribution of the selected sky background brightness eigenvalue is greater than the threshold value θ 4 =0.80, the number of principal components Num 4 is determined;
对应的天空背景亮度特征向量集合即为天空背景亮度特征 The corresponding sky background brightness feature vector set is the sky background brightness feature
输入参数x即为风速特征TED、温度特征TED、湿度特征HUD、天空背景亮度特征LUD的集合,即x=(WSD,TED,HUD,LUD);The input parameter x is the set of wind speed feature TE D , temperature feature TE D , humidity feature HU D , and sky background brightness feature LU D , that is, x=(WS D , TE D , HU D , LU D );
(3)采用数据驱动思想,建立基于动态极限学习机(ELM)的非线性映射模型,确定了输入参数和输出参数的关系,利用人工蜂群算法进行优化,确定了非线性映射模型输入权重和偏置的最优解;其中:(3) Using data-driven thinking, a nonlinear mapping model based on a dynamic extreme learning machine (ELM) was established, the relationship between input parameters and output parameters was determined, and the artificial bee colony algorithm was used for optimization to determine the optimal solution for the input weight and bias of the nonlinear mapping model;
非线性映射模型的构建包含输入层、隐层、输出层的3层神经网络结构,考虑新到达的样本数据,数据采样长度为48,初始阶段,输入层为48个节点,隐层为50个节点,输出层为1个节点,其中,输入层节点和隐层之间为全相关连接,隐层和输出层之间为全相关连接,输入层权重为48行50列的矩阵,偏置为1行50列的矩阵,对输入参数x和输出参数y进行采样与合并,得到样本数据xi代表x的第i个输入参数,yi代表y的第i个输出参数,非线性映射模型输入输出关系如下:The construction of the nonlinear mapping model includes a three-layer neural network structure of input layer, hidden layer, and output layer. Considering the newly arrived sample data, the data sampling length is 48. In the initial stage, the input layer has 48 nodes, the hidden layer has 50 nodes, and the output layer has 1 node. Among them, the input layer nodes and the hidden layer are fully connected, and the hidden layer and the output layer are fully connected. The input layer weight is a matrix of 48 rows and 50 columns, and the bias is a matrix of 1 row and 50 columns. The input parameters x and output parameters y are sampled and merged to obtain sample data. Xi represents the i-th input parameter of x, and Yi represents the i-th output parameter of y. The input-output relationship of the nonlinear mapping model is as follows:
其中,a和b分别为输入权重和偏置,β为输出层权重,L为神经网络隐层节点数目,为激活函数,从整体上考虑节点间的映射关系,根据极限学习机理论,输入层节点和隐层之间的权重和偏置随机选取,输出权重β是整个网络中唯一需要求解的变量,通过Moore-Penrose广义逆计算,公式(7)可以表示成如下的矩阵形式:Among them, a and b are input weight and bias respectively, β is the output layer weight, L is the number of hidden layer nodes of the neural network, is the activation function. Considering the mapping relationship between nodes as a whole, according to the extreme learning machine theory, the weights and biases between the input layer nodes and the hidden layer are randomly selected. The output weight β is the only variable that needs to be solved in the entire network. Through Moore-Penrose generalized inverse calculation, formula (7) can be expressed in the following matrix form:
y=Hβ·····················(8)y=Hβ·····················(8)
其中,y=[y1,y2,...,yN]T代表了当前片段的气溶胶光学厚度,引入非线性模型中隐层矩阵H进行计算,H为48行50列的随机矩阵,具体表达式为:Among them, y = [y 1 ,y 2 ,...,y N ] T represents the aerosol optical thickness of the current segment. The hidden matrix H is introduced into the nonlinear model for calculation. H is a random matrix with 48 rows and 50 columns. The specific expression is:
其中,非线性模型的输入参数可以表示为x=[x1,x2,...,xN]T,融合了多个特征;Among them, the input parameters of the nonlinear model can be expressed as x = [x 1 , x 2 , ..., x N ] T , which integrates multiple features;
计算输出权重。输入层节点和隐层之间的权重和偏置随机选取,隐层和输出层之间的权重通过M-P广义逆计算。通过M-P广义逆得到输出权重,隐层矩阵H的广义逆 Calculate the output weights. The weights and biases between the input layer nodes and the hidden layer are randomly selected, and the weights between the hidden layer and the output layer are calculated using the MP generalized inverse. The output weights are obtained using the MP generalized inverse, and the generalized inverse of the hidden layer matrix H is
H+=(HTH)-1HT·········(10)H + =(H T H) -1 H T ··········(10)
满足误差最小的条件,β是整个网络中唯一需要求解的变量,y=[y1,y2,...,yN]T代表了当前片段的气溶胶光学厚度值。通过M-P广义逆,得到输出权重的表达式;To meet the minimum error condition, β is the only variable that needs to be solved in the entire network, and y = [y 1 ,y 2 ,...,y N ] T represents the aerosol optical thickness value of the current segment. Through the MP generalized inverse, the expression of the output weight is obtained;
β*=H+y····························(11)β * =H + y·····························(11)
进一步对权重和偏置进行优化,将权重和偏置进行字符串编码,得到候选向量集合,具体形式如下:The weights and biases are further optimized, and the weights and biases are string-encoded to obtain a set of candidate vectors, which are in the following form:
候选向量集合的边界为最大值和最小值的解向量,e表示每个元素都满足边界条件的一组集合,Emax={max(e)|e∈E}和Emin={min(e)|e∈E}。根据人工蜂群算法进行优化,人工蜂群算法类比蜜蜂进行食物查找,实现多参数优化,主要考虑三种形式,包括通过雇佣蜂、跟随蜂和侦查蜂,三种蜜蜂写作完成信息搜索任务。在进行最优的权重和偏置求解的过程中,雇佣蜂首先进行信息的查找,并将查找的结果进行反馈,跟随蜂接收到反馈回来的信息,然后继续在最优向量周围搜索,如果发现更优解,则进行更新;否则,放弃当前解,侦查蜂继续寻找。最终雇佣蜂实现最优解附近局部搜索,侦查蜂完成全局搜索,整个算法收敛,通过优化得到非线性映射模型权重和偏置的最优解向量;The boundaries of the candidate vector set are the solution vectors of the maximum and minimum values, e represents a set of sets whose elements satisfy the boundary conditions, E max = {max(e)|e∈E} and E min = {min(e)|e∈E}. The optimization is performed according to the artificial bee colony algorithm. The artificial bee colony algorithm is analogous to bees searching for food to achieve multi-parameter optimization. Three forms are mainly considered, including employed bees, follower bees and scout bees. The three types of bees complete the information search task. In the process of solving the optimal weights and biases, the employed bees first search for information and feedback the results of the search. The follower bees receive the feedback information and continue to search around the optimal vector. If a better solution is found, it is updated; otherwise, the current solution is abandoned and the scout bees continue to search. Finally, the employed bees realize the local search near the optimal solution, the scout bees complete the global search, the entire algorithm converges, and the optimal solution vector of the weights and bias of the nonlinear mapping model is obtained through optimization;
在得到第j个输入解向量的最大值和最小值Emin,j和Emax,j的基础上,计算初始条件下的边界组合,权重解向量表示为:Based on the maximum and minimum values E min,j and E max,j of the jth input solution vector, the boundary combination under the initial conditions is calculated, and the weighted solution vector is expressed as:
Eh,j=Emin,j+ω×(Emax,j-Emin,j)··················(13)E h,j =E min,j +ω×(E max,j -E min,j )···················(13)
确定更新参数,此时得到在区间[-1,1]范围内随机变化的uh,j,然后通过最优向量搜索,得到候选向量更新策略:Determine the update parameters, and get u h,j that changes randomly in the interval [-1,1]. Then, through the optimal vector search, get the candidate vector update strategy:
通过随机变量uh,j的迭代,实现不同权重和偏置条件下适应度的计算:Through the iteration of random variables u h,j , the fitness calculation under different weights and bias conditions is realized:
在这里,迭代次数num为50,完成迭代后,最优的解向量将被更新,得到最优的输入权重a*、偏置b*,综合神经网络各层的权重和偏置,得到输入参数和输出参数的关系模型:Here, the number of iterations num is 50. After the iteration is completed, the optimal solution vector will be updated to obtain the optimal input weight a * and bias b * . By integrating the weights and biases of each layer of the neural network, the relationship model between the input parameters and the output parameters is obtained:
初始阶段神经网络输入层为48个节点,隐层为50个节点,输出层为1个节点。确定非线性映射模型的所有参数,就得到了输入参数和输出参数关系,即风速、温度、湿度、天空背景亮度与气溶胶光学厚度的解析关系,初始阶段得到的输出权重表示为: In the initial stage, the neural network input layer has 48 nodes, the hidden layer has 50 nodes, and the output layer has 1 node. After determining all the parameters of the nonlinear mapping model, the relationship between the input parameters and the output parameters is obtained, that is, the analytical relationship between wind speed, temperature, humidity, sky background brightness and aerosol optical thickness. The output weight obtained in the initial stage is expressed as:
(4)、进行模型更新,新的样本片段到达后,根据采集到的数据片段计算样本的近似误差,目标函数表示为:(4) Update the model and add new sample segments After arrival, the approximate error of the sample is calculated based on the collected data fragments, and the objective function is expressed as:
N0和N1分别表示当前样本片段的起始位置和截止位置,H1和Y1分别表示新到达片段的隐层输出和标签集合,β1表表示当前非线性模型权重。在当前输出权重的基础上,得到输出层权重表达式为:N 0 and N 1 represent the starting position and end position of the current sample segment, respectively. H 1 and Y 1 represent the hidden layer output and label set of the newly arrived segment, respectively. β 1 represents the current nonlinear model weight. Based on the current output weight, the output layer weight expression is:
分别考虑表达式的前后两部分,引入中间变量通过矩阵乘法结合律,计算前一部分矩阵的表达式为:Consider the front and back parts of the expression separately and introduce intermediate variables By using the associative law of matrix multiplication, the expression for calculating the first part of the matrix is:
考虑β1表达式的后一部分,利用矩阵乘法原理进行展开,得到:Consider the latter part of the expression of β 1 and expand it using the principle of matrix multiplication to obtain:
在当前输出权重的基础上,将新的输入参数代入到非线性模型中,通过矩阵变换得到输出权重的表达式,在当前时刻,H1和K0均为已知条件,计算模型更新阶段的输出权重为:On the basis of the current output weight, the new input parameters are substituted into the nonlinear model, and the expression of the output weight is obtained through matrix transformation. At the current moment, H1 and K0 are both known conditions, and the output weight of the calculation model update stage is:
在计算过程中,引入中间变量随机矩阵K,实现当前片段输入参数和输出参数的映射表达,进一步得到了每一次迭代之后,第(k+1)片段的表达式:During the calculation process, the intermediate variable random matrix K is introduced to realize the mapping expression of the input parameters and output parameters of the current segment, and the expression of the (k+1)th segment after each iteration is further obtained:
计算气溶胶光学厚度,将新片段的输入参数信息代入到非线性模型中,通过神经网络计算,得到输出参数数值;Calculate the aerosol optical thickness, substitute the input parameter information of the new segment into the nonlinear model, and obtain the output parameter value through neural network calculation;
最后通过反归一化,就得到了最终气溶胶光学厚度的预测结果。Finally, through inverse normalization, the final prediction result of aerosol optical thickness is obtained.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110898847.6A CN113657028B (en) | 2021-08-05 | 2021-08-05 | Online aerosol optical thickness prediction method based on multi-source information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110898847.6A CN113657028B (en) | 2021-08-05 | 2021-08-05 | Online aerosol optical thickness prediction method based on multi-source information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113657028A CN113657028A (en) | 2021-11-16 |
CN113657028B true CN113657028B (en) | 2023-10-20 |
Family
ID=78490420
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110898847.6A Active CN113657028B (en) | 2021-08-05 | 2021-08-05 | Online aerosol optical thickness prediction method based on multi-source information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113657028B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114279915B (en) * | 2021-12-24 | 2024-08-27 | 青岛镭测创芯科技有限公司 | Atmospheric particulate concentration inversion method and related components |
CN115081557A (en) * | 2022-08-22 | 2022-09-20 | 北华航天工业学院 | Night aerosol optical thickness estimation method and system based on ground monitoring data |
CN116843456B (en) * | 2023-08-29 | 2023-11-07 | 北京燕知信科技服务有限公司 | Financial big data processing method and system based on artificial intelligence |
CN117350440B (en) * | 2023-12-04 | 2024-07-12 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Space-time prediction model and method for optical thickness of regional aerosol |
CN118211052B (en) * | 2024-04-11 | 2024-11-01 | 中国华云气象科技集团有限公司 | Multi-parameter meteorological data analysis and prediction method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106933585A (en) * | 2017-03-07 | 2017-07-07 | 吉林大学 | A kind of self-adapting multi-channel interface system of selection under distributed cloud environment |
CN111861784A (en) * | 2020-06-01 | 2020-10-30 | 大唐东北电力试验研究院有限公司 | Photovoltaic power generation short-term prediction method based on artificial bee colony optimization neural network |
CN111859800A (en) * | 2020-07-15 | 2020-10-30 | 河海大学 | A method for spatiotemporal estimation and prediction of PM2.5 concentration distribution |
-
2021
- 2021-08-05 CN CN202110898847.6A patent/CN113657028B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106933585A (en) * | 2017-03-07 | 2017-07-07 | 吉林大学 | A kind of self-adapting multi-channel interface system of selection under distributed cloud environment |
CN111861784A (en) * | 2020-06-01 | 2020-10-30 | 大唐东北电力试验研究院有限公司 | Photovoltaic power generation short-term prediction method based on artificial bee colony optimization neural network |
CN111859800A (en) * | 2020-07-15 | 2020-10-30 | 河海大学 | A method for spatiotemporal estimation and prediction of PM2.5 concentration distribution |
Non-Patent Citations (2)
Title |
---|
Application of an improved artificial bee colony algorithm to inverse problem of aerosol optical constants from spectral measurement data;Zhenzong He et al.,;Optik;316-329 * |
无线光通信中的大气影响机理及抑制技术研究;陈纯毅;中国博士学位论文全文数据库 信息科技辑;I136-42 * |
Also Published As
Publication number | Publication date |
---|---|
CN113657028A (en) | 2021-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113657028B (en) | Online aerosol optical thickness prediction method based on multi-source information | |
CN112561191A (en) | Prediction model training method, prediction method, device, apparatus, program, and medium | |
CN112464567B (en) | Intelligent data assimilation method based on variational assimilation framework | |
CN113240170A (en) | Air quality prediction method based on seasonal cyclic neural network | |
CN110070224A (en) | A kind of Air Quality Forecast method based on multi-step recursive prediction | |
CN111914488B (en) | Data area hydrologic parameter calibration method based on antagonistic neural network | |
CN110543978A (en) | Traffic flow data prediction method and device based on wavelet neural network | |
CN107703554A (en) | The warm and humid profile Inversion System of multichannel millimeter wave radiometer and its inversion method | |
CN114694379B (en) | Traffic flow prediction method and system based on self-adaptive dynamic graph convolution | |
CN105787184A (en) | Atmospheric aerosol optical depth estimation method based on PM2.5 | |
CN118316021B (en) | Distributed photovoltaic power prediction method and system based on AP clustering and transfer learning | |
CN112766603A (en) | Traffic flow prediction method, system, computer device and storage medium | |
CN114219127A (en) | Photovoltaic output probability distribution prediction method based on Bayes-long and short-term memory neural network | |
CN112446550B (en) | Short-term building load probability density prediction method | |
CN109214591A (en) | A kind of xylophyta ground biomass prediction technique and system | |
CN115333621B (en) | A spot centroid prediction method based on fusion of spatio-temporal features in a distributed framework | |
CN114692971B (en) | Crop yield prediction method and device based on yield difference | |
CN119809419A (en) | An intelligent evaluation method for unmanned equipment based on GAT and MLP | |
CN118172674B (en) | Remote sensing estimation method and system for reset value of residential building | |
CN118095570B (en) | Intelligent load forecasting method, system, electronic equipment, medium and chip for substation area | |
CN112561203A (en) | Method and system for realizing water level early warning based on clustering and GRU | |
CN118017474A (en) | Wind power prediction method considering wind power plant space-time correlation and prediction error distribution characteristics | |
CN118397826A (en) | Traffic flow prediction method based on graph neural network and distribution characterization matching | |
Peng et al. | Meteorological satellite operation prediction using a BiLSTM deep learning model | |
CN113642785B (en) | Method, system and equipment for long-term prediction of space debris track based on priori information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |