CN113657028B

CN113657028B - Online aerosol optical thickness prediction method based on multi-source information

Info

Publication number: CN113657028B
Application number: CN202110898847.6A
Authority: CN
Inventors: 于海洋; 陈纯毅; 胡小娟; 张丽媛
Original assignee: Changchun University of Science and Technology
Current assignee: Changchun University of Science and Technology
Priority date: 2021-08-05
Filing date: 2021-08-05
Publication date: 2023-10-20
Anticipated expiration: 2041-08-05
Also published as: CN113657028A

Abstract

The invention relates to an aerosol optical thickness online prediction method based on multi-source information, and belongs to the technical field of atmospheric channel optical signal transmission. The method comprises the steps of establishing a multi-source data acquisition system, carrying out feature extraction on atmospheric environment parameters, constructing a nonlinear mapping model based on a dynamic extreme learning machine, carrying out feature extraction through principal component analysis, and realizing fusion of features such as wind speed, temperature, humidity, sky background brightness and the like, wherein the output parameters of the model are aerosol optical thickness. The method has the advantages that the parameter estimation is carried out by using an artificial bee colony algorithm, so that the optimal selection of the weight and the bias is realized; according to the data fragments which arrive in real time, a nonlinear mapping model is updated, an online aerosol optical thickness prediction function is realized, a single hidden layer neural network based on a dynamic extreme learning machine is formed, the bottleneck of insufficient precision and difficult calculation caused by a physical model driving mode is broken through, and a solution is provided for constructing online estimation of optical parameters under various conditions.

Description

An online prediction method for aerosol optical depth based on multi-source information

技术领域Technical Field

本发明属于大气信道光信号传输技术领域，具体涉及一种基于多源信息的气溶胶光学厚度在线预测方法。The invention belongs to the technical field of atmospheric channel optical signal transmission, and in particular relates to an online prediction method for aerosol optical thickness based on multi-source information.

背景技术Background Art

光学参数估计是光信息传输的关键要素，在大气信道中，由于受到湍流的影响，往往造成光斑偏移、到达角起伏现象，给光学参数估计带来困难。气溶胶光学厚度是判断光学参数的重要支撑，对于确定光学参数数值和变化规律具有重要意义。通过物理方法进行模型建立和反演计算，会造成难以建模或无法求解的瓶颈，影响光学参数估计精度，如何通过数据驱动的方式进行气溶胶光学厚度(Aerosol Optical Depth，AOD)在线预测，对于准确估计光学参数，设计光学通信系统有重要意义。Optical parameter estimation is a key element in optical information transmission. In atmospheric channels, due to the influence of turbulence, the light spot is often offset and the arrival angle fluctuates, which makes it difficult to estimate the optical parameters. Aerosol optical depth is an important support for judging optical parameters, and is of great significance for determining the values and changing laws of optical parameters. Model establishment and inversion calculations using physical methods will cause bottlenecks that are difficult to model or cannot be solved, affecting the accuracy of optical parameter estimation. How to predict aerosol optical depth (AOD) online in a data-driven way is of great significance for accurately estimating optical parameters and designing optical communication systems.

中国专利《基于气溶胶地基数据的AOD垂直订正效果评价方法及系统》公开号106446307A，提出了一种气溶胶数据计算方法，包括估算近地面气溶胶消光系数和基于能见度仪观测数据反演气溶胶消光系数，利用模拟方法对AOD进行垂直修正，但是并没有考虑不同天气对于气溶胶厚度的影响，无法对AOD的变化趋势进行预测。Chinese patent "AOD vertical correction effect evaluation method and system based on aerosol ground-based data" publication number 106446307A proposes a method for calculating aerosol data, including estimating the near-ground aerosol extinction coefficient and inverting the aerosol extinction coefficient based on visibility meter observation data, and using a simulation method to vertically correct the AOD. However, it does not consider the impact of different weather conditions on aerosol thickness, and is unable to predict the changing trend of AOD.

中国专利《基于逆方差加权平均的卫星AOD数据的融合方法及系统》公开号106407634A，提供基于逆方差加权平均的卫星AOD数据的融合方法及系统。通过该发明能够提高卫星气溶胶厚度数据覆盖率和融合精度要求。中国专利《一种融合多源特征地理参数的卫星AOD产品校正方法》公开号109213964A，公开了一种气溶胶校正方法，通过随机森林和校正阈值，修正卫星测量的参数，为研究大气PM2.5有效浓度提供参考。但是这两个方法难以满足在线变化的条件，对于阈值的选取有很强的依赖性。Chinese patent "Method and system for fusion of satellite AOD data based on inverse variance weighted average" Publication No. 106407634A provides a method and system for fusion of satellite AOD data based on inverse variance weighted average. This invention can improve the coverage rate and fusion accuracy requirements of satellite aerosol thickness data. Chinese patent "A method for correcting satellite AOD products by integrating multi-source characteristic geographic parameters" Publication No. 109213964A discloses an aerosol correction method, which corrects the parameters measured by satellite through random forests and correction thresholds to provide a reference for studying the effective concentration of atmospheric PM2.5. However, these two methods are difficult to meet the conditions of online changes and have a strong dependence on the selection of thresholds.

中国专利《基于PM2.5和PM10的气溶胶光学厚度估计方法》公开号106096246A，提供了一种气溶胶估算方法，通过万有引力神经网络模型进行估算，解决反演精度不高，难以实时获取的问题，然而该方法并没有考虑测量数据变化和模型更新的关系，无法满足模型自动调整。中国专利《一种气溶胶光学厚度反演方法》公开号110186823A，通过历史数据建立双向反射率分布函数参数数据库，通过查表法，得到对应的气溶胶光学厚度值，但是这种方法无法计算模型的泛化性能，计算结果只在特定的假设条件下有效。China's patent publication number 106096246A, "Aerosol optical thickness estimation method based on PM2.5 and PM10", provides an aerosol estimation method that estimates through a gravitational neural network model to solve the problem of low inversion accuracy and difficulty in real-time acquisition. However, this method does not consider the relationship between measurement data changes and model updates, and cannot meet the automatic adjustment of the model. China's patent publication number 110186823A, "A method for inverting aerosol optical thickness", establishes a bidirectional reflectivity distribution function parameter database through historical data, and obtains the corresponding aerosol optical thickness value through a table lookup method, but this method cannot calculate the generalization performance of the model, and the calculation results are only valid under specific assumptions.

针对不同需求，极限学习机建模方式被广泛应用，例如中国专利《一种基于ELM-CHMM的电源故障预测方法》公开号109615003A，中国专利《一种融合KPCA和ELM的无线传感器网络入侵检测系统及方法》公开号109818798A，由于极限学习机由于收敛速度快和解析表达明确的特点，有效支撑了复杂系统的软件功能模块。To meet different needs, extreme learning machine modeling is widely used, such as the Chinese patent "A power fault prediction method based on ELM-CHMM" publication number 109615003A, and the Chinese patent "A wireless sensor network intrusion detection system and method integrating KPCA and ELM" publication number 109818798A. Due to the characteristics of fast convergence speed and clear analytical expression, the extreme learning machine effectively supports the software functional modules of complex systems.

研究一种基于多源信息的气溶胶光学厚度在线预测方法，不仅能够有效估计光学参数，也能促进大气光通信系统的设计，具有实际应用价值。Studying an online prediction method for aerosol optical depth based on multi-source information can not only effectively estimate optical parameters, but also promote the design of atmospheric optical communication systems, which has practical application value.

发明内容Summary of the invention

本发明提供一种基于多源信息的气溶胶光学厚度在线预测方法，充分考虑了多源数据信息和数据的变化趋势，为多种条件下的光学参数在线估计提供了解决方案。The present invention provides an online prediction method for aerosol optical thickness based on multi-source information, which fully considers the multi-source data information and the change trend of the data, and provides a solution for online estimation of optical parameters under various conditions.

本发明采取的技术方案是，包括下列步骤：The technical solution adopted by the present invention comprises the following steps:

(1)建立多源信息采集系统，用于实现大气环境参数和光学参数的采集及存储；(1) Establish a multi-source information acquisition system to collect and store atmospheric environment parameters and optical parameters;

(2)对大气环境参数进行特征提取，得到风速特征WS_D、温度特征TE_D、湿度特征HU_D、天空背景亮度特征LU_D，组成输入参数x，气溶胶光学厚度为输出参数y；(2) Extract the characteristics of atmospheric environmental parameters to obtain wind speed characteristics WS _D , temperature characteristics TE _D , humidity characteristics HU _D , and sky background brightness characteristics LU _D , which constitute the input parameter x, and the aerosol optical thickness is the output parameter y;

(3)采用数据驱动思想，建立基于动态极限学习机(ELM)的非线性映射模型，确定了输入参数和输出参数的关系，利用人工蜂群算法进行优化，确定了非线性映射模型输入权重和偏置的最优解；(3) Using data-driven thinking, a nonlinear mapping model based on a dynamic extreme learning machine (ELM) was established, the relationship between input parameters and output parameters was determined, and the artificial bee colony algorithm was used for optimization to determine the optimal solution for the input weight and bias of the nonlinear mapping model;

(4)对非线性映射模型进行在线更新，计算实时到达数据片段的真实值和预测值的误差，根据误差定量更新非线性映射模型，解析计算非线性映射模型的输出权重，得到下一个数据片段气溶胶光学厚度预测值，实现气溶胶光学厚度在线预测，通过特征提取、非线性模型、模型更新策略，将多源特征进行融合，实现了气溶胶光学厚度在线预测。(4) The nonlinear mapping model is updated online, the error between the real value and the predicted value of the real-time data segment is calculated, the nonlinear mapping model is quantitatively updated according to the error, the output weight of the nonlinear mapping model is analytically calculated, and the predicted value of the aerosol optical thickness of the next data segment is obtained to realize the online prediction of the aerosol optical thickness. Through feature extraction, nonlinear model and model updating strategy, the multi-source features are integrated to realize the online prediction of the aerosol optical thickness.

本发明所述步骤(1)中多源信息采集系统包括数据采集设备、信息交互终端、数据库，数据采集设备包括激光雷达、地面气象站、背景辐射计，以及对应的数据接口，数据库采用SQL Server存储数据，信息交互终端与数据采集设备进行信息交互，并将采集到的数据信息保存到数据库中；其中地面气象站、背景辐射计和激光雷达用于实现大气环境参数和光学参数的采集，大气环境参数包括地面气象站提供的大气数据风速WS、温度TE、湿度HU，以及背景辐射计采集的天空背景亮度LU，光学参数为气溶胶光学厚度，由激光雷达采集。The multi-source information acquisition system in step (1) of the present invention includes a data acquisition device, an information exchange terminal, and a database. The data acquisition device includes a laser radar, a ground meteorological station, a background radiometer, and a corresponding data interface. The database uses SQL Server to store data. The information exchange terminal exchanges information with the data acquisition device and saves the collected data information in the database. The ground meteorological station, the background radiometer, and the laser radar are used to realize the collection of atmospheric environment parameters and optical parameters. The atmospheric environment parameters include the atmospheric data wind speed WS, temperature TE, humidity HU provided by the ground meteorological station, and the sky background brightness LU collected by the background radiometer. The optical parameter is the aerosol optical thickness, which is collected by the laser radar.

本发明所述步骤(2)中大气环境参数特征提取方法如下：The atmospheric environment parameter feature extraction method in step (2) of the present invention is as follows:

对数据采集系统中存储的大气数据进行主成分分析，包括风速WS、温度TE、湿度HU、天空背景亮度LU，考虑M天的累积效果，第K天风速样本片段WS_K写成如下的矩阵形式：The atmospheric data stored in the data acquisition system are subjected to principal component analysis, including wind speed WS, temperature TE, humidity HU, and sky background brightness LU. Considering the cumulative effect of M days, the wind speed sample fragment WS _K on the Kth day is written in the following matrix form:

其中，ws_KM表示第K个样本片段中第M天的风速值，对风速样本矩阵片段WS_K中的每一个风速值进行标准化处理：Among them, ws _KM represents the wind speed value of the Mth day in the Kth sample segment, and each wind speed value in the wind speed sample matrix segment WS _K is standardized:

其中，ws′_km表示第k个样本片段中第m天的风速标准化值，μ_m、σ_m分别表示第m天风速样本片段的均值和方差，处理后得到风速标准化矩阵WS′_K，并计算风速特征值λ_WS,1,λ_WS,2,…λ_WS,M对应的风速特征向量η_WS,1,η_WS,2,…η_WS,M，将风速特征值从大到小进行排序，计算风速特征值的贡献度，当所选定风速特征值贡献度大于阈值θ₁时，确定主成分个数Num₁；Wherein, ws′ _km represents the normalized wind speed value of the mth day in the kth sample segment, μ _m and σ _m represent the mean and variance of the wind speed sample segment of the mth day, respectively. After processing, the wind speed normalization matrix WS′ _K is obtained, and the wind speed eigenvalues λ _WS,1 ,λ _WS,2 ,…λ _WS,M corresponding to the wind speed eigenvalues η _WS,1 ,η _WS,2 ,…η _WS,M are calculated. The wind speed eigenvalues are sorted from large to small, and the contribution of the wind speed eigenvalues is calculated. When the contribution of the selected wind speed eigenvalue is greater than the threshold θ ₁ , the number of principal components Num ₁ is determined;

对应的风速特征向量集合即为风速特征 The corresponding wind speed feature vector set is the wind speed feature

考虑M天的累积效果，对第K天温度样本片段TE_K进行标准化处理，得到温度标准化矩阵TE′_K，计算温度特征值λ_TE,1,λ_TE,2,…λ_TE,M对应的温度特征向量η_TE,1,η_TE,2,…η_TE,M，将温度特征值从大到小进行排序，计算温度特征值的贡献度，当所选定温度特征值贡献度大于阈值θ₂时，确定主成分个数Num₂；Considering the cumulative effect of M days, the temperature sample fragment TE _K of the Kth day is standardized to obtain the temperature standardized matrix TE′ _K , and the temperature eigenvalues λ _TE,1 ,λ _TE,2 ,…λ _TE,M corresponding to the temperature eigenvalues η _TE,1 ,η _TE,2 ,…η _TE,M are calculated. The temperature eigenvalues are sorted from large to small, and the contribution of the temperature eigenvalues is calculated. When the contribution of the selected temperature eigenvalue is greater than the threshold θ ₂ , the number of principal components Num ₂ is determined;

对应的温度特征向量集合即为温度特征 The corresponding temperature feature vector set is the temperature feature

考虑M天的累积效果，对第K天湿度样本片段HU_K进行标准化处理，得到湿度标准化矩阵HU′_K，计算湿度特征值λ_HU,1,λ_HU,2,…λ_HU,M对应的湿度特征向量η_HU,1,η_HU,2,…η_HU,M，将湿度特征值从大到小进行排序，计算湿度特征值的贡献度，当所选定湿度特征值贡献度大于阈值θ₃时，确定主成分个数Num₃ Considering the cumulative effect of M days, the humidity sample segment HU _K of the Kth day is standardized to obtain the humidity standardized matrix HU′ _K , and the humidity eigenvalues λ _HU,1 ,λ _HU,2 ,…λ _HU,M corresponding to the humidity eigenvalues η _HU,1 ,η _HU,2 ,…η _HU,M are calculated. The humidity eigenvalues are sorted from large to small, and the contribution of the humidity eigenvalues is calculated. When the contribution of the selected humidity eigenvalue is greater than the threshold θ ₃ , the number of principal components Num ₃ is determined.

对应的湿度特征向量集合即为湿度特征 The corresponding humidity feature vector set is the humidity feature

考虑M天的累积效果，对第K天天空背景亮度样本片段LU_K进行标准化处理，得到天空背景亮度标准化矩阵LU′_K，计算天空背景亮度特征值λ_LU,1,λ_LU,2,…λ_LU,M对应的天空背景亮度特征向量η_LU,1,η_LU,2,…η_LU,M，将天空背景亮度特征值从大到小进行排序，计算天空背景亮度特征值的贡献度，当所选定天空背景亮度特征值贡献度大于阈值θ₄时，确定主成分个数Num₄；Considering the cumulative effect of M days, the sky background brightness sample fragment LU _K of the Kth day is standardized to obtain the sky background brightness standardized matrix LU′ _K , the sky background brightness eigenvalues λ _LU,1 ,λ _LU,2 ,…λ _LU,M corresponding to the sky background brightness eigenvalues η _LU,1 ,η _LU,2 ,…η _LU,M are calculated, the sky background brightness eigenvalues are sorted from large to small, the contribution of the sky background brightness eigenvalues is calculated, and when the contribution of the selected sky background brightness eigenvalue is greater than the threshold θ ₄ , the number of principal components Num ₄ is determined;

对应的天空背景亮度特征向量集合即为天空背景亮度特征 The corresponding sky background brightness feature vector set is the sky background brightness feature

输入参数x即为风速特征TE_D、温度特征TE_D、湿度特征HU_D、天空背景亮度特征LU_D的集合，即x＝(WS_D,TE_D,HU_D,LU_D)。The input parameter x is a set of wind speed feature TE _D , temperature feature TE _D , humidity feature HU _D , and sky background brightness feature LU _D , that is, x=(WS _D , TE _D , HU _D , LU _D ).

本发明所述步骤(3)中非线性映射模型的构建包含输入层、隐层、输出层的3层神经网络结构，考虑新到达的样本数据，数据采样长度为N，初始阶段，输入层为N个节点，隐层为L个节点，输出层为1个节点，对输入参数x和输出参数y进行采样与合并，得到样本数据x_i代表x的第i个输入参数，y_i代表y的第i个输出参数，非线性映射模型输入输出关系如下：The construction of the nonlinear mapping model in step (3) of the present invention includes a three-layer neural network structure of an input layer, a hidden layer, and an output layer. Considering the newly arrived sample data, the data sampling length is N. In the initial stage, the input layer has N nodes, the hidden layer has L nodes, and the output layer has 1 node. The input parameter x and the output parameter y are sampled and merged to obtain the sample data. _Xi represents the i-th input parameter of x, and _Yi represents the i-th output parameter of y. The input-output relationship of the nonlinear mapping model is as follows:

其中，a和b分别为输入权重和偏置，β为输出层权重，L为神经网络隐层节点数目，为激活函数，从整体上考虑节点间的映射关系，根据极限学习机理论，输入层节点和隐层之间的权重和偏置随机选取，输出权重β是整个网络中唯一需要求解的变量，通过Moore-Penrose广义逆计算，公式(7)可以表示成如下的矩阵形式：Among them, a and b are input weight and bias respectively, β is the output layer weight, L is the number of hidden layer nodes of the neural network, is the activation function. Considering the mapping relationship between nodes as a whole, according to the extreme learning machine theory, the weights and biases between the input layer nodes and the hidden layer are randomly selected. The output weight β is the only variable that needs to be solved in the entire network. Through Moore-Penrose generalized inverse calculation, formula (7) can be expressed in the following matrix form:

y＝Hβ······························(8)y＝Hβ······························(8)

其中，y＝[y₁,y₂,...,y_N]^T代表了当前片段的气溶胶光学厚度，引入非线性模型中隐层矩阵H进行计算，H为N行L列的随机矩阵，具体表达式为：Among them, y = [y ₁ ,y ₂ ,...,y _N ] ^T represents the aerosol optical thickness of the current segment. The hidden matrix H is introduced into the nonlinear model for calculation. H is a random matrix with N rows and L columns. The specific expression is:

其中，为激活函数，实现非线性映射，保证了函数在区间内存在导数，为了进一步计算输出权重β，计算H的Moore-Penrose广义逆 in, To activate the function, nonlinear mapping is realized, which ensures that the function has derivatives in the interval. In order to further calculate the output weight β, the Moore-Penrose generalized inverse of H is calculated.

H⁺＝(H^TH)^-1H^T·················(10)H ⁺ =(H ^T H) ^-1 H ^T ··················(10)

采用求解Moore-Penrose广义逆的形式，得到输出权重β的最优解β^*：By solving the Moore-Penrose generalized inverse, we can obtain the optimal solution β ^* of the output weight β:

β^*＝H⁺y····························(11)β ^* =H ⁺ y·····························(11)

进一步优化输入权重和偏置，对权重和偏置进行字符串编码，得到候选解向量集合E＝(a,b)，具体形式如下：Further optimize the input weights and biases, encode the weights and biases into strings, and obtain the candidate solution vector set E = (a, b), which is in the following form:

候选向量集合的边界为最大值和最小值的解向量，在这里，表示为E_max＝{max(e)|e∈E}和E_min＝{min(e)|e∈E}，其中，e表示每个元素都满足边界条件的一组集合，根据人工蜂群算法进行优化，在初始条件下，输入权重、偏置的解向量表示为：The boundaries of the candidate vector set are the maximum and minimum solution vectors, which are expressed here as E _max = {max(e)|e∈E} and E _min = {min(e)|e∈E}, where e represents a set of elements that satisfy the boundary conditions. According to the artificial bee colony algorithm, the solution vector of the input weight and bias under the initial conditions is expressed as:

E_h,j＝E_min,j+ω×(E_max,j-E_min,j)···················(13)E _h,j =E _min,j +ω×(E _max,j -E _min,j )···················(13)

E_min,j和E_max,j分别表示第j个输入解向量的最大值和最小值，ω均匀分布于区间[-1,1]，进行最优解向量搜索，得到如下的更新策略：E _min,j and E _max,j represent the maximum and minimum values of the jth input solution vector respectively. ω is evenly distributed in the interval [-1,1]. The optimal solution vector search is performed and the following update strategy is obtained:

其中，E_h,j表示当前的解向量，表示新的可行的解向量，h、j和r均为指示标识，u_h,j在区间[-1,1]范围内随机变化，计算适应度，选择最优的解向量：Among them, E _h,j represents the current solution vector, Represents a new feasible solution vector, h, j and r are all indicators, u _{h, j} changes randomly in the interval [-1,1], calculates the fitness, and selects the optimal solution vector:

其中，num表示迭代次数，完成迭代后，最优的解向量将被更新，得到最优的输入权重a^*、偏置b^*，于是，非线性模型的表达式最优解为：Among them, num represents the number of iterations. After the iteration is completed, the optimal solution vector will be updated to obtain the optimal input weight a ^* and bias b ^* . Therefore, the optimal solution of the expression of the nonlinear model is:

本发明所述步骤(4)中非线性映射模型更新方法是：The nonlinear mapping model updating method in step (4) of the present invention is:

在初始阶段，得到输出权重新的样本片段到达后，计算实时到达数据片段的真实值和预测值的误差，表示为：In the initial stage, the output weights are obtained New sample snippet After arrival, the error between the true value and the predicted value of the real-time arriving data segment is calculated, expressed as:

其中，N₀和N₁分别表示当前样本片段的起始位置和截止位置，H₁和y₁分别表示新到达片段的隐层输出和标签集合，β₁表示当前非线性模型输出权重，在当前输出权重的基础上，得到β₁表达式为：Among them, N ₀ and N ₁ represent the starting position and end position of the current sample segment, H ₁ and y ₁ represent the hidden layer output and label set of the newly arrived segment, β ₁ represents the current nonlinear model output weight. Based on the current output weight, the expression of β ₁ is:

将上面的表达式分为两部分考虑，引入中间变量根据矩阵乘法结合律，前一部分矩阵的表达式为：Divide the above expression into two parts and introduce intermediate variables According to the associative law of matrix multiplication, the expression of the first part of the matrix is:

在当前的片段状态下，H₁和K₀均为已知条件，进一步代入到整体的输出层权重表达式中，可以得到模型更新阶段的输出权重β₁的表达式：In the current fragment state, H ₁ and K ₀ are both known conditions. Substituting them into the overall output layer weight expression, we can get the expression of the output weight β ₁ in the model update phase:

进一步进行迭代计算，就得到了第(k+1)片段对应的中间变量K_k+1表达式Further iterative calculations yield the intermediate variable K _k+1 expression corresponding to the (k+1)th segment:

进一步进行迭代计算，就得到了第(k+1)片段对应的非线性映射模型输出权重β_k+1的最优解：Further iterative calculations are performed to obtain the optimal solution for the output weight β _k+1 of the nonlinear mapping model corresponding to the (k+1)th segment:

在线更新后的模型表达式为：The model expression after online update is:

将输入参数x代入到更新后的模型，就可以计算得到气溶胶光学厚度y，实现气溶胶光学厚度的在线预测。By substituting the input parameter x into the updated model, the aerosol optical depth y can be calculated, thus realizing the online prediction of aerosol optical depth.

本发明建立了多源数据采集系统，通过信息交互终端，实现了数据采集设备到数据库的信息交互。数据采集设备实现了风速、温度、湿度、天空背景亮度等数据的采集，并通过数据接口传递到信息交互终端，通过数据处理操作，实现数据存储。在多源数据采集系统基础上，构建了非线性映射模型，设计神经网络结构，并实现神经网络权重和偏置的优化，得到气溶胶光学厚度的映射模型。通过实时采集的数据进行非线性模型更新，实现对气溶胶光学厚度的在线预测方法。The present invention establishes a multi-source data acquisition system, and realizes information interaction from the data acquisition device to the database through the information interaction terminal. The data acquisition device realizes the collection of data such as wind speed, temperature, humidity, sky background brightness, etc., and transmits them to the information interaction terminal through the data interface, and realizes data storage through data processing operations. Based on the multi-source data acquisition system, a nonlinear mapping model is constructed, a neural network structure is designed, and the optimization of the neural network weights and biases is realized to obtain a mapping model of aerosol optical thickness. The nonlinear model is updated through the real-time collected data to realize an online prediction method for aerosol optical thickness.

本发明优点是：The advantages of the present invention are:

(1)本发明考虑了多源数据采集系统，存储了环境参数和光学参数，通过数据驱动的思想建立了非线性映射模型，实现了气溶胶光学参数在线预测，突破了物理模型驱动方式造成的精度不足和难以计算的瓶颈。(1) The present invention takes into account a multi-source data acquisition system, stores environmental parameters and optical parameters, establishes a nonlinear mapping model through the idea of data-driven, realizes the online prediction of aerosol optical parameters, and breaks through the bottleneck of insufficient accuracy and difficulty in calculation caused by the physical model-driven method.

(2)通过主成分分析进行特征提取，利用动态极限学习机构建神经网络，优化了权重和偏置，得到了可适应在线环境的输入、输出关系。(2) Feature extraction is performed through principal component analysis, and a neural network is constructed using a dynamic extreme learning machine. The weights and biases are optimized to obtain an input-output relationship that can adapt to the online environment.

(3)依托于多源信息处理方案，根据当前数据片段定量调整模型，避免了模型的重新学习，实现整个预测过程的自动化，为复杂系统智能化处理提供解决方案，对工程实践有一定的普适性。(3) Relying on a multi-source information processing solution, the model is quantitatively adjusted according to the current data fragment, avoiding the need to relearn the model and automating the entire prediction process, providing a solution for the intelligent processing of complex systems and having a certain degree of universality in engineering practice.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1是本发明多源数据采集系统；FIG1 is a multi-source data acquisition system of the present invention;

图2是本发明非线性映射模型；FIG2 is a nonlinear mapping model of the present invention;

图3是本发明的流程图。FIG. 3 is a flow chart of the present invention.

具体实施方式DETAILED DESCRIPTION

包括下列步骤：The following steps are involved:

(2)对大气环境参数进行特征提取，得到风速特征WS_D、温度特征TE_D、湿度特征HU_D、天空背景亮度特征LU_D，组成输入参数x，气溶胶光学厚度为输出参数y；(2) Extract the characteristics of atmospheric environment parameters to obtain wind speed characteristics WS _D , temperature characteristics TE _D , humidity characteristics HU _D , and sky background brightness characteristics LU _D , which constitute the input parameter x, and the aerosol optical thickness is the output parameter y;

(4)对非线性映射模型进行在线更新，计算实时到达数据片段的真实值和预测值的误差，根据误差定量更新非线性映射模型，解析计算非线性映射模型的输出权重，得到下一个数据片段气溶胶光学厚度预测值，实现气溶胶光学厚度在线预测，通过特征提取、非线性模型、模型更新策略，将多源特征进行融合，实现了气溶胶光学厚度在线预测。(4) The nonlinear mapping model is updated online, the error between the real value and the predicted value of the real-time data segment is calculated, the nonlinear mapping model is quantitatively updated according to the error, the output weight of the nonlinear mapping model is analytically calculated, and the predicted value of the aerosol optical thickness of the next data segment is obtained to realize the online prediction of aerosol optical thickness. Through feature extraction, nonlinear model and model updating strategy, multi-source features are integrated to realize the online prediction of aerosol optical thickness.

本发明所述步骤(1)中多源信息采集系统包括数据采集设备、信息交互终端、数据库，数据采集设备包括激光雷达、地面气象站、背景辐射计，以及对应的数据接口，数据库采用SQL Server存储数据，信息交互终端与数据采集设备进行信息交互，并将采集到的数据信息保存到数据库中；其中地面气象站、背景辐射计和激光雷达用于实现大气环境参数和光学参数的采集，大气环境参数包括地面气象站提供的大气数据风速WS、温度TE、湿度HU，以及背景辐射计采集的天空背景亮度LU，光学参数为气溶胶光学厚度，由激光雷达采集；The multi-source information acquisition system in step (1) of the present invention includes a data acquisition device, an information exchange terminal, and a database. The data acquisition device includes a laser radar, a ground meteorological station, a background radiometer, and a corresponding data interface. The database uses SQL Server to store data. The information exchange terminal exchanges information with the data acquisition device and saves the collected data information into the database. The ground meteorological station, the background radiometer, and the laser radar are used to collect atmospheric environment parameters and optical parameters. The atmospheric environment parameters include atmospheric data wind speed WS, temperature TE, humidity HU provided by the ground meteorological station, and sky background brightness LU collected by the background radiometer. The optical parameter is aerosol optical thickness, which is collected by the laser radar.

其中，ws′_km表示第k个样本片段中第m天的风速标准化值，μ_m、σ_m分别表示第m天风速样本片段的均值和方差，处理后得到风速标准化矩阵WS′_K，并计算风速特征值λ_WS,1,λ_WS，2,…λ_WS,M对应的风速特征向量η_WS,1,η_WS,2,…η_WS,M，将风速特征值从大到小进行排序，计算风速特征值的贡献度，当所选定风速特征值贡献度大于阈值θ₁时，确定主成分个数Num₁；Wherein, ws′ _km represents the normalized wind speed value of the mth day in the kth sample segment, μ _m and σ _m represent the mean and variance of the wind speed sample segment of the mth day, respectively. After processing, the wind speed normalization matrix WS′ _K is obtained, and the wind speed eigenvalues λ _WS,1 ,λ _WS，2 ,…λ _WS,M corresponding to the wind speed eigenvalues η _WS,1 ,η _WS,2 ,…η _WS,M are calculated. The wind speed eigenvalues are sorted from large to small, and the contribution of the wind speed eigenvalues is calculated. When the contribution of the selected wind speed eigenvalue is greater than the threshold θ ₁ , the number of principal components Num ₁ is determined;

输入参数x即为风速特征TE_D、温度特征TE_D、湿度特征HU_D、天空背景亮度特征LU_D的集合，即x＝(WS_D,TE_D,HU_D,LU_D)；The input parameter x is the set of wind speed feature TE _D , temperature feature TE _D , humidity feature HU _D , and sky background brightness feature LU _D , that is, x=(WS _D , TE _D , HU _D , LU _D );

y＝Hβ·····························(8)y＝Hβ·····························(8)

其中，E_h,j表示当前的解向量，表示新的可行的解向量，h、j和r均为指示标识。u_h,j在区间[-1,1]范围内随机变化。计算适应度，选择最优的解向量：Among them, E _h,j represents the current solution vector, represents a new feasible solution vector, h, j and r are all indicators. _{u h,j} changes randomly in the interval [-1,1]. Calculate the fitness and select the optimal solution vector:

将输入参数x代入到更新后的模型，就可以计算得到气溶胶光学厚度y，实现气溶胶光学厚度的在线预测。在整个求解过程中，非线性模型不需要重新学习，根据到达的数据就能对模型权重进行更新，实现气溶胶光学厚度在线预测功能。Substituting the input parameter x into the updated model, the aerosol optical thickness y can be calculated, realizing the online prediction of aerosol optical thickness. During the entire solution process, the nonlinear model does not need to be relearned, and the model weights can be updated according to the received data to realize the online prediction function of aerosol optical thickness.

下面结合附图对本发明进行具体说明。The present invention will be described in detail below with reference to the accompanying drawings.

(1)、搭建多源数据采集系统，具体形式如图1所示，激光雷达采集气溶胶光学厚度信息，地面气象站采集风速、温度、湿度信息，背景辐射计采集天空背景亮度信息，信息交互终端为普通计算机，可以与数据采集设备实现信息交互。数据库中加载了SQLServer软件，通过维护数据表实现数据存储功能；(1) Build a multi-source data acquisition system. The specific form is shown in Figure 1. The laser radar collects aerosol optical thickness information, the ground meteorological station collects wind speed, temperature, and humidity information, and the background radiometer collects sky background brightness information. The information exchange terminal is an ordinary computer that can realize information exchange with the data acquisition equipment. SQLServer software is loaded in the database, and the data storage function is realized by maintaining the data table;

激光雷达通过RJ45网络接口与信息交互终端相连，背景辐射计和地面气象站采用RS232接口与信息交互终端进行串口通信，在信息采集的过程中，信息交互终端向数据采集设备发送采集命令，数据采集设备通过数据接口将数据信息传递到信息交互终端，实现信息交互，The laser radar is connected to the information interaction terminal through the RJ45 network interface. The background radiometer and the ground meteorological station use the RS232 interface to communicate with the information interaction terminal. During the information collection process, the information interaction terminal sends a collection command to the data collection device, and the data collection device transmits the data information to the information interaction terminal through the data interface to realize information interaction.

(2)、实现特征提取。将采集到的数据进行整理，通过主成分分析分别考虑风速、温度、湿度、天空背景亮度的累积效果，以1天为数据采集周期，每间隔半小时为1个采样点，每天为48个样本向量，考虑当前时间节点前7天的数据，形成48行7列的样本片段矩阵；(2) Feature extraction. The collected data is sorted and the cumulative effects of wind speed, temperature, humidity, and sky background brightness are considered through principal component analysis. The data collection period is one day, and each half-hour interval is one sampling point. There are 48 sample vectors per day. Considering the data of the 7 days before the current time node, a sample segment matrix with 48 rows and 7 columns is formed;

对数据采集系统中存储的大气数据进行主成分分析，包括风速WS、温度TE、湿度HU、天空背景亮度LU，考虑M天的累积效果，第K天风速样本片段WS_K可以写成如下的矩阵形式：The principal component analysis of the atmospheric data stored in the data acquisition system, including wind speed WS, temperature TE, humidity HU, and sky background brightness LU, is performed. Considering the cumulative effect of M days, the wind speed sample fragment WS _K on the Kth day can be written in the following matrix form:

其中，ws′_km表示第k个样本片段中第m天的风速标准化值，μ_m、σ_m分别表示第m天风速样本片段的均值和方差，处理后得到风速标准化矩阵WS′_K，并计算风速特征值λ_WS,1,λ_WS,2,…λ_WS,M对应的风速特征向量η_WS,1,η_WS,2,…η_WS,M，将风速特征值从大到小进行排序，计算风速特征值的贡献度，当所选定风速特征值贡献度大于阈值θ₁＝0.90时，确定主成分个数Num₁；Wherein, ws′ _km represents the normalized wind speed value of the mth day in the kth sample segment, μ _m and σ _m represent the mean and variance of the wind speed sample segment of the mth day, respectively. After processing, the wind speed normalized matrix WS′ _K is obtained, and the wind speed characteristic vectors η _WS,1 ,η _WS,2 ,…η _WS,M corresponding to the wind speed characteristic values λ _WS,1 ,λ _WS,2 ,…λ _WS,M are calculated. The wind speed characteristic values are sorted from large to small, and the contribution of the wind speed characteristic values is calculated. When the contribution of the selected wind speed characteristic value is greater than the threshold value θ ₁ =0.90, the number of principal components Num ₁ is determined;

考虑M天的累积效果，对第K天温度样本片段TE_K进行标准化处理，得到温度标准化矩阵TE′_K，计算温度特征值λ_TE,1,λ_TE,2,…λ_TE,M对应的温度特征向量η_TE,1,η_TE,2,…η_TE,M，将温度特征值从大到小进行排序，计算温度特征值的贡献度，当所选定温度特征值贡献度大于阈值θ₂＝0.85时，确定主成分个数Num₂ Considering the cumulative effect of M days, the temperature sample fragment TE _K of the Kth day is standardized to obtain the temperature standardized matrix TE′ _K , and the temperature eigenvalues λ _TE,1 ,λ _TE,2 ,…λ _TE,M corresponding to the temperature eigenvalues η _TE,1 ,η _TE,2 ,…η _TE,M are calculated. The temperature eigenvalues are sorted from large to small, and the contribution of the temperature eigenvalues is calculated. When the contribution of the selected temperature eigenvalue is greater than the threshold value θ ₂ =0.85, the number of principal components Num ₂ is determined.

考虑M天的累积效果，对第K天湿度样本片段HU_K进行标准化处理，得到湿度标准化矩阵HU′_K，计算湿度特征值λ_HU,1,λ_HU,2,…λ_HU,M对应的湿度特征向量η_HU,1,η_HU,2,…η_HU,M，将湿度特征值从大到小进行排序，计算湿度特征值的贡献度，当所选定湿度特征值贡献度大于阈值θ₃＝0.85时，确定主成分个数Num₃ Considering the cumulative effect of M days, the humidity sample segment HU _K of the Kth day is standardized to obtain the humidity standardized matrix HU′ _K , and the humidity eigenvalues λ _HU,1 ,λ _HU,2 ,…λ _HU,M corresponding to the humidity eigenvalues η _HU,1 ,η _HU,2 ,…η _HU,M are calculated. The humidity eigenvalues are sorted from large to small, and the contribution of the humidity eigenvalues is calculated. When the contribution of the selected humidity eigenvalue is greater than the threshold value θ ₃ =0.85, the number of principal components Num ₃ is determined.

考虑M天的累积效果，对第K天天空背景亮度样本片段LU_K进行标准化处理，得到天空背景亮度标准化矩阵LU′_K，计算天空背景亮度特征值λ_LU,1,λ_LU,2,…λ_LU,M对应的天空背景亮度特征向量η_LU,1,η_LU,2,…η_LU,M，将天空背景亮度特征值从大到小进行排序，计算天空背景亮度特征值的贡献度，当所选定天空背景亮度特征值贡献度大于阈值θ₄＝0.80时，确定主成分个数Num₄；Considering the cumulative effect of M days, the sky background brightness sample fragment LU _K of the Kth day is standardized to obtain the sky background brightness standardized matrix LU′ _K , and the sky background brightness eigenvalues λ _LU,1 ,λ _LU,2 ,…λ _LU,M corresponding to the sky background brightness eigenvalues η _LU,1 ,η _LU,2 ,…η _LU,M are calculated. The sky background brightness eigenvalues are sorted from large to small, and the contribution of the sky background brightness eigenvalues is calculated. When the contribution of the selected sky background brightness eigenvalue is greater than the threshold value θ ₄ =0.80, the number of principal components Num ₄ is determined;

(3)采用数据驱动思想，建立基于动态极限学习机(ELM)的非线性映射模型，确定了输入参数和输出参数的关系，利用人工蜂群算法进行优化，确定了非线性映射模型输入权重和偏置的最优解；其中：(3) Using data-driven thinking, a nonlinear mapping model based on a dynamic extreme learning machine (ELM) was established, the relationship between input parameters and output parameters was determined, and the artificial bee colony algorithm was used for optimization to determine the optimal solution for the input weight and bias of the nonlinear mapping model;

非线性映射模型的构建包含输入层、隐层、输出层的3层神经网络结构，考虑新到达的样本数据，数据采样长度为48，初始阶段，输入层为48个节点，隐层为50个节点，输出层为1个节点，其中，输入层节点和隐层之间为全相关连接，隐层和输出层之间为全相关连接，输入层权重为48行50列的矩阵，偏置为1行50列的矩阵，对输入参数x和输出参数y进行采样与合并，得到样本数据x_i代表x的第i个输入参数，y_i代表y的第i个输出参数，非线性映射模型输入输出关系如下：The construction of the nonlinear mapping model includes a three-layer neural network structure of input layer, hidden layer, and output layer. Considering the newly arrived sample data, the data sampling length is 48. In the initial stage, the input layer has 48 nodes, the hidden layer has 50 nodes, and the output layer has 1 node. Among them, the input layer nodes and the hidden layer are fully connected, and the hidden layer and the output layer are fully connected. The input layer weight is a matrix of 48 rows and 50 columns, and the bias is a matrix of 1 row and 50 columns. The input parameters x and output parameters y are sampled and merged to obtain sample data. _Xi represents the i-th input parameter of x, and _Yi represents the i-th output parameter of y. The input-output relationship of the nonlinear mapping model is as follows:

y＝Hβ·····················(8)y＝Hβ·····················(8)

其中，y＝[y₁,y₂,...,y_N]^T代表了当前片段的气溶胶光学厚度，引入非线性模型中隐层矩阵H进行计算，H为48行50列的随机矩阵，具体表达式为：Among them, y = [y ₁ ,y ₂ ,...,y _N ] ^T represents the aerosol optical thickness of the current segment. The hidden matrix H is introduced into the nonlinear model for calculation. H is a random matrix with 48 rows and 50 columns. The specific expression is:

其中，非线性模型的输入参数可以表示为x＝[x₁,x₂,...,x_N]^T，融合了多个特征；Among them, the input parameters of the nonlinear model can be expressed as x = [x ₁ , x ₂ , ..., x _N ] ^T , which integrates multiple features;

计算输出权重。输入层节点和隐层之间的权重和偏置随机选取，隐层和输出层之间的权重通过M-P广义逆计算。通过M-P广义逆得到输出权重，隐层矩阵H的广义逆 Calculate the output weights. The weights and biases between the input layer nodes and the hidden layer are randomly selected, and the weights between the hidden layer and the output layer are calculated using the MP generalized inverse. The output weights are obtained using the MP generalized inverse, and the generalized inverse of the hidden layer matrix H is

H⁺＝(H^TH)^-1H^T·········(10)H ⁺ =(H ^T H) ^-1 H ^T ··········(10)

满足误差最小的条件，β是整个网络中唯一需要求解的变量，y＝[y₁,y₂,...,y_N]^T代表了当前片段的气溶胶光学厚度值。通过M-P广义逆，得到输出权重的表达式；To meet the minimum error condition, β is the only variable that needs to be solved in the entire network, and y = [y ₁ ,y ₂ ,...,y _N ] ^T represents the aerosol optical thickness value of the current segment. Through the MP generalized inverse, the expression of the output weight is obtained;

进一步对权重和偏置进行优化，将权重和偏置进行字符串编码，得到候选向量集合，具体形式如下：The weights and biases are further optimized, and the weights and biases are string-encoded to obtain a set of candidate vectors, which are in the following form:

候选向量集合的边界为最大值和最小值的解向量，e表示每个元素都满足边界条件的一组集合，E_max＝{max(e)|e∈E}和E_min＝{min(e)|e∈E}。根据人工蜂群算法进行优化，人工蜂群算法类比蜜蜂进行食物查找，实现多参数优化，主要考虑三种形式，包括通过雇佣蜂、跟随蜂和侦查蜂，三种蜜蜂写作完成信息搜索任务。在进行最优的权重和偏置求解的过程中，雇佣蜂首先进行信息的查找，并将查找的结果进行反馈，跟随蜂接收到反馈回来的信息，然后继续在最优向量周围搜索，如果发现更优解，则进行更新；否则，放弃当前解，侦查蜂继续寻找。最终雇佣蜂实现最优解附近局部搜索，侦查蜂完成全局搜索，整个算法收敛，通过优化得到非线性映射模型权重和偏置的最优解向量；The boundaries of the candidate vector set are the solution vectors of the maximum and minimum values, e represents a set of sets whose elements satisfy the boundary conditions, E _max = {max(e)|e∈E} and E _min = {min(e)|e∈E}. The optimization is performed according to the artificial bee colony algorithm. The artificial bee colony algorithm is analogous to bees searching for food to achieve multi-parameter optimization. Three forms are mainly considered, including employed bees, follower bees and scout bees. The three types of bees complete the information search task. In the process of solving the optimal weights and biases, the employed bees first search for information and feedback the results of the search. The follower bees receive the feedback information and continue to search around the optimal vector. If a better solution is found, it is updated; otherwise, the current solution is abandoned and the scout bees continue to search. Finally, the employed bees realize the local search near the optimal solution, the scout bees complete the global search, the entire algorithm converges, and the optimal solution vector of the weights and bias of the nonlinear mapping model is obtained through optimization;

在得到第j个输入解向量的最大值和最小值E_min,j和E_max,j的基础上，计算初始条件下的边界组合，权重解向量表示为：Based on the maximum and minimum values E _min,j and E _max,j of the jth input solution vector, the boundary combination under the initial conditions is calculated, and the weighted solution vector is expressed as:

E_h,j＝E_min,j+ω×(E_max，j-E_min,j)··················(13)E _h,j =E _min,j +ω×(E _max,j -E _min,j )···················(13)

确定更新参数，此时得到在区间[-1,1]范围内随机变化的u_h,j，然后通过最优向量搜索，得到候选向量更新策略：Determine the update parameters, and get u _h,j that changes randomly in the interval [-1,1]. Then, through the optimal vector search, get the candidate vector update strategy:

通过随机变量u_h,j的迭代，实现不同权重和偏置条件下适应度的计算：Through the iteration of random variables u _h,j , the fitness calculation under different weights and bias conditions is realized:

在这里，迭代次数num为50，完成迭代后，最优的解向量将被更新，得到最优的输入权重a^*、偏置b^*，综合神经网络各层的权重和偏置，得到输入参数和输出参数的关系模型：Here, the number of iterations num is 50. After the iteration is completed, the optimal solution vector will be updated to obtain the optimal input weight a ^* and bias b ^* . By integrating the weights and biases of each layer of the neural network, the relationship model between the input parameters and the output parameters is obtained:

初始阶段神经网络输入层为48个节点，隐层为50个节点，输出层为1个节点。确定非线性映射模型的所有参数，就得到了输入参数和输出参数关系，即风速、温度、湿度、天空背景亮度与气溶胶光学厚度的解析关系，初始阶段得到的输出权重表示为： In the initial stage, the neural network input layer has 48 nodes, the hidden layer has 50 nodes, and the output layer has 1 node. After determining all the parameters of the nonlinear mapping model, the relationship between the input parameters and the output parameters is obtained, that is, the analytical relationship between wind speed, temperature, humidity, sky background brightness and aerosol optical thickness. The output weight obtained in the initial stage is expressed as:

(4)、进行模型更新，新的样本片段到达后，根据采集到的数据片段计算样本的近似误差，目标函数表示为：(4) Update the model and add new sample segments After arrival, the approximate error of the sample is calculated based on the collected data fragments, and the objective function is expressed as:

N₀和N₁分别表示当前样本片段的起始位置和截止位置，H₁和Y₁分别表示新到达片段的隐层输出和标签集合，β₁表表示当前非线性模型权重。在当前输出权重的基础上，得到输出层权重表达式为：N ₀ and N ₁ represent the starting position and end position of the current sample segment, respectively. H ₁ and Y ₁ represent the hidden layer output and label set of the newly arrived segment, respectively. β ₁ represents the current nonlinear model weight. Based on the current output weight, the output layer weight expression is:

分别考虑表达式的前后两部分，引入中间变量通过矩阵乘法结合律，计算前一部分矩阵的表达式为：Consider the front and back parts of the expression separately and introduce intermediate variables By using the associative law of matrix multiplication, the expression for calculating the first part of the matrix is:

考虑β₁表达式的后一部分，利用矩阵乘法原理进行展开，得到：Consider the latter part of the expression of β ₁ and expand it using the principle of matrix multiplication to obtain:

在当前输出权重的基础上，将新的输入参数代入到非线性模型中，通过矩阵变换得到输出权重的表达式，在当前时刻，H₁和K₀均为已知条件，计算模型更新阶段的输出权重为：On the basis of the current output weight, the new input parameters are substituted into the nonlinear model, and the expression of the output weight is obtained through matrix transformation. At the current moment, _H1 and _K0 are both known conditions, and the output weight of the calculation model update stage is:

在计算过程中，引入中间变量随机矩阵K，实现当前片段输入参数和输出参数的映射表达，进一步得到了每一次迭代之后，第(k+1)片段的表达式：During the calculation process, the intermediate variable random matrix K is introduced to realize the mapping expression of the input parameters and output parameters of the current segment, and the expression of the (k+1)th segment after each iteration is further obtained:

计算气溶胶光学厚度，将新片段的输入参数信息代入到非线性模型中，通过神经网络计算，得到输出参数数值；Calculate the aerosol optical thickness, substitute the input parameter information of the new segment into the nonlinear model, and obtain the output parameter value through neural network calculation;

最后通过反归一化，就得到了最终气溶胶光学厚度的预测结果。Finally, through inverse normalization, the final prediction result of aerosol optical thickness is obtained.

Claims

1. An online prediction method for aerosol optical thickness based on multi-source information, characterized in that it comprises the following steps:

(1) Establish a multi-source information acquisition system to collect and store atmospheric environment parameters and optical parameters;

(2) Extract the characteristics of atmospheric environmental parameters to obtain wind speed characteristics WS _D , temperature characteristics TE _D , humidity characteristics HU _D , and sky background brightness characteristics LU _D , which constitute the input parameter x, and the aerosol optical thickness is the output parameter y;

The atmospheric environment parameter feature extraction method is as follows:

The atmospheric data stored in the data acquisition system are subjected to principal component analysis, including wind speed WS, temperature TE, humidity HU, and sky background brightness LU. Considering the cumulative effect of M days, the wind speed sample fragment WS _K on the Kth day is written in the following matrix form:

Among them, ws _KM represents the wind speed value of the Mth day in the Kth sample segment, and each wind speed value in the wind speed sample matrix segment WS _K is standardized:

Wherein, ws′ _km represents the normalized wind speed value of the mth day in the kth sample segment, μ _m and σ _m represent the mean and variance of the wind speed sample segment of the mth day, respectively. After processing, the wind speed normalization matrix WS′ _K is obtained, and the wind speed eigenvalues λ _WS,1 ,λ _WS,2 ,...λ _WS,M corresponding to the wind speed eigenvalues η _WS,1 ,η _WS,2 ,...η _WS,M are calculated. The wind speed eigenvalues are sorted from large to small, and the contribution of the wind speed eigenvalues is calculated. When the contribution of the selected wind speed eigenvalue is greater than the threshold θ ₁ , the number of principal components Num ₁ is determined;

The corresponding wind speed feature vector set is the wind speed feature

Considering the cumulative effect of M days, the temperature sample fragment TE _K of the Kth day is standardized to obtain the temperature standardized matrix TE′ _K , and the temperature eigenvalues λ _TE,1 ,λ _TE,2 ,...λ _TE,M corresponding to the temperature eigenvalues η _TE,1 ,η _TE,2 ,...η _TE,M are calculated. The temperature eigenvalues are sorted from large to small, and the contribution of the temperature eigenvalues is calculated. When the contribution of the selected temperature eigenvalue is greater than the threshold θ ₂ , the number of principal components Num ₂ is determined;

The corresponding temperature feature vector set is the temperature feature

Considering the cumulative effect of M days, the humidity sample segment HU _K of the Kth day is standardized to obtain the humidity standardized matrix HU′ _K , and the humidity eigenvalues λ _HU,1 ,λ _HU,2 ,...λ _HU,M corresponding to the humidity eigenvalues η _HU,1 ,η _HU,2 ,...η _HU,M are calculated. The humidity eigenvalues are sorted from large to small, and the contribution of the humidity eigenvalues is calculated. When the contribution of the selected humidity eigenvalue is greater than the threshold θ ₃ , the number of principal components Num ₃ is determined.

The corresponding humidity feature vector set is the humidity feature

Considering the cumulative effect of M days, the sky background brightness sample fragment LU _K of the Kth day is standardized to obtain the sky background brightness standardized matrix LU′ _K , the sky background brightness eigenvalues λ _LU,1 ,λ _LU,2 ,...λ _LU,M corresponding to the sky background brightness eigenvalues η _LU,1 ,η _LU,2 ,...η _LU,M are calculated, the sky background brightness eigenvalues are sorted from large to small, the contribution of the sky background brightness eigenvalues is calculated, and when the contribution of the selected sky background brightness eigenvalue is greater than the threshold θ ₄ , the number of principal components Num ₄ is determined;

The corresponding sky background brightness feature vector set is the sky background brightness feature

The input parameter x is the set of wind speed feature TE _D , temperature feature TE _D , humidity feature HU _D , and sky background brightness feature LU _D , that is, x=(WS _D , TE _D , HU _D , LU _D );

(3) Using data-driven thinking, a nonlinear mapping model based on a dynamic extreme learning machine (ELM) was established, the relationship between input parameters and output parameters was determined, and the artificial bee colony algorithm was used for optimization to determine the optimal solution for the input weight and bias of the nonlinear mapping model;

The construction of the nonlinear mapping model includes a three-layer neural network structure of input layer, hidden layer and output layer. Considering the newly arrived sample data, the data sampling length is N. In the initial stage, the input layer is N nodes, the hidden layer is L nodes, and the output layer is 1 node. The input parameter x and the output parameter y are sampled and merged to obtain the sample data. _Xi represents the i-th input parameter of x, and _Yi represents the i-th output parameter of y. The input-output relationship of the nonlinear mapping model is as follows:

Among them, a and b are input weight and bias respectively, β is the output layer weight, L is the number of hidden layer nodes of the neural network, is the activation function. Considering the mapping relationship between nodes as a whole, according to the extreme learning machine theory, the weights and biases between the input layer nodes and the hidden layer are randomly selected. The output weight β is the only variable that needs to be solved in the entire network. Through Moore-Penrose generalized inverse calculation, formula (7) can be expressed in the following matrix form:

y=Hβ·································(8)

Among them, y = [y ₁ ,y ₂ ,...,y _N ] ^T represents the aerosol optical thickness of the current segment. The hidden matrix H is introduced into the nonlinear model for calculation. H is a random matrix with N rows and L columns. The specific expression is:

in, To activate the function, nonlinear mapping is realized, which ensures that the function has derivatives in the interval. In order to further calculate the output weight β, the Moore-Penrose generalized inverse of H is calculated.

H ⁺ =(H ^T H) ^-1 H ^T ···························(10)

By solving the Moore-Penrose generalized inverse, we can obtain the optimal solution β ^* of the output weight β:

β ^* =H ⁺ y····························(11)

Further optimize the input weights and biases, encode the weights and biases into strings, and obtain the candidate solution vector set E = (a, b), which is in the following form:

The boundaries of the candidate vector set are the maximum and minimum solution vectors, which are expressed here as E _max = {max(e)|e∈E} and E _min = {min(e)|e∈E}, where e represents a set of elements that satisfy the boundary conditions. According to the artificial bee colony algorithm, the solution vector of the input weight and bias under the initial conditions is expressed as:

E _h,j =E _min,j +ω×(E _max,j -E _min,j )····················(13)

E _min,j and E _max,j represent the maximum and minimum values of the jth input solution vector respectively. ω is evenly distributed in the interval [-1,1]. The optimal solution vector search is performed and the following update strategy is obtained:

Among them, E _h,j represents the current solution vector, Represents a new feasible solution vector, h, j and r are all indicator marks, u _{h, j} changes randomly in the interval [-1,1], calculates the fitness, and selects the optimal solution vector:

Among them, num represents the number of iterations. After the iteration is completed, the optimal solution vector will be updated to obtain the optimal input weight a ^* and bias b ^* . Therefore, the optimal solution of the expression of the nonlinear model is:

(4) The nonlinear mapping model is updated online, the error between the real value and the predicted value of the real-time data segment is calculated, the nonlinear mapping model is quantitatively updated according to the error, the output weight of the nonlinear mapping model is calculated analytically, and the predicted value of the aerosol optical thickness of the next data segment is obtained to realize the online prediction of the aerosol optical thickness. Through feature extraction, nonlinear model, and model update strategy, multi-source features are integrated to realize the online prediction of the aerosol optical thickness.

The nonlinear mapping model updating method is:

In the initial stage, the output weights are obtained: New sample snippet After arrival, the error between the true value and the predicted value of the real-time arriving data segment is calculated, expressed as:

Among them, N ₀ and N ₁ represent the starting position and end position of the current sample segment, H ₁ and y ₁ represent the hidden layer output and label set of the newly arrived segment, β ₁ represents the current nonlinear model output weight. Based on the current output weight, the expression of β ₁ is:

Divide the above expression into two parts and introduce intermediate variables According to the associative law of matrix multiplication, the expression of the first part of the matrix is:

In the current fragment state, H ₁ and K ₀ are both known conditions. Substituting them into the overall output layer weight expression, we can get the expression of the output weight β ₁ in the model update phase:

Further iterative calculations yield the intermediate variable K _k+1 expression corresponding to the (k+1)th segment:

Further iterative calculations are performed to obtain the optimal solution for the output weight β _k+1 of the nonlinear mapping model corresponding to the (k+1)th segment:

The model expression after online update is:

Substitute the input parameter x into the updated model and calculate the aerosol optical depth y to achieve online prediction of aerosol optical depth.

2. According to claim 1, a method for online prediction of aerosol optical depth based on multi-source information is characterized in that the multi-source information acquisition system in step (1) includes a data acquisition device, an information exchange terminal, and a database, the data acquisition device includes a laser radar, a ground meteorological station, a background radiometer, and a corresponding data interface, the database uses SQL Server to store data, the information exchange terminal exchanges information with the data acquisition device, and saves the collected data information into the database; wherein the ground meteorological station, the background radiometer and the laser radar are used to realize the collection of atmospheric environment parameters and optical parameters, the atmospheric environment parameters include the atmospheric data wind speed WS, temperature TE, humidity HU provided by the ground meteorological station, and the sky background brightness LU collected by the background radiometer, and the optical parameter is the aerosol optical thickness, which is collected by the laser radar.