CN116777213A - Carbon trading market risk early warning system and method based on big data - Google Patents
- Publication number: CN116777213A (application number CN202310741848.9A)
- Authority: CN (China)
- Legal status: Granted (Google Patents notes that the listed status is an assumption, not a legal conclusion)
Description
Technical field
The invention relates to the technical field of big data analysis, and specifically to a big-data-based risk early warning system and method for the carbon trading market.
Background
With the rise of carbon trading markets, more and more companies and individuals are taking part in carbon trading, and the market has become more complex. Risk early warning for the carbon trading market is therefore needed.
Current carbon trading models mainly measure outcomes with indicators such as the amount of carbon emission reduction, the benefits of that reduction and the utilization rate of clean energy. They do not, however, forecast the future state of the carbon trading market.
There is therefore an urgent need for a big-data-based carbon trading market risk early warning system and method that address these shortcomings of the prior art.
Summary of the invention
The purpose of the invention is to provide a big-data-based carbon trading market risk early warning system and method. The specific technical solutions are as follows:
A big-data-based carbon trading market risk early warning system comprises a data collection module, a data preprocessing module, a risk assessment module, a model update module and an early warning module. The data collection module is connected to the data preprocessing module, the data preprocessing module is connected to the risk assessment module, the risk assessment module is connected to the early warning module, and the model update module is connected to the risk assessment module;
the data collection module collects first transaction data;
the data preprocessing module cleans the first transaction data to obtain second transaction data;
the risk assessment module includes a risk assessment model, which performs risk assessment on the second transaction data to obtain a risk assessment result;
the model update module updates and optimizes the risk assessment model;
the early warning module has an adjustable risk threshold: when the risk assessment result exceeds the threshold, a warning is issued to the user; otherwise, no warning is issued.
Preferably, in the data collection module, the first transaction data includes transaction data from the carbon trading market.
Preferably, in the data preprocessing module, data cleaning includes handling missing values, outliers and duplicate values.
Preferably, in the risk assessment module, the risk assessment model includes a random forest model and a long short-term memory (LSTM) neural network model with residual connections. The random forest model predicts on the second transaction data to obtain a first prediction result, the LSTM model predicts on the second transaction data to obtain a second prediction result, and the risk assessment result is the weighted average of the first and second prediction results.
Preferably, the random forest model predicts on the second transaction data as follows:
Step A1: build the decision tree models. The second transaction data is divided into a training set and a test set. Samples are drawn at random, with replacement, from the training set to form multiple random subsets. Features are randomly selected for each random subset to obtain multiple feature subsets, and a decision tree model is built on each feature subset with the classification and regression tree (CART) algorithm, so that each feature subset corresponds to an independent decision tree model;
Step A2: build the random forest model. The decision tree models are combined into a random forest model. Each sample of the test set is fed into every decision tree model to obtain that tree's prediction, and the average of all decision tree predictions is the first prediction result.
Preferably, step A2 further includes optimizing the random forest model: the root mean square error (RMSE) is computed as the optimization metric, and based on it either the depth and minimum number of samples of the decision tree models, or the number of decision tree models and the size of the feature subsets in the random forest, are adjusted. The RMSE is:
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{pred,i} - y_{true,i}\right)^{2}}$$
where RMSE is the root mean square error, N is the number of samples, and y_{pred,i} and y_{true,i} are the predicted and true values of sample i.
Preferably, the LSTM neural network model predicts on the second transaction data as follows:
Step B1: build the LSTM model. The second transaction data is divided into a training set, a validation set and a test set. The input layer, the hidden layers, the number of nodes in the input and hidden layers, and the activation functions of the LSTM model are defined, and residual connections are added between hidden layers. With the mean squared error as the loss function, the model is trained by backpropagation with the Adam optimizer to obtain a trained model;
Step B2: model validation. The validation set is fed into the trained model to obtain predicted values, and evaluation metrics are computed from the predicted values and the true labels;
Step B3: model tuning. The trained model is tuned according to the evaluation metrics by adjusting its input layer, hidden layers and the number of nodes in each; cross-validation is used to select the best combination of these, which yields the prediction model;
Step B4: prediction. The test set is fed into the prediction model, and the second prediction result is computed by the model's forward pass.
Preferably, the early warning module continuously monitors user feedback; if the user takes no action after receiving a warning, the warning is issued again.
Preferably, the carbon trading market risk early warning system further includes a data visualization module that displays risk assessment results to users in real time.
In addition, the invention discloses a big-data-based carbon trading market risk early warning method, which uses the system described above and comprises the following steps:
Step S1: data collection. The data collection module collects the first transaction data and transmits it to the data preprocessing module;
Step S2: data cleaning. Missing values, outliers and duplicate values in the first transaction data are handled, after which feature selection and normalization are applied to obtain the second transaction data, which is transmitted to the risk assessment module;
Step S3: risk assessment. The second transaction data is input into the risk assessment model of the risk assessment module to obtain the risk assessment result, which is transmitted to the early warning module;
Step S4: risk early warning. The early warning module checks whether the risk assessment result exceeds the preset risk threshold; if it does, a warning is issued to the user, otherwise no warning is issued.
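Step S2 names the cleaning operations but not the rules behind them. The sketch below is one plausible reading, under assumptions that are not in the text: exact duplicate records are dropped, outliers are clipped to the 1.5 × IQR fences, missing values are filled with the mean of the observed (clipped) values, and each column is min-max scaled to [0, 1]. The function name `clean_and_normalize` is hypothetical, and feature selection is left out.

```python
from statistics import mean, quantiles


def clean_and_normalize(rows, columns):
    """Illustrative step S2 on a list of dict records with numeric columns.

    Assumed rules (not specified by the source): drop exact duplicates, clip
    outliers to the 1.5*IQR fences, fill missing values with the column mean,
    then min-max scale each column to [0, 1].
    """
    # 1. Drop exact duplicate records, keeping the first occurrence.
    seen, records = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            records.append(dict(r))
    for c in columns:
        present = [r for r in records if r.get(c) is not None]
        # 2. Clip outliers to the interquartile-range fences of the observed values.
        q1, _, q3 = quantiles([r[c] for r in present], n=4, method="inclusive")
        lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
        for r in present:
            r[c] = min(max(r[c], lo), hi)
        # 3. Fill missing values with the mean of the clipped observed values.
        fill = mean(r[c] for r in present)
        for r in records:
            if r.get(c) is None:
                r[c] = fill
        # 4. Min-max normalisation to [0, 1].
        vmin = min(r[c] for r in records)
        vmax = max(r[c] for r in records)
        span = vmax - vmin
        for r in records:
            r[c] = (r[c] - vmin) / span if span else 0.0
    return records
```

Clipping before imputing keeps a single extreme price from dragging the fill value upward; the opposite order is equally defensible if outliers are treated as genuine market events.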
Applying the technical solution of the invention has the following beneficial effects:
The invention discloses a big-data-based carbon trading market risk early warning system and method. The system includes a data collection module, a data preprocessing module, a risk assessment module, a model update module and an early warning module. The data collection module collects the first transaction data from the carbon trading market, the data preprocessing module preprocesses it into the second transaction data, and the risk assessment module derives a risk assessment result from the second transaction data, which makes it possible to predict carbon trading market risk. The early warning module then warns users according to the risk assessment result.
The data preprocessing module extracts features from the transaction data, evaluates and screens them with the information gain method, and keeps the features with the highest information gain as the most important ones.
The risk assessment module includes a risk assessment model that combines a random forest model with an LSTM neural network model with residual connections (the ResLSTM model), exploiting the strengths of both for more comprehensive risk prediction. The random forest model is good at handling structured data and ranking feature importance, and can be used both for feature selection and for building prediction models. Because it predicts by aggregating many decision trees, it is reasonably robust to noise and outliers and can capture complex relationships among features, including nonlinear relationships and interaction effects; this suits the carbon trading market, where risk estimates depend on many factors. In addition, the invention introduces the residual-connected LSTM (ResLSTM) model to process time-series data and capture long-term dependencies in sequences. Time-series data matters in the carbon trading market because market risks and trends are often correlated over time.
The residual connections alleviate the vanishing gradient problem, so the network can better capture long-term dependencies; they also strengthen the model's expressive power, improve its robustness and generalization, and speed up convergence and training. By fusing the random forest and ResLSTM models, the invention captures the characteristics and trends of the carbon trading market more comprehensively, copes better with uncertainty and diverse data, and improves the accuracy and stability of risk prediction.
In addition to the objects, features and advantages described above, the invention has other objects, features and advantages. The invention is described in further detail below with reference to the drawings.
Brief description of the drawings
The drawings, which form part of this application, provide a further understanding of the invention. The illustrative embodiments and their descriptions explain the invention and do not unduly limit it. In the drawings:
Figure 1 is a system block diagram of the carbon trading market risk early warning system in a preferred embodiment of the invention;
Figure 2 is a flow chart of the LSTM cell structure in a preferred embodiment of the invention;
Figure 3 is a flow chart of the steps of the carbon trading market risk early warning method in a preferred embodiment of the invention.
Detailed description
Embodiments of the invention are described in detail below with reference to the drawings, but the invention can be implemented in many different ways as defined and covered by the claims.
Embodiment:
Referring to Figure 1, the big-data-based carbon trading market risk early warning system includes a data collection module, a data preprocessing module, a risk assessment module, a model update module, an early warning module and a data visualization module. The data collection module is connected to the data preprocessing module, the data preprocessing module is connected to the risk assessment module, the risk assessment module is connected to the early warning module, and the model update module is connected to the risk assessment module;
The data collection module collects the first transaction data, which includes transaction data from the carbon trading market. In this embodiment, the transaction data may be carbon emission data, carbon trading price data or industry data.
The data preprocessing module cleans the first transaction data to obtain the second transaction data. Data cleaning includes handling missing values, outliers and duplicate values; in this embodiment it checks the completeness and accuracy of the first transaction data and excludes data that does not meet requirements. The data preprocessing module can also unify data formats, which simplifies the subsequent risk assessment. After cleaning, the module extracts the feature variables relevant to the goal of carbon trading market risk prediction and evaluates and screens them with the information gain method.
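The information gain screening is not spelled out further. A minimal sketch, assuming the risk label has been discretized into classes and each candidate feature is categorical or already binned; the feature names and the helper `select_features` are hypothetical:

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the size-weighted entropy after grouping by feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(columns, labels, k):
    """Rank named feature columns by information gain and keep the top k."""
    ranked = sorted(columns, key=lambda name: information_gain(columns[name], labels), reverse=True)
    return ranked[:k]


# Hypothetical binned features and a binary risk label.
labels = ["high", "high", "low", "low"]
columns = {
    "price_trend": ["up", "up", "down", "down"],  # separates the labels perfectly
    "weekday": ["mon", "tue", "mon", "tue"],      # carries no information about the label
}
print(select_features(columns, labels, k=1))  # ['price_trend']
```

Continuous features such as prices would need binning first, or a threshold-based gain like the split search CART uses.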
The risk assessment module includes a risk assessment model, which performs risk assessment on the second transaction data to obtain a risk assessment result. The risk assessment model includes a random forest model and an LSTM neural network model with residual connections: the random forest model predicts on the second transaction data to obtain a first prediction result, the LSTM model predicts on the second transaction data to obtain a second prediction result, and the risk assessment result is the weighted average of the two.
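The text does not say how the fusion weights are chosen. A minimal sketch, assuming a single hypothetical weight `w_rf` for the random forest (the LSTM receives `1 - w_rf`) and, as one possible choice, picking it by grid search on a validation set:

```python
def fuse(rf_pred, lstm_pred, w_rf=0.5):
    """Weighted average of the two model outputs; w_rf is a hypothetical weight in [0, 1]."""
    if not 0.0 <= w_rf <= 1.0:
        raise ValueError("w_rf must lie in [0, 1]")
    return [w_rf * a + (1.0 - w_rf) * b for a, b in zip(rf_pred, lstm_pred)]


def pick_weight(rf_val, lstm_val, y_val, steps=10):
    """Hypothetical helper: choose the weight that minimises squared error on validation data."""
    candidates = [i / steps for i in range(steps + 1)]

    def sse(w):
        return sum((p - y) ** 2 for p, y in zip(fuse(rf_val, lstm_val, w), y_val))

    return min(candidates, key=sse)
```

Fitting the weight on held-out data rather than on the training set avoids favouring whichever model overfits more.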
The model update module updates and optimizes the risk assessment model so that it adapts to new data and business requirements and improves its predictive performance and generalization. This prevents the original model from degrading over time and keeps it usable and accurate.
The early warning module has an adjustable risk threshold: when the risk assessment result exceeds the threshold, a warning is issued to the user; otherwise, no warning is issued.
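The threshold rule itself is small; the sketch below only makes the "strictly exceeds" comparison and the adjustability explicit. The class and method names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class EarlyWarning:
    """Early warning check with an adjustable risk threshold (names are illustrative)."""
    threshold: float

    def set_threshold(self, value: float) -> None:
        """Adjust the threshold at run time, e.g. from user settings."""
        self.threshold = value

    def check(self, risk_score: float) -> bool:
        # Warn only when the score strictly exceeds the threshold, as the text describes.
        return risk_score > self.threshold
```

For example, with `EarlyWarning(threshold=0.7)`, a score of 0.82 triggers a warning and a score of exactly 0.7 does not.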
The data visualization module displays risk assessment results to users in real time.
Further, in this embodiment the random forest model predicts on the second transaction data as follows:
Step A1: build the decision tree models. The second transaction data is divided into a training set and a test set. Samples are drawn at random, with replacement, from the training set to form multiple random subsets. Features are randomly selected for each random subset to obtain multiple feature subsets, and a decision tree model is built on each feature subset with the CART algorithm, so that each feature subset corresponds to an independent decision tree model;
Step A2: build the random forest model. The decision tree models are combined into a random forest model. Each sample of the test set is fed into every decision tree model to obtain that tree's prediction, and the average of all decision tree predictions is the first prediction result.
Further, the decision tree model is built as follows:
Step 1: obtain a data set from the second transaction data and create the root node from the data set's features and labels;
Step 2: for each node, choose a feature to split on: iterate over all features and, for each feature, over all of its possible values, partitioning the data set into subsets;
Step 3: compute the impurity of each subset. The CART algorithm measures impurity with the mean squared error, defined as:
$$MSE = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}$$
where MSE is the mean squared error, N is the number of samples, y_i is the true value of sample i, and \hat{y}_i is the predicted value of sample i. The smaller the squared error, the more accurate the prediction;
Step 4: choose the split with the smallest squared error, which determines the best splitting feature and split point;
Step 5: if a resulting subset meets the stopping condition, its node is marked as a leaf, which holds one regression output of the decision tree model; otherwise the node is marked as internal, and steps 2 to 4 are repeated recursively on each subset until every subset meets the stopping condition. The stopping conditions are a preset maximum tree depth and a minimum number of samples;
Step 6: after the complete decision tree model is built, it is optimized by pruning, that is, by removing some nodes or merging some leaf nodes, which reduces the model's complexity and improves its generalization ability.
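Steps 1 to 5 can be sketched as a small recursive regression tree in pure Python. Minimizing the summed squared error of the two children is equivalent to minimizing their size-weighted MSE. Pruning (step 6) is omitted, and the names and default limits are hypothetical:

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean, i.e. N times the node's MSE."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets):
    """Steps 2-4: try every feature and every observed value; keep the lowest total squared error."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            cost = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    return best


def build_tree(rows, targets, depth=0, max_depth=3, min_samples=2):
    """Step 5: recurse until the depth limit or the minimum sample count stops splitting."""
    if depth >= max_depth or len(rows) < min_samples:
        return {"leaf": mean(targets)}
    split = best_split(rows, targets)
    if split is None:  # all feature values identical: nothing left to split on
        return {"leaf": mean(targets)}
    _, f, t, left, right = split
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [targets[i] for i in left],
                           depth + 1, max_depth, min_samples),
        "right": build_tree([rows[i] for i in right], [targets[i] for i in right],
                            depth + 1, max_depth, min_samples),
    }


def predict(node, row):
    """Walk from the root to a leaf and return the leaf's mean target."""
    while "leaf" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]
```

For instance, with `rows = [[1], [2], [3], [10], [11], [12]]` and targets `[1.0, 1.0, 1.0, 5.0, 5.0, 5.0]`, the root splits at threshold 3 and the tree predicts 1.0 for `[2]` and 5.0 for `[11]`.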
Further, step A2 also includes optimizing the random forest model: the root mean square error (RMSE) is computed as the optimization metric, and based on it either the depth and minimum number of samples of the decision tree models, or the number of decision tree models and the size of the feature subsets in the random forest, are adjusted. The RMSE is:
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{pred,i} - y_{true,i}\right)^{2}}$$
where RMSE is the root mean square error, N is the number of samples, and y_{pred,i} and y_{true,i} are the predicted and true values of sample i.
Further, in this embodiment the long short-term memory (LSTM) neural network model predicts on the second transaction data as follows:
Step B1: build the LSTM model. The second transaction data is divided into a training set, a validation set and a test set. The input layer, the hidden layers, the number of nodes in the input and hidden layers, and the activation functions of the LSTM model are defined, and residual connections are added between hidden layers. With the mean squared error as the loss function, the model is trained by backpropagation with the Adam optimizer to obtain a trained model.
The residual connection adds the output of the previous layer and the output of the current layer element by element to obtain the final residual connection output:
$$y_{res} = y_{prev} + y_{curr}$$
where y_res is the residual connection output, y_prev is the output of the previous layer, and y_curr is the output of the current layer.
Further, the loss function is computed as:
$$Loss = \frac{1}{N}\sum_{i=1}^{N}\left(y_{true,i} - y_{pred,i}\right)^{2}$$
where Loss is the loss, N is the number of samples, y_{true,i} is the true value of sample i, and y_{pred,i} is its predicted value.
The network is trained with the Adam (adaptive moment estimation) optimizer, an adaptive-learning-rate algorithm that combines momentum with per-parameter learning rates: it estimates the first and second moments of the gradient and uses them to adjust the learning rate of each parameter dynamically.
The Adam update rules are:
$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t$$
$$v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2}$$
$$\hat{m}_t = \frac{m_t}{1-\beta_1^{t}}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^{t}}$$
$$\theta_t = \theta_{t-1} - \eta\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
where m and v are the first-moment and second-moment estimates, β1 and β2 are their exponential decay rates, and η is the learning rate; g_t is the gradient of the loss with respect to the parameters θ at step t, and ε is a small constant that prevents division by zero.
Step B2: model validation. The validation set is fed into the trained model to obtain predicted values, and evaluation metrics are computed from the predicted values and the true labels;
Step B3: model tuning. The trained model is tuned according to the evaluation metrics by adjusting its input layer, hidden layers and the number of nodes in each; cross-validation is used to select the best combination of these, which yields the prediction model;
Step B4: prediction. The test set is fed into the prediction model, and the second prediction result is computed by the model's forward pass.
进一步地,如图2所示的LSTM模型的单元结构流程图:Further, the unit structure flow chart of the LSTM model is shown in Figure 2:
①遗忘门:接受一个长期记忆Ct-1(上一个单元模块传过来的输出)并决定要保留和遗忘Ct-1的部分。把t-1时的长期记忆输入Ct-1乘上遗忘因子ft。遗忘因子计算公式为:①Forgetting gate: accepts a long-term memory C t-1 (the output from the previous unit module) and decides to retain and forget the part of C t-1 . The long-term memory input at t-1 is C t-1 multiplied by the forgetting factor f t . The formula for calculating the forgetting factor is:
ft=σ(Wf·[ht-1,xt]+bf);f t =σ(W f ·[h t-1 ,x t ]+b f );
② Input gate: determines how much of the input information at the current time $t$ is saved into the cell state $C_t$. It is computed as:
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
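together with the candidate cell state and the cell state equation at time $t$:
$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$
$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$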
③ Output gate: controls how much of the cell state $C_t$ is output to the current LSTM output value $h_t$. It is computed as:
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
$h_t = o_t \odot \tanh(C_t)$;
Here, $x_t$ denotes the current input data, drawn from the training set produced by the cross-validation split described above; $W$ and $b$, subscripted by the corresponding gate, denote the weight matrices and bias terms; $\sigma(\cdot)$ is the sigmoid function, $\tanh(\cdot)$ is the hyperbolic tangent function, and $\odot$ denotes element-wise multiplication.
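The gate equations above can be sketched as one forward step of an LSTM cell in NumPy; running the cell over a sequence is the forward propagation used for prediction in step B4. The candidate state `c_hat` and the update `c_t = f_t * c_prev + i_t * c_hat` follow the standard LSTM formulation and are assumptions here, as are the parameter names and shapes.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell.

    params holds W_f, W_i, W_c, W_o of shape (hidden, hidden + input)
    and b_f, b_i, b_c, b_o of shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])                     # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])      # input gate
    c_hat = np.tanh(params["W_c"] @ z + params["b_c"])    # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat                      # keep part of C_{t-1}, add new input
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                              # element-wise product
    return h_t, c_t


def lstm_forward(xs, params, hidden):
    """Run the cell over xs of shape (T, input) from zero state; return the last h_t."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in xs:
        h, c = lstm_cell_step(x_t, h, c, params)
    return h
```

Because $h_t$ is a sigmoid output times a tanh, every component lies strictly between -1 and 1; a regression head (for example a linear layer on the final `h`) would map it to the prediction result.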
Preferably, the early warning module continuously monitors user feedback and, if the user takes no action after receiving an early warning, issues the early warning to the user again.
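A minimal sketch of this re-warning behaviour, assuming a hypothetical `user_took_action` hook that reports whether the user has responded after each warning. The `max_reminders` cap is not in the patent; it is an assumption added so the loop always terminates. A real deployment would also wait between checks, which is omitted here.

```python
def run_alert_loop(risk_event, user_took_action, max_reminders=3):
    """Send a warning, then re-send it while the user has not acted.

    user_took_action(attempt) -> bool is a hypothetical feedback check.
    Returns the list of warning messages that were sent.
    """
    sent = []
    for attempt in range(max_reminders + 1):
        sent.append(f"WARNING #{attempt + 1}: {risk_event}")
        if user_took_action(attempt):
            break  # feedback received, stop re-warning
    return sent
```

For example, `run_alert_loop("price spike", lambda a: a >= 1)` sends two warnings: the user ignores the first and responds after the second.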
In addition, as shown in Figure 3, this embodiment also discloses a big-data-based carbon trading market risk early warning method, which is implemented with the carbon trading market risk early warning system described above. The steps of the method are as follows:
Step S1: Data collection. Specifically, the data collection module collects the first transaction data and transmits it to the data preprocessing module;
Step S2: Data cleaning. Specifically, missing values, outliers, and duplicate values in the first transaction data are handled; feature selection and normalization are then applied to obtain the second transaction data, which is transmitted to the risk assessment module;
Step S3: Risk assessment. Specifically, the second transaction data is input into the risk assessment model in the risk assessment module to obtain a risk assessment result, which is transmitted to the early warning module;
Step S4: Risk early warning. Specifically, the early warning module checks whether the risk assessment result exceeds a preset risk threshold; if it does, an early warning is issued to the user, otherwise no warning is issued.
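Steps S1 to S4 can be sketched as a small pipeline. The cleaning rules (dropping exact duplicate rows, filling missing values with the column mean, clipping outliers to mean ± 3 standard deviations) and min-max normalization are assumptions, since the patent does not specify them; feature selection is omitted. `collect`, `assess` and `notify` are hypothetical caller-supplied hooks.

```python
import statistics
from typing import Callable, List, Optional


def clean_and_normalize(rows, z_limit=3.0):
    """Step S2 sketch on numeric rows (lists of floats, None marks a missing value)."""
    seen, unique = set(), []
    for row in rows:                      # drop exact duplicate rows
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    for j in range(len(unique[0])):
        present = [r[j] for r in unique if r[j] is not None]
        mean = statistics.fmean(present)
        std = statistics.pstdev(present)
        lo, hi = mean - z_limit * std, mean + z_limit * std
        for r in unique:                  # fill missing with the mean, then clip outliers
            value = mean if r[j] is None else r[j]
            r[j] = min(max(value, lo), hi)
        col_min = min(r[j] for r in unique)
        span = max(r[j] for r in unique) - col_min
        for r in unique:                  # min-max scale the column to [0, 1]
            r[j] = 0.0 if span == 0 else (r[j] - col_min) / span
    return unique


def run_early_warning(
    collect: Callable[[], List[list]],
    preprocess: Callable[[List[list]], List[list]],
    assess: Callable[[List[list]], float],
    threshold: float,
    notify: Callable[[float], None],
) -> Optional[float]:
    """Chain steps S1 to S4; returns the score if a warning was issued, else None."""
    first_data = collect()                # S1: first transaction data
    second_data = preprocess(first_data)  # S2: second transaction data
    score = assess(second_data)           # S3: risk assessment result
    if score > threshold:                 # S4: warn only when the result exceeds the threshold
        notify(score)
        return score
    return None
```

Passing `clean_and_normalize` as `preprocess` keeps each step swappable, so a trained risk assessment model can be plugged in as `assess` without touching the rest of the flow.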
Further, feature selection proceeds as follows: according to the goal of carbon trading market risk prediction, risk-related feature variables are selected; the information gain method is then used to evaluate and screen these features and identify those that are most informative for predicting carbon trading market risk. The features are ranked by information gain, and those with the highest information gain are selected as the most important features. The expression is:
$G(D,A) = H(D) - H(D|A)$
where $D$ is the dataset with sample size $|D|$; $K$ is the number of classes, and $|C_k|$ is the number of samples in class $C_k$; according to the values of feature $A$, $D$ is partitioned into $n$ subsets $D_1, D_2, \ldots, D_n$, and $|D_i|$ is the number of samples in subset $D_i$; $G(D,A)$ is the information gain, $H(D)$ is the empirical entropy of dataset $D$, and $H(D|A)$ is the empirical conditional entropy of dataset $D$ given feature $A$.
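A minimal sketch of the information gain ranking, assuming the standard definitions consistent with the symbols above: $H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|}\log_2\frac{|C_k|}{|D|}$ and $H(D|A) = \sum_{i=1}^{n}\frac{|D_i|}{|D|}H(D_i)$. It also assumes features are already discrete; continuous market variables would need binning first. The feature names in the usage example are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Empirical entropy H(D) over the class counts |C_k|."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(feature_values, labels):
    """H(D|A): size-weighted entropy of the subsets D_i induced by feature A."""
    n = len(labels)
    groups = {}
    for a, y in zip(feature_values, labels):
        groups.setdefault(a, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(feature_values, labels):
    """G(D, A) = H(D) - H(D|A)."""
    return entropy(labels) - conditional_entropy(feature_values, labels)


def rank_features(columns, labels):
    """Return (name, gain) pairs sorted from highest to lowest information gain."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
```

For instance, with risk labels `["high", "high", "low", "low"]`, a feature that splits the rows exactly by label has a gain of 1 bit, while one that mixes both labels in every subset has a gain of 0 and ranks last.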
The above are merely preferred embodiments of the present invention and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310741848.9A CN116777213B (en) | 2023-06-21 | 2023-06-21 | Carbon transaction market risk early warning system and method based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310741848.9A CN116777213B (en) | 2023-06-21 | 2023-06-21 | Carbon transaction market risk early warning system and method based on big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116777213A true CN116777213A (en) | 2023-09-19 |
CN116777213B CN116777213B (en) | 2024-11-08 |
Family ID: 88012908
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310741848.9A Active CN116777213B (en) | 2023-06-21 | 2023-06-21 | Carbon transaction market risk early warning system and method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116777213B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117455245A (en) * | 2023-12-22 | 2024-01-26 | 赛飞特工程技术集团有限公司 | Intelligent risk assessment system for enterprise safety production |
CN117689219A (en) * | 2024-02-04 | 2024-03-12 | 江西科技学院 | A sports equipment safety assessment system based on machine learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110929926A (en) * | 2019-11-18 | 2020-03-27 | 西北工业大学 | Short-term explosion passenger flow prediction method based on long and short-term memory network and random forest |
AU2020100709A4 (en) * | 2020-05-05 | 2020-06-11 | Bao, Yuhang Mr | A method of prediction model based on random forest algorithm |
CN113159615A (en) * | 2021-05-10 | 2021-07-23 | 麦荣章 | Intelligent information security risk measuring system and method for industrial control system |
CN113505983A (en) * | 2021-07-07 | 2021-10-15 | 广东电网有限责任公司 | Energy industry chain monitoring and early warning system |
Also Published As
Publication number | Publication date |
---|---|
CN116777213B (en) | 2024-11-08 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |