CN116777213A - Carbon trading market risk early warning system and method based on big data - Google Patents

Publication number
CN116777213A
Authority
CN
China
Prior art keywords
model
module
early warning
risk assessment
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310741848.9A
Other languages
Chinese (zh)
Other versions
CN116777213B (en)
Inventor
张琦
李汪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Technology
Original Assignee
Hunan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University of Technology
Priority to CN202310741848.9A
Publication of CN116777213A
Application granted
Publication of CN116777213B
Legal status: Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a carbon trading market risk early warning system and method based on big data. The system comprises a data acquisition module, a data preprocessing module, a risk assessment module, a model update module and an early warning module; the data acquisition module is connected to the data preprocessing module, the data preprocessing module is connected to the risk assessment module, the risk assessment module is connected to the early warning module, and the model update module is connected to the risk assessment module. The risk assessment module comprises a risk assessment model that combines a random forest model with a long short-term memory (LSTM) neural network model that introduces residual connections, making full use of the advantages of both models to obtain a more comprehensive risk prediction capability.

Description

Carbon trading market risk early warning system and method based on big data

Technical field

The invention relates to the technical field of big data analysis, and in particular to a carbon trading market risk early warning system and method based on big data.

Background

With the rise of the carbon trading market, more and more enterprises and individuals are participating in carbon trading, and the market has become more complex. Risk early warning for the carbon trading market is therefore necessary.

Current carbon trading models mainly measure outcomes using indicators such as carbon emission reductions, emission reduction benefits and clean energy utilization; they lack forecasts of the future of the carbon trading market.

In summary, a carbon trading market risk early warning system and method based on big data is urgently needed to solve the problems in the prior art.

Summary of the invention

The purpose of the invention is to provide a carbon trading market risk early warning system and method based on big data. The specific technical solution is as follows:

A carbon trading market risk early warning system based on big data comprises a data acquisition module, a data preprocessing module, a risk assessment module, a model update module and an early warning module; the data acquisition module is connected to the data preprocessing module, the data preprocessing module is connected to the risk assessment module, the risk assessment module is connected to the early warning module, and the model update module is connected to the risk assessment module;

The data acquisition module is used to collect first transaction data;

The data preprocessing module is used to perform data cleaning on the first transaction data to obtain second transaction data;

The risk assessment module includes a risk assessment model, which is used to perform risk assessment on the second transaction data to obtain a risk assessment result;

The model update module is used to update and optimize the risk assessment model;

The early warning module is provided with an adjustable risk threshold. When the risk assessment result exceeds the risk threshold, an early warning is issued to the user; otherwise, no early warning is issued.

Preferably, in the data acquisition module, the first transaction data includes transaction data from the carbon trading market.

Preferably, in the data preprocessing module, the data cleaning includes handling missing values, outliers and duplicate values.

Preferably, in the risk assessment module, the risk assessment model includes a random forest model and a long short-term memory (LSTM) neural network model that introduces residual connections. The random forest model predicts on the second transaction data to obtain a first prediction result, the LSTM model predicts on the second transaction data to obtain a second prediction result, and the weighted average of the first and second prediction results gives the risk assessment result.

Preferably, the process by which the random forest model predicts on the second transaction data is as follows:

Step A1: Construct the decision tree models. Specifically, divide the second transaction data into a training set and a test set; randomly draw a certain number of samples from the training set with replacement to form multiple random subsets; randomly select features for each random subset to obtain multiple feature subsets; and build a decision tree model on each feature subset using the classification and regression tree (CART) algorithm, so that each feature subset corresponds to an independent decision tree model;

Step A2: Construct the random forest model. Specifically, combine the decision tree models to obtain the random forest model; input the samples of the test set into each decision tree model to obtain each tree's prediction; the average of all decision tree predictions is the first prediction result.

Preferably, step A2 also includes a process of optimizing the random forest model. Specifically, the root mean square error is computed as the optimization metric, and based on this metric either the depth and minimum sample count of the decision tree models are adjusted, or the number of decision tree models in the random forest and the size of the feature subsets are adjusted. The root mean square error is expressed as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{pred,i} - y_{true,i}\right)^{2}}$$

where RMSE is the root mean square error, N is the number of samples, $y_{pred}$ is the predicted value of a sample, and $y_{true}$ is the true value of a sample.
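The root mean square error used as the optimization metric can be sketched as follows. This is a minimal illustration in plain Python; the function name `rmse` and the sample values are assumptions for demonstration, not part of the patent.

```python
import math


def rmse(y_pred, y_true):
    """Root mean square error between predicted and true values."""
    if len(y_pred) != len(y_true) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean of the squared differences, then the square root.
    squared = [(p - t) ** 2 for p, t in zip(y_pred, y_true)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical predicted vs. observed risk values.
print(rmse([0.2, 0.5, 0.9], [0.1, 0.5, 1.0]))
```

A lower value on held-out data would favor one hyperparameter setting (tree depth, minimum sample count, number of trees, feature subset size) over another.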

Preferably, the process by which the LSTM model predicts on the second transaction data is as follows:

Step B1: Construct the LSTM model. Specifically, divide the second transaction data into a training set, a validation set and a test set; define the input layer, hidden layers, input layer nodes, hidden layer nodes and activation function of the LSTM model, and add residual connections between hidden layers; use the mean squared error as the loss function, and train the LSTM model with the backpropagation algorithm and the Adam optimization algorithm to obtain a trained model;

Step B2: Model validation. Specifically, input the validation set into the trained model to obtain predicted values, and compute evaluation metrics from the predicted values and the true labels;

Step B3: Model tuning. Specifically, tune the trained model according to the evaluation metrics; the tuning includes adjusting the input layer, hidden layers, input layer nodes and hidden layer nodes of the trained model, and using cross-validation to select the best combination of input layer, hidden layers, input layer nodes and hidden layer nodes, giving the prediction model;

Step B4: Data prediction. Specifically, input the test set into the prediction model, and compute the second prediction result through the forward pass of the prediction model.

Preferably, the early warning module continuously monitors user feedback; if the user takes no action after receiving an early warning, the warning is issued to the user again.

Preferably, the carbon trading market risk early warning system also includes a data visualization module for displaying risk assessment results to the user in real time.

In addition, the invention also discloses a carbon trading market risk early warning method based on big data. The method is carried out using the carbon trading market risk early warning system described above, and its steps are as follows:

Step S1: Data acquisition. Specifically, the data acquisition module collects the first transaction data and transmits it to the data preprocessing module;

Step S2: Data cleaning. Specifically, handle the missing values, outliers and duplicate values in the first transaction data, then perform feature selection and normalization on the first transaction data to obtain the second transaction data, and transmit the second transaction data to the risk assessment module;

Step S3: Risk assessment. Specifically, input the second transaction data into the risk assessment model in the risk assessment module to obtain the risk assessment result, and transmit the risk assessment result to the early warning module;

Step S4: Risk early warning. Specifically, the early warning module checks whether the risk assessment result exceeds the preset risk threshold. When the risk assessment result exceeds the risk threshold, an early warning is issued to the user; otherwise, no early warning is issued.

Applying the technical solution of the invention has the following beneficial effects:

The invention discloses a carbon trading market risk early warning system and method based on big data. The system includes a data acquisition module, a data preprocessing module, a risk assessment module, a model update module and an early warning module. The data acquisition module collects the first transaction data from the carbon trading market, the data preprocessing module preprocesses the first transaction data to obtain the second transaction data, and the risk assessment module obtains the risk assessment result from the second transaction data, thereby predicting carbon trading market risk; the early warning module can issue early warnings to the user according to the risk assessment result.

The data preprocessing module of the invention extracts features from the transaction data, evaluates and screens them using the information gain method, and selects features with higher information gain as the most important features.
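Ranking features by information gain can be sketched as below. The sketch assumes the common entropy-based definition (label entropy minus the conditional entropy given the feature) and assumes features and risk labels have already been discretized; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names sorted by information gain, highest first."""
    return sorted(columns,
                  key=lambda name: information_gain(columns[name], labels),
                  reverse=True)


# Hypothetical discretized features and risk labels.
labels = ["high", "high", "low", "low"]
columns = {"volume_band": ["x", "y", "x", "y"],
           "price_band": ["a", "a", "b", "b"]}
print(rank_features(columns, labels))
```

Keeping only the top-ranked names corresponds to the screening step described above; how many features to keep is left open by the text.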

The risk assessment module of the invention includes a risk assessment model that combines a random forest model with a long short-term memory neural network model that introduces residual connections (ResLSTM model), making full use of the advantages of both to obtain a more comprehensive risk prediction capability. The random forest model is good at handling structured data and ranking feature importance, and can be used for feature selection and for building predictive models; it makes predictions by ensembling multiple decision trees, is reasonably robust to noise and outliers, and can handle complex relationships among multiple features, including nonlinear relationships and interaction effects, which addresses the fact that risk estimation in the carbon trading market is affected by multiple factors. On the other hand, the invention also introduces an LSTM model with residual connections (ResLSTM model) to process time series data and capture long-term dependencies in sequences; time series data is important in the carbon trading market, where risks and trends are often correlated over time. By introducing residual connections, the invention can effectively alleviate the vanishing gradient problem, enabling the network to better capture long-term dependencies while enhancing the model's expressive power, improving robustness and generalization, and promoting convergence and training efficiency. By fusing the random forest model and the ResLSTM model, the invention can capture the characteristics and trends of the carbon trading market more comprehensively, cope better with uncertainty and data diversity, and improve the accuracy and stability of risk prediction.

In addition to the objects, features and advantages described above, the invention has other objects, features and advantages. The invention is described in further detail below with reference to the drawings.

Description of the drawings

The drawings, which form a part of this application, are provided for further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:

Figure 1 is a system block diagram of the carbon trading market risk early warning system in a preferred embodiment of the invention;

Figure 2 is a flow chart of the unit structure of the LSTM model in a preferred embodiment of the invention;

Figure 3 is a flow chart of the steps of the carbon trading market risk early warning method in a preferred embodiment of the invention.

Detailed description

The embodiments of the invention are described in detail below with reference to the drawings, but the invention can be implemented in many different ways as defined and covered by the claims.

Embodiment:

Referring to Figure 1, the carbon trading market risk early warning system based on big data includes a data acquisition module, a data preprocessing module, a risk assessment module, a model update module, an early warning module and a data visualization module; the data acquisition module is connected to the data preprocessing module, the data preprocessing module is connected to the risk assessment module, the risk assessment module is connected to the early warning module, and the model update module is connected to the risk assessment module;

The data acquisition module is used to collect the first transaction data, which includes transaction data from the carbon trading market. In this embodiment, the transaction data may be carbon emission data, carbon trading price data or industry data.

The data preprocessing module is used to perform data cleaning on the first transaction data to obtain the second transaction data. Specifically, the data cleaning includes handling missing values, outliers and duplicate values; in this embodiment, data cleaning checks the completeness and accuracy of the first transaction data and excludes data that does not meet the requirements. The data preprocessing module can also unify the data format, which facilitates the subsequent risk assessment. After data cleaning, the data preprocessing module extracts the feature variables relevant to the goal of carbon trading market risk prediction and uses the information gain method to evaluate and screen them.
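The three cleaning operations can be sketched in plain Python as below. The text does not say how missing values or outliers are handled, so median imputation and the 1.5 × IQR fence are assumptions chosen for illustration; the record layout and the `clean` function are hypothetical.

```python
from statistics import median, quantiles


def clean(records, field):
    """Deduplicate records, impute missing `field` values, drop outliers."""
    # 1. Remove exact duplicate records while preserving order.
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(record))
    # 2. Fill missing values (None) with the median of the observed values.
    observed = [r[field] for r in unique if r[field] is not None]
    fill = median(observed)
    for r in unique:
        if r[field] is None:
            r[field] = fill
    # 3. Drop values outside the 1.5 * IQR fences (assumed outlier rule).
    q1, _, q3 = quantiles([r[field] for r in unique], n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [r for r in unique if low <= r[field] <= high]


# Hypothetical first transaction data: one duplicate, one gap, one spike.
first_data = [
    {"day": 1, "price": 50.0}, {"day": 1, "price": 50.0},
    {"day": 2, "price": None}, {"day": 3, "price": 52.0},
    {"day": 4, "price": 51.0}, {"day": 5, "price": 500.0},
    {"day": 6, "price": 49.0},
]
second_data = clean(first_data, "price")
print(len(second_data))
```

In practice the outlier rule would need care for market data, where an extreme price can be a genuine risk signal rather than an error.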

The risk assessment module includes a risk assessment model, which is used to perform risk assessment on the second transaction data to obtain the risk assessment result. The risk assessment model includes a random forest model and a long short-term memory (LSTM) neural network model that introduces residual connections: the random forest model predicts on the second transaction data to obtain a first prediction result, the LSTM model predicts on the second transaction data to obtain a second prediction result, and the weighted average of the first and second prediction results gives the risk assessment result.
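The fusion of the two prediction results can be sketched as a weighted average. The weight value is not given in the text, so `rf_weight` below is a hypothetical parameter, and the function name is illustrative.

```python
def fuse_predictions(rf_pred, lstm_pred, rf_weight=0.5):
    """Weighted average of the random forest and LSTM predictions."""
    if not 0.0 <= rf_weight <= 1.0:
        raise ValueError("rf_weight must lie in [0, 1]")
    # The two weights sum to 1, so the result stays within the input range.
    return rf_weight * rf_pred + (1.0 - rf_weight) * lstm_pred


# Hypothetical first and second prediction results.
risk_score = fuse_predictions(0.2, 0.8, rf_weight=0.25)
print(risk_score)
```

One plausible way to pick the weight is to choose the value that minimizes the validation error of the fused score, but the text does not specify this.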

The model update module is used to update and optimize the risk assessment model. In this embodiment, the model update module updates and optimizes the risk assessment model so that it adapts to new data and business requirements, improving the model's predictive performance and generalization ability; this addresses the problem of the original model becoming invalid over time and helps maintain the model's usability and accuracy.

The early warning module is provided with an adjustable risk threshold. When the risk assessment result exceeds the risk threshold, an early warning is issued to the user; otherwise, no early warning is issued.
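The threshold rule can be sketched as a small class. The class and method names are hypothetical; the only behavior taken from the text is that a warning is issued when the result strictly exceeds an adjustable threshold.

```python
class EarlyWarning:
    """Issues a warning when a risk score exceeds an adjustable threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def set_threshold(self, threshold):
        # The threshold is adjustable at run time.
        self.threshold = threshold

    def check(self, risk_score):
        # True means "issue an early warning to the user".
        return risk_score > self.threshold


warning = EarlyWarning(threshold=0.7)
print(warning.check(0.65), warning.check(0.71))
warning.set_threshold(0.6)
print(warning.check(0.65))
```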

The data visualization module is used to display risk assessment results to the user in real time.

Further, in this embodiment the process by which the random forest model predicts on the second transaction data is as follows:

Step A1: Construct the decision tree models. Specifically, divide the second transaction data into a training set and a test set; randomly draw a certain number of samples from the training set with replacement to form multiple random subsets; randomly select features for each random subset to obtain multiple feature subsets; and build a decision tree model on each feature subset using the classification and regression tree (CART) algorithm, so that each feature subset corresponds to an independent decision tree model;

Step A2: Construct the random forest model. Specifically, combine the decision tree models to obtain the random forest model; input the samples of the test set into each decision tree model to obtain each tree's prediction; the average of all decision tree predictions is the first prediction result.
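Steps A1 and A2 can be sketched as bootstrap sampling, a random feature subset per tree, and averaging of the per-tree predictions. To keep the sketch short, the base learner is passed in as a function; `fit_mean_tree` is a deliberately trivial stand-in for a CART tree, and all names and defaults here are assumptions.

```python
import random


def bootstrap_sample(rows, targets, rng):
    """Draw len(rows) samples with replacement."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in idx], [targets[i] for i in idx]


def fit_forest(rows, targets, fit_tree, n_trees=10, n_features=2, seed=0):
    """Step A1: one tree per bootstrap sample and random feature subset."""
    rng = random.Random(seed)
    n_total = len(rows[0])
    forest = []
    for _ in range(n_trees):
        sample_rows, sample_targets = bootstrap_sample(rows, targets, rng)
        features = sorted(rng.sample(range(n_total), n_features))
        projected = [[r[j] for j in features] for r in sample_rows]
        forest.append((features, fit_tree(projected, sample_targets)))
    return forest


def predict_forest(forest, row):
    """Step A2: average the predictions of all trees."""
    preds = [tree([row[j] for j in features]) for features, tree in forest]
    return sum(preds) / len(preds)


def fit_mean_tree(rows, targets):
    # Stand-in base learner: ignores the features, predicts the sample mean.
    mean = sum(targets) / len(targets)
    return lambda row: mean


rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [2, 4, 6]]
targets = [0.1, 0.4, 0.9, 0.3]
forest = fit_forest(rows, targets, fit_mean_tree, n_trees=5)
print(predict_forest(forest, [3, 3, 3]))
```

Replacing `fit_mean_tree` with a real regression tree learner would turn this into a working random forest regressor without changing the sampling or averaging logic.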

Further, the specific steps for building a decision tree model are as follows:

Step 1: Obtain a data set from the second transaction data, and select the initial node according to the features and labels of the data set;

Step 2: For each node, select a feature to split on: traverse all features and, for each feature, traverse all of its possible values, dividing the data set into subsets;

Step 3: Compute the impurity of each subset. The classification and regression tree (CART) algorithm uses the squared error as the measure, defined as follows:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}$$

where MSE is the squared error, N is the number of samples, $y_i$ is the true value of sample i, and $\hat{y}_i$ is the predicted value of sample i. The smaller the squared error, the more accurate the prediction.

Step 4: Select the smallest squared error to determine the best split feature and split point;

Step 5: If a subset produced by the split meets the stopping condition, mark the node as a leaf node, which represents one regression result of the decision tree model; if the subset does not meet the stopping condition, mark the node as an internal node and recursively repeat Steps 2 to 4 on each subset until the resulting subsets meet the stopping condition. The stopping condition is a preset decision tree depth and sample count;

Step 6: After the complete decision tree model is built, optimize it with a pruning operation (called "dropout" in the original text) that removes some nodes or merges some leaf nodes, controlling the model's complexity and improving its generalization ability.
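Steps 2 to 4 amount to an exhaustive search for the split with the smallest squared error. The sketch below scores each candidate by the summed squared error of the two child subsets around their own means, which is equivalent to a size-weighted MSE; the function names and the `<=` threshold convention are assumptions.

```python
def sse(values):
    """Sum of squared deviations of the values from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Return (feature_index, threshold, error) with the smallest error."""
    best = None
    for j in range(len(rows[0])):                    # traverse all features
        for threshold in sorted({r[j] for r in rows}):  # all observed values
            left = [y for r, y in zip(rows, targets) if r[j] <= threshold]
            right = [y for r, y in zip(rows, targets) if r[j] > threshold]
            if not left or not right:
                continue                             # not a real split
            error = sse(left) + sse(right)
            if best is None or error < best[2]:
                best = (j, threshold, error)
    return best


# Hypothetical data: feature 0 separates the targets, feature 1 is constant.
rows = [[1, 10], [2, 10], [8, 10], [9, 10]]
targets = [1.0, 1.0, 5.0, 5.0]
print(best_split(rows, targets))
```

A full tree would call `best_split` recursively on the left and right subsets until the preset depth or minimum sample count is reached, as Step 5 describes.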

Further, step A2 also includes a process of optimizing the random forest model. Specifically, the root mean square error is computed as the optimization metric, and based on this metric either the depth and minimum sample count of the decision tree models are adjusted, or the number of decision tree models in the random forest and the size of the feature subsets are adjusted. The root mean square error is expressed as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{pred,i} - y_{true,i}\right)^{2}}$$

where RMSE is the root mean square error, N is the number of samples, $y_{pred}$ is the predicted value of a sample, and $y_{true}$ is the true value of a sample.

Further, in this embodiment the process by which the long short-term memory (LSTM) neural network model predicts on the second transaction data is as follows:

Step B1: Construct the LSTM model. Specifically, divide the second transaction data into a training set, a validation set and a test set; define the input layer, hidden layers, input layer nodes, hidden layer nodes and activation function of the LSTM model, and add residual connections between hidden layers; use the mean squared error as the loss function, and train the LSTM model with the backpropagation algorithm and the Adam optimization algorithm to obtain a trained model.

The residual connection adds the output of the previous layer and the output of the current layer element-wise to obtain the final residual connection output. The formula is:

$$y_{res} = y_{prev} + y_{curr}$$

where $y_{res}$ is the residual connection output, $y_{prev}$ is the output of the previous layer, and $y_{curr}$ is the output of the current layer.
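The residual connection is an element-wise sum, which can be sketched as follows. The function name is illustrative, and the requirement that both outputs have the same length is an assumption that follows from adding them element by element.

```python
def residual_add(y_prev, y_curr):
    """Element-wise sum of the previous and current layer outputs."""
    if len(y_prev) != len(y_curr):
        raise ValueError("layer outputs must have the same size")
    return [p + c for p, c in zip(y_prev, y_curr)]


# Hypothetical outputs of two consecutive hidden layers.
print(residual_add([1.0, 2.0], [0.5, -2.0]))
```

Because the previous layer's output passes through unchanged, gradients can also flow back along this path, which is the mechanism behind the vanishing-gradient benefit the text claims.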

Further, the loss function is computed as:

$$\mathrm{Loss} = \frac{1}{N}\sum_{i=1}^{N}\left(y_{true,i} - y_{pred,i}\right)^{2}$$

where Loss is the loss, N is the number of samples, $y_{true,i}$ is the true value of sample i, and $y_{pred,i}$ is the predicted value of sample i.

The network is trained with the Adam (adaptive moment estimation) optimization algorithm, an adaptive-learning-rate optimizer that combines the momentum method with per-parameter adaptive learning rates: it dynamically adjusts the learning rate of each parameter using estimates of the first and second moments of the gradient.

The Adam optimization algorithm is defined in terms of the following quantities:

m and v are the first-moment and second-moment estimates, respectively;

$\beta_1$ and $\beta_2$ are the exponential decay rates of the first-moment and second-moment estimates, respectively;

$\eta$ is the learning rate.
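A single Adam update can be sketched as below, using the symbols m, v, β1, β2 and η from the text. The sketch follows the widely used formulation with bias correction; the bias-correction terms, the small constant `eps`, the default hyperparameter values and the toy objective are assumptions, since this text does not reproduce the patent's own formulas.

```python
import math


def adam_step(theta, grad, m, v, t, eta=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction (assumed)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - eta * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Toy objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2.0 * (theta - 3.0), m, v, t, eta=0.05)
print(theta)
```

On the first step the bias-corrected ratio has magnitude close to 1, so the parameter moves by roughly η regardless of the gradient's scale; this is the per-parameter adaptive learning rate the text refers to.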

Step B2: Model validation. Specifically, input the validation set into the trained model to obtain predicted values, and compute evaluation metrics from the predicted values and the true labels;

Step B3: Model tuning. Specifically, tune the trained model according to the evaluation metrics; the tuning includes adjusting the input layer, hidden layers, input layer nodes and hidden layer nodes of the trained model, and using cross-validation to select the best combination of input layer, hidden layers, input layer nodes and hidden layer nodes, giving the prediction model;

Step B4: Data prediction. Specifically, input the test set into the prediction model, and compute the second prediction result through the forward pass of the prediction model.

Further, the unit structure of the LSTM model shown in Figure 2 is as follows:

① Forget gate: receives the long-term memory $C_{t-1}$ (the output passed from the previous unit) and decides which parts of $C_{t-1}$ to retain and which to forget, by multiplying the long-term memory $C_{t-1}$ from time t-1 by the forget factor $f_t$. The forget factor is computed as:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

② Input gate: determines how much of the input at the current time t is saved to the cell state $C_t$. It is computed as:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

together with the cell state equation at time t, where $\tilde{C}_t$ is the candidate cell state:

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C), \qquad C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

③ Output gate: controls how much of the cell state $C_t$ is output to the current LSTM output value $h_t$. It is computed as:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$h_t = o_t \odot \tanh(C_t)$$

where $x_t$ is the current input data, consisting of the training set produced by the cross-validation split described above; the $W$ and $b$ terms denote the weight matrices and bias terms, respectively; $\sigma(\cdot)$ is the sigmoid function, and $\tanh(\cdot)$ is the hyperbolic tangent function.
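One forward step of the LSTM unit can be sketched in plain Python as below. The gate equations follow the text; the candidate state and the cell-state update use the standard LSTM form $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$, which is an assumption wherever the text gives no formula. The parameter dictionary keys and the helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, vec):
    """Compute W · vec + b for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) + bias
            for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    concat = h_prev + x_t  # the concatenation [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(params["Wf"], params["bf"], concat)]
    i = [sigmoid(z) for z in affine(params["Wi"], params["bi"], concat)]
    o = [sigmoid(z) for z in affine(params["Wo"], params["bo"], concat)]
    # Candidate state and cell update: standard LSTM form (assumed).
    c_tilde = [math.tanh(z)
               for z in affine(params["Wc"], params["bc"], concat)]
    c = [ft * cp + it * ct
         for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


# Hypothetical unit with one hidden node and one input, all-zero parameters.
zeros = {name: [[0.0, 0.0]] for name in ("Wf", "Wi", "Wo", "Wc")}
zeros.update({name: [0.0] for name in ("bf", "bi", "bo", "bc")})
h, c = lstm_step([1.0], [0.0], [2.0], zeros)
print(h, c)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate state to 0, so the cell state is simply halved; this makes the role of the forget factor easy to see in isolation.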

Preferably, the early warning module continuously monitors user feedback; if the user takes no action after receiving an early warning, the warning is issued to the user again.

In addition, as shown in Figure 3, this embodiment also discloses a carbon trading market risk early warning method based on big data. The method is implemented using the carbon trading market risk early warning system described above, and its steps are as follows:

Step S1: Data collection. Specifically, the data acquisition module collects the first transaction data and transmits it to the data preprocessing module;

Step S2: Data cleaning. Specifically, missing values, outliers and duplicate values in the first transaction data are processed, after which feature selection and normalization are applied to the first transaction data to obtain the second transaction data; the second transaction data is transmitted to the risk assessment module;

Step S3: Risk assessment. Specifically, the second transaction data is input into the risk assessment model in the risk assessment module to obtain a risk assessment result, which is transmitted to the early warning module;

Step S4: Risk early warning. Specifically, the early warning module checks whether the risk assessment result exceeds the preset risk threshold. If it does, an early warning is issued to the user; otherwise, no early warning is issued.
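Steps S2 and S4 could be sketched as follows. This is a minimal sketch under stated assumptions, not the method disclosed here: the text does not say how missing values or outliers are handled, so mean imputation, clipping to three standard deviations and min-max scaling are illustrative choices, and the names `clean` and `should_alert` are hypothetical. Feature selection is omitted.

```python
import statistics


def clean(rows, z_max=3.0):
    """Step S2 sketch: de-duplicate, impute, clip outliers, min-max normalise.

    rows is a list of equal-length tuples of floats, with None marking a
    missing value. Every column is assumed to hold at least one known value.
    """
    # Drop exact duplicate rows while keeping first-seen order.
    unique = list(dict.fromkeys(rows))
    columns = []
    for col in zip(*unique):
        known = [v for v in col if v is not None]
        mean = statistics.fmean(known)
        std = statistics.pstdev(known)
        # Assumed imputation rule: replace missing values with the column mean.
        filled = [mean if v is None else v for v in col]
        # Assumed outlier rule: clip to mean ± z_max standard deviations.
        lo, hi = mean - z_max * std, mean + z_max * std
        clipped = [min(max(v, lo), hi) for v in filled]
        # Min-max normalisation to [0, 1]; a constant column maps to 0.
        cmin, cmax = min(clipped), max(clipped)
        span = (cmax - cmin) or 1.0
        columns.append([(v - cmin) / span for v in clipped])
    return [tuple(r) for r in zip(*columns)]


def should_alert(risk_score, threshold):
    """Step S4 sketch: warn only when the score strictly exceeds the threshold."""
    return risk_score > threshold
```

The strict comparison in `should_alert` follows the wording "exceeds the preset risk threshold"; a score equal to the threshold raises no warning.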

Further, the feature selection is specifically as follows: according to the goal of carbon trading market risk prediction, feature variables related to risk are selected, and the information gain method is then used to evaluate and screen the features to determine which are most informative for predicting carbon trading market risk. The features are sorted by information gain, and those with higher information gain are selected as the most important features. The expression is as follows:

$$G(D,A) = H(D) - H(D \mid A)$$

where $D$ is the data set with sample size $|D|$, $K$ is the number of classes, and $|C_k|$ is the number of samples in class $C_k$; according to the values of feature $A$, $D$ is divided into $n$ subsets $D_1, D_2, \ldots, D_n$, and $|D_i|$ is the number of samples in subset $D_i$; $G(D,A)$ is the information gain, $H(D)$ is the empirical entropy of data set $D$, and $H(D \mid A)$ is the empirical conditional entropy of data set $D$ given feature $A$.
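The ranking by information gain can be illustrated with a short sketch. It assumes the usual definitions of the two entropies, $H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}$ and $H(D \mid A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i)$, which match the symbols defined above but are not written out in the text. It also assumes discrete feature values (continuous features would need binning first). The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Empirical entropy H(D) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """G(D, A) = H(D) - H(D|A) for one discrete feature A."""
    n = len(labels)
    # Partition the labels into the subsets D_i induced by each value of A.
    groups = {}
    for a, y in zip(feature_values, labels):
        groups.setdefault(a, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(features, labels):
    """Sort features (dict: name -> list of values) by descending gain."""
    gains = {name: information_gain(vals, labels) for name, vals in features.items()}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
```

A feature that splits the samples into pure subsets receives the full entropy of the labels as its gain, while a feature that is independent of the labels receives a gain of zero.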

The above are only preferred embodiments of the present invention and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. The carbon transaction market risk early warning system based on big data is characterized by comprising a data acquisition module, a data preprocessing module, a risk assessment module, a model updating module and an early warning module; the data acquisition module is connected with the data preprocessing module, the data preprocessing module is connected with the risk assessment module, the risk assessment module is connected with the early warning module, and the model updating module is connected with the risk assessment module;
the data acquisition module is used for acquiring first transaction data;
the data preprocessing module is used for carrying out data cleaning on the first transaction data to obtain second transaction data;
the risk assessment module comprises a risk assessment model, and the risk assessment model is used for carrying out risk assessment on the second transaction data to obtain a risk assessment result;
the model updating module is used for updating and optimizing the risk assessment model;
and an adjustable risk threshold value is arranged in the early warning module, when the risk assessment result exceeds the risk threshold value, early warning is sent to a user, and otherwise, early warning is not sent.
2. The carbon market risk early warning system of claim 1, wherein in the data collection module, the first transaction data comprises transaction data in a carbon market.
3. The carbon market risk early warning system of claim 1, wherein in the data preprocessing module, the data cleansing includes processing missing values, processing outliers, and processing duplicate values.
4. The carbon market risk early warning system of claim 1, wherein in the risk assessment module, the risk assessment model comprises a random forest model and a long short-term memory (LSTM) neural network model with residual connections, wherein the random forest model predicts on the second transaction data to obtain a first prediction result, the LSTM neural network model predicts on the second transaction data to obtain a second prediction result, and the risk assessment result is obtained as the weighted average of the first prediction result and the second prediction result.
5. The carbon market risk early warning system of claim 4, wherein the process of predicting the second transaction data by the random forest model is as follows:
step A1: constructing decision tree models, specifically, dividing the second transaction data into a training set and a test set, randomly drawing a certain number of samples from the training set with replacement to form a plurality of random subsets, randomly selecting features from each random subset to obtain a plurality of feature subsets, and constructing decision tree models on the feature subsets using the classification and regression tree (CART) algorithm, each feature subset corresponding to an independent decision tree model;
step A2: constructing the random forest model, specifically, combining the constructed decision tree models to obtain the random forest model, and inputting the samples of the test set into each decision tree model to obtain decision tree prediction results, the average of all decision tree prediction results being the first prediction result.
6. The carbon market risk early warning system of claim 5, wherein step A2 further comprises a process of optimizing the random forest model, specifically: calculating the root mean square error as an optimization index, and, based on the optimization index, adjusting the depth and the minimum sample number of the decision tree models, or adjusting the number of decision tree models and the size of the feature subsets in the random forest model; the root mean square error is expressed as follows:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{pred,i} - y_{true,i}\right)^{2}}$$

wherein RMSE represents the root mean square error, $N$ represents the number of samples, $y_{pred}$ represents the predicted value of a sample, and $y_{true}$ represents the true value of a sample.
7. The carbon market risk early warning system of claim 4, wherein the long short-term memory (LSTM) neural network model predicts the second transaction data as follows:
step B1: constructing the LSTM neural network model, specifically, dividing the second transaction data into a training set, a validation set and a test set, defining the input layer, hidden layers, input layer nodes, hidden layer nodes and activation function of the LSTM neural network model, and adding residual connections between the hidden layers; training the LSTM neural network model with the mean square error as the loss function, using the back propagation algorithm and the Adam optimization algorithm, to obtain a training model;
step B2: model validation, specifically, inputting the validation set into the training model to obtain predicted values, and calculating an evaluation index from the predicted values and the true labels;
step B3: model tuning, specifically, tuning the training model according to the evaluation index, the tuning comprising adjusting the input layer, hidden layers, input layer nodes and hidden layer nodes of the training model, and selecting the optimal combination of input layer, hidden layers, input layer nodes and hidden layer nodes by cross-validation to obtain a prediction model;
step B4: data prediction, specifically, inputting the test set into the prediction model and computing the second prediction result through the forward propagation of the prediction model.
8. The carbon market risk early warning system of claim 1, wherein in the early warning module, the early warning module continuously monitors feedback of the user and if the user does not take any action after receiving the early warning, the early warning is sent to the user again.
9. The carbon market risk early warning system of claim 1, further comprising a data visualization module for presenting the risk assessment results to a user in real time.
10. A carbon market risk early warning method based on big data, characterized in that the carbon market risk early warning method is realized by applying the carbon market risk early warning system according to any one of claims 1-9, and the steps of the method are as follows:
step S1: data collection, specifically, the data acquisition module acquires first transaction data and transmits the first transaction data to the data preprocessing module;
step S2: data cleaning, specifically, processing missing values, outliers and duplicate values in the first transaction data, then performing feature selection and normalization on the first transaction data to obtain second transaction data, and transmitting the second transaction data to the risk assessment module;
step S3: risk assessment, specifically, inputting the second transaction data into the risk assessment model in the risk assessment module to obtain a risk assessment result, and transmitting the risk assessment result to the early warning module;
step S4: risk early warning, specifically, the early warning module detects whether the risk assessment result exceeds a preset risk threshold, issues an early warning to the user when the risk assessment result exceeds the risk threshold, and otherwise does not issue an early warning.
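As a purely illustrative aid to claims 4 and 6, the weighted combination of the two prediction results and the root mean square error could be computed as below. The weight `w_rf` is a hypothetical parameter; the claims do not state how the weights are chosen, and the function names are likewise assumptions.

```python
import math


def rmse(y_pred, y_true):
    """Root mean square error over N paired predictions and true values."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n)


def combine(rf_pred, lstm_pred, w_rf=0.5):
    """Weighted average of random forest and LSTM predictions.

    w_rf is the (assumed) weight on the random forest; the LSTM gets 1 - w_rf.
    """
    return [w_rf * a + (1.0 - w_rf) * b for a, b in zip(rf_pred, lstm_pred)]
```

One possible design, not stated in the claims, would be to pick `w_rf` on a validation set by minimising `rmse(combine(rf_pred, lstm_pred, w_rf), y_true)`.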
CN202310741848.9A 2023-06-21 2023-06-21 Carbon transaction market risk early warning system and method based on big data Active CN116777213B (en)

Publications (2)

Publication Number Publication Date
CN116777213A (en) 2023-09-19
CN116777213B (en) 2024-11-08

Family

ID=88012908

Country Status (1)

Country Link
CN (1) CN116777213B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117455245A (en) * 2023-12-22 2024-01-26 赛飞特工程技术集团有限公司 Intelligent risk assessment system for enterprise safety production
CN117689219A (en) * 2024-02-04 2024-03-12 江西科技学院 A sports equipment safety assessment system based on machine learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929926A (en) * 2019-11-18 2020-03-27 西北工业大学 Short-term explosion passenger flow prediction method based on long and short-term memory network and random forest
AU2020100709A4 (en) * 2020-05-05 2020-06-11 Bao, Yuhang Mr A method of prediction model based on random forest algorithm
CN113159615A (en) * 2021-05-10 2021-07-23 麦荣章 Intelligent information security risk measuring system and method for industrial control system
CN113505983A (en) * 2021-07-07 2021-10-15 广东电网有限责任公司 Energy industry chain monitoring and early warning system

Also Published As

Publication number Publication date
CN116777213B (en) 2024-11-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant