CN102982229A

CN102982229A - Multi-assortment commodity price expectation data pre-processing method based on neural networks

Info

Publication number: CN102982229A
Application number: CN2012103253686A
Authority: CN
Inventors: 朱全银; 尹永华; 严云洋; 陈婷; 曹苏群
Original assignee: Huaiyin Institute of Technology
Current assignee: Wuxi Manlai Software Co ltd
Priority date: 2012-09-06
Filing date: 2012-09-06
Publication date: 2013-03-20
Anticipated expiration: 2032-09-06
Also published as: CN102982229B

Abstract

The invention discloses a neural-network-based data preprocessing method for multi-variety commodity price prediction. An improved radial basis function (RBF) neural network and an improved back propagation (BP) neural network are used to calculate the optimal order of magnitude for commodity price data obtained from websites. The calculated optimal order of magnitude is then used to preprocess the price data by normalizing its order of magnitude. This improves the prediction accuracy of the RBF and BP neural networks, as well as their generality across different kinds of commodity prices.

Description

A neural-network-based data preprocessing method for multi-variety commodity price prediction

Technical field

The invention belongs to the field of data processing, and in particular relates to a neural-network-based data preprocessing method for multi-variety commodity price prediction. It can be applied to the preprocessing of price data in commodity price prediction analysis and in commodity sales decision support systems.

Background art

Commodity price prediction is the basis of market forecast analysis and of production and sales decisions. It is an important problem in market forecasting and plays a key role in commodity production and sales, and the data preprocessing step within a prediction method strongly affects that method's generality and accuracy. With the development of network technology and the spread of online stores, research on commodity price prediction has received increasing attention in recent years. Commodity price prediction can be treated as a time-series data processing and analysis problem with three aspects: data acquisition, data processing, and the prediction model. Public price data from stock, futures, and electricity markets is relatively easy to obtain, and the main models used for price prediction include least squares regression, neural networks, grey Markov chains, wavelet theory, and the GM(1,1) model. For consumer commodities, covering price data acquisition, price data preprocessing, and dynamic price prediction, Zhu Quanyin et al. published, from 2010 to 2012, methods for extracting and mining commodity sales data and Web-based methods for preprocessing and dynamically predicting commodity prices (Quanyin Zhu, Yunyang Yan, Jin Ding and Yu Zhang. The Commodities Price Extracting for Shop Online. 2010 International Conference on Future Information Technology and Management Engineering, Changzhou, Jiangsu, China, Dec. 2010, Vol. 2, pp. 317-320; Quanyin Zhu, Yunyang Yan, Jin Ding and Jin Qian. The Case Study for Price Extracting of Mobile Phone Sell Online. IEEE 2nd International Conference on Software Engineering and Service Science, Beijing, China, July 2011, pp. 281-295; Quanyin Zhu, Sunqun Cao, Jin Ding and Zhengyin Hah. Research on the Price Forecast without Complete Data based on Web Mining. 2011 Distributed Computing and Applications to Business, Engineering and Science, Wuxi, Jiangsu, China, Oct. 2011, pp. 120-123; Quanyin Zhu, Hong Zhou, Yunyang Yan, Jin Qian and Pei Zhou. Commodities Price Dynamic Trend Analysis Based on Web Mining. The International Conference on Multimedia Information Networking and Security, Shanghai, China, Nov. 2011, pp. 524-527; Jianping Deng, Fengwen Cao, Quanyin Zhu, and Yu Zhang. The Web Data Extracting and Application for Shop Online Based on Commodities Classified. Communications in Computer and Information Science, Vol. 234(4): 189-197; Quanyin Zhu, Suqun Cao, Pei Zhou, Yunyang Yan, Hong Zhou. Integrated Price Forecast based on Dichotomy Backfilling and Disturbance Factor Algorithm. International Review on Computers and Software, 2011, Vol. 6(6): 1089-1093; Quan-yin Zhu, Pei Zhou, Yun-Yang Yan, Yong-Hua Yin. Exchange Rate Forecasting based on Adaptive Sliding Window and RBF Neural Network. International Review on Computers and Software, 2011, Vol. 6(7): 1290-1296; Jiajun Zong, Quanyin Zhu. Price Forecasting for Agricultural Products Based on BP and RBF Neural. ICSESS 2012, pp. 607-610; Hong Zhou, Quanyin Zhu, Pei Zhou. A Hybrid Price Forecasting Based on Linear Backfilling and Sliding Window Algorithm. International Review on Computers and Software, 2011, Vol. 6(6): 1131-1134; Wang Hongyan, Zhu Quanyin, Yan Yunyang, Qian Jin. Comparison of Two WEB Mining Algorithms for Commodity Price Data. Microelectronics and Computers, 2011, Vol. 28(19): 168-172).

RBF (Radial Basis Function) neural network:

An RBF network is a feed-forward neural network that mimics the locally tuned, overlapping receptive fields found in the human brain. It has a strong biological background and can approximate any nonlinear function. It is a three-layer feed-forward network. The first layer is the input layer, composed of signal source nodes. The second layer is the hidden layer; the transfer function of a hidden unit is a locally distributed, non-negative nonlinear function that is radially symmetric about a center point and decays away from it. The number of hidden units is determined by the needs of the problem. The third layer is the output layer, whose output is a linear weighting of the hidden-unit outputs. The input nodes only pass the input signal to the hidden layer. The hidden-layer basis functions are nonlinear and respond locally to the input: each hidden node has a parameter vector called its center, which is compared with the network input vector to produce a radially symmetric response. A hidden node gives a meaningful non-zero response only when the input falls within a small specified region; the response lies between 0 and 1 and grows as the input gets closer to the center of the basis function. The output units are linear, forming a linearly weighted combination of the hidden-node outputs.
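The three-layer structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Gaussian basis function, the single shared `width` parameter, and the names `rbf_hidden` and `rbf_forward` are choices made for this sketch and are not taken from the patent or from MATLAB.

```python
import math


def rbf_hidden(x, center, width):
    """Gaussian response of one hidden unit (assumed basis function).

    The response is 1 when x equals the center and decays towards 0
    as the Euclidean distance between x and the center grows.
    """
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2.0 * width ** 2))


def rbf_forward(x, centers, width, weights, bias=0.0):
    """Linear output layer: weighted sum of the hidden-unit responses."""
    hidden = [rbf_hidden(x, c, width) for c in centers]
    return bias + sum(w * h for w, h in zip(weights, hidden))


# Hypothetical two-unit network used only to exercise the functions.
centers = [[0.0, 0.0], [1.0, 1.0]]
weights = [2.0, -1.0]
near = rbf_hidden([0.1, 0.0], centers[0], width=0.5)
far = rbf_hidden([3.0, 3.0], centers[0], width=0.5)
output = rbf_forward([0.0, 0.0], centers, 0.5, weights)
```

Here `near` is close to 1 and `far` is close to 0, which is the localized response the paragraph describes; `output` is the linear combination of both hidden responses.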

BP (Back Propagation) neural network:

A BP network is a multilayer feed-forward network trained with the error back-propagation algorithm. It can learn and store a large number of input-output pattern mappings without the mathematical equations describing those mappings being known in advance. Its learning rule uses steepest descent: the weights and thresholds of the network are adjusted continually through back propagation so as to minimize the network's sum of squared errors. A BP neural network is a three-layer feed-forward network comprising an input layer, a hidden layer, and an output layer. The neurons of the input layer receive input information from the outside and pass it to the neurons of the middle layer. The middle layer is the internal information processing layer and is responsible for transforming the information; depending on the transformation capability required, it can be designed with a single hidden layer or with several. The information passed from the last hidden layer to the neurons of the output layer is processed further, which completes one forward-propagation pass of learning, and the output layer delivers the result to the outside. When the actual output does not match the expected output, the error back-propagation phase begins. Starting from the output layer, the error is propagated back layer by layer towards the hidden and input layers, and the weights of each layer are corrected by gradient descent on the error. This repeated cycle of forward propagation of information and backward propagation of error is the process by which the weights of each layer are continually adjusted, and it is also the learning and training process of the neural network. It continues until the error of the network output falls to an acceptable level or a preset number of learning iterations is reached.
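The forward and backward passes described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the tanh hidden units, the per-sample update order, the learning rate, and the names `train_bp`, `goal`, and `max_epochs` are all assumptions chosen for the sketch.

```python
import math
import random


def train_bp(samples, targets, hidden=3, lr=0.1, max_epochs=2000, goal=1e-4, seed=0):
    """Train a one-hidden-layer network by gradient descent on the squared error.

    Returns the sum of squared errors accumulated during each epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    sse_history = []
    for _ in range(max_epochs):
        sse = 0.0
        for x, t in zip(samples, targets):
            # Forward pass: input -> tanh hidden layer -> linear output.
            h = [math.tanh(sum(w * xi for w, xi in zip(w1[k], x)) + b1[k])
                 for k in range(hidden)]
            y = sum(w2[k] * h[k] for k in range(hidden)) + b2
            err = y - t
            sse += err * err
            # Backward pass: propagate the error from the output to the hidden layer.
            for k in range(hidden):
                delta = err * w2[k] * (1.0 - h[k] ** 2)  # uses w2[k] before its update
                w2[k] -= lr * err * h[k]
                b1[k] -= lr * delta
                for i in range(n_in):
                    w1[k][i] -= lr * delta * x[i]
            b2 -= lr * err
        sse_history.append(sse)
        # Stop once the error is acceptable; otherwise run until max_epochs.
        if sse <= goal:
            break
    return sse_history


# Toy data (hypothetical): a straight line sampled at five points.
samples = [[0.0], [0.25], [0.5], [0.75], [1.0]]
targets = [0.2 + 0.5 * s[0] for s in samples]
sse_history = train_bp(samples, targets)
```

The two stopping conditions in the loop correspond to the two criteria named in the text: an acceptable error level or a preset number of learning iterations.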

When used for price prediction, the above algorithms show great uncertainty in both prediction accuracy and learning time. Some parameters of the MATLAB functions used by the algorithms are user-defined, and the uncertainty in choosing them adds to the uncertainty in learning time and prediction accuracy, which severely limits the use of these algorithms for commodity price prediction. To make better use of them, many improved price prediction methods have been proposed: k-means clustering stock price prediction based on a BP neural network model; IPO underpricing prediction based on an adaptive BP neural network algorithm; an agricultural product price prediction model based on a time-series model with combined BP neural networks; an improved crude oil price prediction based on the wavelet transform and an RBF neural network; nonlinear time-series prediction based on a dynamic RBF neural network; and others. These improved methods are highly specific and lack generality: each applies to only one commodity or one class of commodities, and their fixed parameters make them inflexible, so prediction accuracy cannot be guaranteed across different commodities of the same class. Lacking flexibility and generality, they cannot meet sellers' pressing need for market forecast analysis and sales decisions across different kinds of consumer commodities. A prediction method is therefore needed that applies to the prices of different kinds of commodities or of different commodities of the same kind, or else a data preprocessing method for the prices of different kinds of commodities, so that prediction methods gain better generality and higher prediction accuracy.

Summary of the invention

The purpose of the invention is to combine the method of normalizing the order of magnitude of the raw data with improved RBF and BP neural network prediction methods. The improved RBF and BP neural networks are used to calculate the optimal order of magnitude for commodity price data mined from web pages; the price data is then preprocessed by normalizing it to that optimal order of magnitude; and the improved RBF and BP neural networks are then used to predict commodity prices. This improves the prediction accuracy of the RBF and BP neural networks and their generality across the prices of different commodities.

The technical solution of the invention is to preprocess the data mined from web pages by normalizing the order of magnitude of the raw data, to use the improved RBF and BP neural networks on the magnitude-normalized data sets to calculate the optimal magnitude of the commodity price data, to preprocess the price data by normalizing it to the calculated optimal order of magnitude, and thereby to complete the market price prediction of the commodity.

To aid understanding of the solution, the theoretical basis of the invention is first described as follows:

In neural-network-based price prediction, many improved data preprocessing methods have been proposed, and all have achieved clear improvements. However, these methods are highly specific and neglect the flexibility and generality of the prediction method, which severely limits the improved price prediction methods. Preprocessing the data by normalizing the order of magnitude of the raw data improves both the generality and the accuracy of the prediction method. For the price data of a single commodity, it relatively reduces the fluctuation range of the price data, which improves the stability of the prediction method and its accuracy for that commodity. For the price data of different commodities, it relatively reduces the differences between the commodities' price data while also reducing the fluctuation range for each specific commodity, which improves stability, strengthens generality, and yields higher prediction accuracy. The improved RBF and BP neural networks are then used to predict commodity prices on the magnitude-normalized price data, obtaining higher prediction accuracy.

Specifically, the solution of the invention carries out magnitude normalization of the raw data and commodity price prediction with the improved RBF and BP neural networks through the following steps:

Step 1. Extract the name, model, type, and price data of commodities from web pages and build a data set of h commodities, X = {A₁, A₂, ..., A_h}. Let n price values be extracted for the i-th commodity, A_i = {x₁, x₂, ..., x_n}, where i ∈ [1, h] and x₁, x₂, ..., x_n are the n price values extracted for commodity A_i;

Step 2. Calculate the price magnitude of each of the h commodities, giving the set of price magnitudes M = {b₁, b₂, ..., b_h};

Step 3. Define a prediction sample containing z data values, and let D be the total number of prices to be predicted;

Step 4. Select the prediction model;

Step 5. When the selected prediction model is the RBF neural network, perform steps 6 to 12; when the selected prediction model is the BP neural network, perform steps 14 to 21;

Step 6. Set the model training function to the newrbe(P, T, SPREAD) function of the technical computing language MATLAB, which designs an exact radial basis network, where P is the input vector, T is the target vector, and SPREAD is the spread of the radial basis functions. Set the model prediction function to the MATLAB function sim('MODEL', PARAMETERS), which simulates a neural network, where MODEL is the trained network model and PARAMETERS is the input vector. Set j different spread values, Spreads = {spread₁, spread₂, ..., spread_j};

Step 7. Normalize the order of magnitude of the sales price of commodity A_i to the magnitude b_i, obtaining [equation not recovered from the source];

Step 8. Pass the input vector P and the target vector T to the training function newrbe(P, T, SPREAD) and train j different networks, net_ij = newrbe(P, T, spread_j); build the prediction sample Test = [t₁, t₂, ..., t_z], [equation not recovered from the source];

Step 9. Compute the j predicted values for day n+1 of commodity A_i, Y_ij = sim(net_ij, Test), and let y_i ∈ Y_ij be the best predicted value for day n+1 of commodity A_i;

Step 10. Define the coupling weights W = (w₁, w₂, w₃). Let the spread values of the three best predictions for day n+1 of commodity A_i be Bspread_i1 ∈ Spreads, Bspread_i2 ∈ Spreads, and Bspread_i3 ∈ Spreads, and compute the optimal spread value $\mathit{Bspread} = \frac{\mathit{Bspread}_{i1} \cdot w_{1} + \mathit{Bspread}_{i2} \cdot w_{2} + \mathit{Bspread}_{i3} \cdot w_{3}}{w_{1} + w_{2} + w_{3}}$;

Step 11. Train the invariant (fixed) network net = newrbe(P, T, Bspread);

Step 12. Feed the best predicted value y_i back into the prediction sample for the next prediction: in the new prediction sample [t₁, t₂, ..., t_z], t₁ takes the value of t₂ from the previous sample, t₂ takes the value of t₃ from the previous sample, ..., t_(z-1) takes the value of t_z from the previous sample, and t_z = y_i. This gives the new prediction sample Test = [t₁, t₂, ..., t_z], and the predicted value of the commodity for day n+2 is y_i = sim(net, Test);

Step 13. Repeat step 12 to obtain all predicted values for commodity A_i; repeat steps 7 to 12 to obtain the predicted values of all commodities in data set X at the different orders of magnitude, and obtain the best prediction order of magnitude O, O ∈ M;

Step 14. Set the model training functions to the MATLAB functions NET = newff(P, T, NEURON) and NET′ = train(NET, P, T). The newff() function creates a feed-forward BP network, where P is the input vector, T is the target vector, and NEURON is the number of hidden-layer neurons; the train() function trains a neural network, where NET is the created feed-forward BP network. The model prediction function is NET′(Test), where Test is the prediction sample. Set j different values for the number of hidden-layer neurons, Neurons = {neuron₁, neuron₂, ..., neuron_j};

Step 15. Normalize the order of magnitude of the sales price of commodity A_i to the magnitude b_i, obtaining [equation not recovered from the source];

Step 16. Pass the input vector P and the target vector T to the training functions NET = newff(P, T, NEURON) and NET′ = train(NET, P, T) and train j different networks, net_ij = newff(P, T, Neurons), net_ij = train(net_ij, P, T); build the prediction sample Test = [t₁, t₂, ..., t_z], [equation not recovered from the source];

Step 17. Compute the j predicted values for day n+1 of commodity A_i, Y_ij = net_i(Test), and let y_i ∈ Y_ij be the best predicted value for day n+1 of commodity A_i;

Step 18. Define the coupling weights W = (w₁, w₂, w₃). Let the hidden-layer neuron counts of the three best predictions for day n+1 of commodity A_i be Bneuron_i1 ∈ Neurons, Bneuron_i2 ∈ Neurons, and Bneuron_i3 ∈ Neurons, and compute the optimal number of hidden-layer neurons $\mathit{Bneuron} = \frac{\mathit{Bneuron}_{i1} \cdot w_{1} + \mathit{Bneuron}_{i2} \cdot w_{2} + \mathit{Bneuron}_{i3} \cdot w_{3}}{w_{1} + w_{2} + w_{3}}$;

Step 19. Train the invariant (fixed) network net = newff(P, T, Bneuron), net = train(net, P, T);

Step 20. Feed the best predicted value y_i back into the prediction sample for the next prediction: in the new prediction sample [t₁, t₂, ..., t_z], t₁ takes the value of t₂ from the previous sample, t₂ takes the value of t₃ from the previous sample, ..., t_(z-1) takes the value of t_z from the previous sample, and t_z = y_i. This gives the new prediction sample Test = [t₁, t₂, ..., t_z], and the predicted value of the commodity for day n+2 is y_i = net(Test);

Step 21. Repeat step 20 to obtain all predicted values for commodity A_i; repeat steps 15 to 20 to obtain the predicted values of all commodities in data set X at the different orders of magnitude, and obtain the best prediction order of magnitude O, O ∈ M.
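The window update in steps 12 and 20, repeated as in steps 13 and 21, amounts to a rolling one-step-ahead forecast. The sketch below shows only that update; the names `roll_window` and `forecast` are illustrative, and `mean_predictor` is a hypothetical stand-in for the trained network (sim(net, Test) or net(Test) in the steps above), used so the example runs on its own.

```python
def roll_window(test, y):
    """Drop the oldest value t_1 and append the newest prediction as t_z."""
    return test[1:] + [y]


def forecast(predict, test, horizon):
    """Apply a one-step predictor `horizon` times, feeding each result back in."""
    window = list(test)
    results = []
    for _ in range(horizon):
        y = predict(window)
        results.append(y)
        window = roll_window(window, y)
    return results


def mean_predictor(window):
    """Hypothetical stand-in for the trained network: the mean of the window."""
    return sum(window) / len(window)


# z = 3 values in the prediction sample, D = 3 prices to predict.
predictions = forecast(mean_predictor, [1.0, 2.0, 3.0], horizon=3)
```

Because every prediction after the first depends on earlier predictions, errors can accumulate over the horizon, which is one reason a small D is a natural choice.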

Extracting the name, model, type, and price data of commodities from web pages in step 1 means using any Web data extraction algorithm to extract the name, model, type, and price data that a web page displays for a commodity. The values x₁, x₂, ..., x_n may be n prices extracted for the i-th commodity A_i from a single web page, or n average prices extracted from several web pages.

Step 2 calculates, from the price data of any commodity, the magnitude of that commodity's price data.
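The text does not state how the magnitude is computed. One plausible reading, shown here purely as an assumption, takes the largest power of ten that does not exceed the mean absolute price, which yields values such as 1, 10, 100, and 1000. The function name `price_magnitude` is hypothetical.

```python
import math


def price_magnitude(prices):
    """Largest power of ten not exceeding the mean absolute price.

    Assumed definition: the patent only says that a magnitude is
    calculated for each commodity, not how.
    """
    mean = sum(abs(p) for p in prices) / len(prices)
    if mean == 0:
        raise ValueError("magnitude is undefined for all-zero prices")
    return 10.0 ** math.floor(math.log10(mean))


# Hypothetical price series at three different scales.
m_units = price_magnitude([6.3, 6.4, 6.5])
m_tens = price_magnitude([81.2, 82.0])
m_hundreds = price_magnitude([640.8, 652.1])
```

Other definitions, for example one based on the median or the maximum price, would fit the text equally well.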

Steps 3 to 5 set the parameters and select the prediction model for the price prediction of any one commodity; z is generally 3, 5, or 7, and D is generally 3 or 7.

The technical computing language MATLAB in steps 6 and 14 is a product of MathWorks, version R2011b.

Steps 6 to 12 produce, with the improved RBF neural network, the predicted values for any one commodity from its price data on different dates on a single web page, or from its average price data on different dates across several web pages.

Steps 14 to 20 produce, with the improved BP neural network, the predicted values for any one commodity from its price data on different dates on a single web page, or from its average price data on different dates across several web pages.

In steps 6, 8, 14, and 16, the input vector P is the training sample set, and the target vector T is the data set of values the network is trained to predict.
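The exact construction of P and T is not given in the text. A common construction for time-series prediction, assumed here, is a sliding window: each training input is z consecutive prices and its target is the price that follows. The names `make_training_set` and `last_window` are illustrative, and the sketch stores one sample per row, whereas MATLAB's network functions expect one sample per column.

```python
def make_training_set(prices, z):
    """Build sliding-window inputs P and next-value targets T (assumed layout)."""
    count = len(prices) - z
    inputs = [prices[k:k + z] for k in range(count)]
    targets = [prices[k + z] for k in range(count)]
    return inputs, targets


def last_window(prices, z):
    """The most recent z prices, usable as the first prediction sample Test."""
    return prices[-z:]


# Hypothetical series of n = 5 prices with z = 3.
P, T = make_training_set([1.0, 2.0, 3.0, 4.0, 5.0], z=3)
test = last_window([1.0, 2.0, 3.0, 4.0, 5.0], z=3)
```

With n prices and a window of z, this yields n - z training pairs, so z trades context length against the number of samples.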

The preset value of j is generally 40 in step 6 and generally 10 in step 14.

In steps 7 and 15, the order of magnitude of any commodity's price data is normalized to a unified magnitude. If the magnitude of a commodity's price data already equals the normalization magnitude, no magnitude normalization is applied to it; if it differs, the price data is preprocessed by magnitude normalization. The magnitude is generally 1, 10, 100, or 1000.
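The normalization formula itself is not reproduced in the text. A minimal sketch, assuming the normalization is a rescaling by the ratio of the target magnitude to the commodity's own magnitude, could look like this. The names `price_magnitude`, `normalize_to_magnitude`, and `restore` are hypothetical, and the magnitude definition is the same assumption of a power of ten at or below the mean price.

```python
import math


def price_magnitude(prices):
    """Assumed definition: largest power of ten not exceeding the mean price."""
    mean = sum(abs(p) for p in prices) / len(prices)
    return 10.0 ** math.floor(math.log10(mean))


def normalize_to_magnitude(prices, target):
    """Rescale prices so that their magnitude becomes `target`.

    Returns the rescaled prices and the factor applied. Prices already at
    the target magnitude are returned unchanged, as the text requires.
    """
    own = price_magnitude(prices)
    if own == target:
        return list(prices), 1.0
    factor = target / own
    return [p * factor for p in prices], factor


def restore(values, factor):
    """Map normalized values, such as predictions, back to the original scale."""
    return [v / factor for v in values]


scaled, factor = normalize_to_magnitude([640.8, 652.1], target=1.0)
unchanged, no_factor = normalize_to_magnitude([6.3, 6.5], target=1.0)
```

Keeping the factor makes it possible to convert predictions made on the normalized data back to real prices.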

The coupling weights defined in steps 10 and 18 are W = [2, 4, 2].
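With W = [2, 4, 2], the formulas in steps 10 and 18 reduce to a weighted average with denominator 8. The sketch below evaluates that formula; the function name `combine` and the sample values are hypothetical, and the text does not say which of the three best candidates receives the weight 4, so the order of the inputs here is an assumption.

```python
def combine(best_three, weights=(2, 4, 2)):
    """Weighted average of the three best parameter values (steps 10 and 18)."""
    if len(best_three) != len(weights):
        raise ValueError("expected one value per weight")
    return sum(v * w for v, w in zip(best_three, weights)) / sum(weights)


# Hypothetical best spreads and best hidden-layer sizes.
bspread = combine([0.5, 1.0, 1.5])   # (0.5*2 + 1.0*4 + 1.5*2) / 8
bneuron = combine([6, 8, 12])        # (6*2 + 8*4 + 12*2) / 8
```

For the neuron count the result can be fractional, as in this example; how it is turned into an integer number of neurons is not specified in the text.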

Compared with the data preprocessing methods used in existing price prediction techniques, the invention takes price data of commodities mined from web pages, uses the improved RBF and BP neural networks to calculate the optimal magnitude of the raw price data, and applies a unified magnitude normalization to the raw price data using that optimal magnitude. For a specific commodity, this preprocessing reduces the fluctuation range of the commodity's price data; for different commodities, it reduces the differences between their price data. It thereby overcomes the limitations that the data preprocessing of existing price prediction methods imposes when they are applied to different commodities, improving both the generality and the accuracy of prediction.

Description of drawings

Fig. 1 is a flowchart of a specific embodiment of the invention.

Detailed description of the embodiments

The technical solution of the invention is described in detail below with reference to the accompanying drawing:

As shown in Fig. 1, the embodiment of the invention proceeds according to the following steps:

步骤4、选定预测模型；Step 4, select the prediction model;

步骤7、将商品A_i的销售价格数量级归一化为量级b_i，得到

步骤10、定义耦合权重W＝(w₁，w₂，w₃)，设商品A_i的第n+1天的三个最佳预测径向基函数的分布的值为Bspread_i1∈Spreads，Bspreadd_i2∈Spreads，Bspread_i3∈Spreads，求得最佳径向基函数的分布的值 $Bspread = \frac{{Bspread}_{i 1} * w_{1} + {Bespread}_{i 2} * w_{2} + {Bspread}_{i 3} * w_{3}}{w_{1} + w_{2} + w_{3}};$ Step 10. Define the coupling weight W=(w ₁ , w ₂ , w ₃ ), and set the distribution values of the three best predicted radial basis functions of commodity A _i on day n+1 as Bspread _i1 ∈ Spreads, Bspreadd _i2 ∈ Spreads, Bspread _i3 ∈ Spreads, find the value of the distribution of the best radial basis function $Bspread = \frac{{Bspread}_{i 1} * w_{1} + {Bespread}_{i 2} * w_{2} + {Bspread}_{i 3} * w_{3}}{w_{1} + w_{2} + w_{3}};$

Step 12. Feed the best predicted value y_i back into the prediction sample for the next prediction, as follows: in the new prediction sample [t₁, t₂, ..., t_z], t₁ takes the value of t₂ in the previous prediction sample [t₁, t₂, ..., t_z], t₂ takes the value of the previous t₃, ..., t_{z-1} takes the value of the previous t_z, and t_z = y_i. This gives the new prediction sample Test = [t₁, t₂, ..., t_z], and the predicted value of the commodity on day n+2 is y_i = sim(net, Test);

Step 15. Normalize the order of magnitude of the sales price of commodity A_i to the magnitude b_i, obtaining

Steps 14 to 20 give the predicted value, under the improved BP neural network, of the price data of any one commodity on different dates on a single web page, or of the average price data on different dates across multiple web pages.
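The window update of Step 12 can be sketched in Python as follows. This is a minimal illustration rather than the MATLAB implementation the method names: `recursive_forecast` and `mean_model` are hypothetical names, the three-value window holds made-up numbers, and a plain mean stands in for the trained network call `sim(net, Test)`.

```python
from typing import Callable, List, Sequence


def recursive_forecast(window: Sequence[float],
                       predict: Callable[[List[float]], float],
                       steps: int) -> List[float]:
    """Predict `steps` values, sliding the z-value window after each one."""
    test = list(window)            # copy so the caller's sample is not modified
    forecasts = []
    for _ in range(steps):
        y = predict(test)          # stand-in for y_i = sim(net, Test)
        forecasts.append(y)
        test = test[1:] + [y]      # t1 <- t2, ..., t_{z-1} <- t_z, t_z <- y_i
    return forecasts


def mean_model(sample: List[float]) -> float:
    """Hypothetical predictor used only for illustration."""
    return sum(sample) / len(sample)


forecasts = recursive_forecast([1.0, 2.0, 3.0], mean_model, steps=3)
```

With z = 3 and three predictions, each new value displaces the oldest entry of the window, so from the second prediction onward the model consumes its own earlier outputs.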

To better demonstrate the effectiveness of the method, the daily average price data of 8 different RMB exchange rates, extracted from web pages for the period from 1 January 2011 to 31 December 2011, were used as the raw data. The orders of magnitude of the raw data were calculated to be 1, 10 and 100, and magnitude-normalization experiments were carried out on the raw data.
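How the order of magnitude of a price series is computed is not spelled out at this point, so the following Python sketch rests on an assumption: it takes the magnitude to be the largest power of ten not exceeding the mean price, and rescales the series to a target magnitude. The function names and the three sample prices are hypothetical.

```python
import math
from statistics import mean


def magnitude(prices):
    """Assumed definition: largest power of ten not exceeding the mean price.

    Prices must be positive, since the logarithm is undefined otherwise.
    """
    return 10 ** math.floor(math.log10(mean(prices)))


def normalize_magnitude(prices, target):
    """Rescale prices to the target magnitude; leave them as-is if already there."""
    b = magnitude(prices)
    if b == target:
        return list(prices)
    return [p * target / b for p in prices]


raw = [650.2, 648.9, 651.0]            # made-up prices of magnitude 100
scaled = normalize_magnitude(raw, 1)   # same series expressed at magnitude 1
```

A series that already has the target magnitude is returned unchanged, which matches the rule that such data receives no normalized-magnitude preprocessing.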

In the experimental environment of the improved RBF neural network, the results were as follows. Without normalized-magnitude preprocessing, the average errors on the raw data were: Australian dollar 3.9%, Hong Kong dollar 11.82%, Canadian dollar 640.84%, US dollar 21571.04%, euro 1.66%, Japanese yen 1.15%, Swiss franc 2.77% and Singapore dollar 28959.17%, for an overall average error of 6399.04%. With the data magnitude normalized to 100, the average errors were: Australian dollar 3.9%, Hong Kong dollar 1.97%, Canadian dollar 640.84%, US dollar 21571.04%, euro 1.66%, Japanese yen 1233177%, Swiss franc 2.77% and Singapore dollar 28959.17%, for an overall average error of 160544.8%. With the data magnitude normalized to 10, the average errors were: Australian dollar 1.77%, Hong Kong dollar 0.94%, Canadian dollar 0.39%, US dollar 0.11%, euro 224438.5%, Japanese yen 1.05%, Swiss franc 1.57% and Singapore dollar 0.50%, for an overall average error of 28055.6%. With the data magnitude normalized to 1, the average errors were: Australian dollar 0.98%, Hong Kong dollar 0.94%, Canadian dollar 0.41%, US dollar 0.09%, euro 1.12%, Japanese yen 1.29%, Swiss franc 1.91% and Singapore dollar 0.16%, for an overall average error of 0.86%. The conclusion is that normalizing the data magnitude to 1 gives the best prediction results, with an average prediction accuracy of 99.14%.
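The overall figures for the RBF experiment can be checked from the per-currency errors. The short Python sketch below recomputes the overall average error for magnitudes 1 and 10 and derives the accuracy as 100% minus the average error, which is the relationship the reported 0.86% and 99.14% imply. The currency codes and variable names are illustrative.

```python
# Per-currency average errors (%) reported for the improved RBF network.
errors_mag1 = {"AUD": 0.98, "HKD": 0.94, "CAD": 0.41, "USD": 0.09,
               "EUR": 1.12, "JPY": 1.29, "CHF": 1.91, "SGD": 0.16}
errors_mag10 = {"AUD": 1.77, "HKD": 0.94, "CAD": 0.39, "USD": 0.11,
                "EUR": 224438.5, "JPY": 1.05, "CHF": 1.57, "SGD": 0.50}


def mean_error(errors):
    """Unweighted mean of the per-currency errors."""
    return sum(errors.values()) / len(errors)


mean1 = mean_error(errors_mag1)
accuracy1 = 100.0 - mean1          # accuracy taken as 100% minus mean error
mean10 = mean_error(errors_mag10)
worst10 = max(errors_mag10, key=errors_mag10.get)  # currency with largest error
```

At magnitude 10, seven of the eight currencies have errors below 2%, and the overall average is driven almost entirely by the single euro value.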

In the experimental environment of the improved BP neural network, the results were as follows. Without normalized-magnitude preprocessing, the average errors on the raw data were: Australian dollar 1.31%, Hong Kong dollar 0.17%, Canadian dollar 0.28%, US dollar 0.26%, euro 1.41%, Japanese yen 1.24%, Swiss franc 1.65% and Singapore dollar 0.21%, for an overall average error of 0.82%. With the data magnitude normalized to 100, the average errors were: Australian dollar 1.31%, Hong Kong dollar 0.24%, Canadian dollar 0.46%, US dollar 0.03%, euro 2.21%, Japanese yen 1.14%, Swiss franc 1.59% and Singapore dollar 2.98%, for an overall average error of 1.25%. With the data magnitude normalized to 10, the average errors were: Australian dollar 1.09%, Hong Kong dollar 0.28%, Canadian dollar 0.13%, US dollar 0.48%, euro 1.38%, Japanese yen 2.54%, Swiss franc 1.93% and Singapore dollar 0.06%, for an overall average error of 0.99%. With the data magnitude normalized to 1, the average errors were: Australian dollar 0.39%, Hong Kong dollar 0.18%, Canadian dollar 0.37%, US dollar 0.40%, euro 1.43%, Japanese yen 1.18%, Swiss franc 1.74% and Singapore dollar 0.28%, for an overall average error of 0.75%. The conclusion is that normalizing the data magnitude to 1 gives the best prediction results, with an average prediction accuracy of 99.25%.
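Choosing the best order of magnitude from such results can be sketched as picking the candidate with the lowest overall average error. That selection rule is an assumption about how the best magnitude O is determined; the Python sketch below applies it to the BP figures reported above, with an illustrative dictionary layout in which "raw" denotes no normalization.

```python
# Per-currency average errors (%) for the improved BP network, in the order
# AUD, HKD, CAD, USD, EUR, JPY, CHF, SGD.
bp_errors = {
    "raw": [1.31, 0.17, 0.28, 0.26, 1.41, 1.24, 1.65, 0.21],
    "100": [1.31, 0.24, 0.46, 0.03, 2.21, 1.14, 1.59, 2.98],
    "10":  [1.09, 0.28, 0.13, 0.48, 1.38, 2.54, 1.93, 0.06],
    "1":   [0.39, 0.18, 0.37, 0.40, 1.43, 1.18, 1.74, 0.28],
}

# Overall average error per candidate magnitude.
mean_errors = {label: sum(errs) / len(errs) for label, errs in bp_errors.items()}

# Assumed rule: the best magnitude is the one with the lowest mean error.
best_magnitude = min(mean_errors, key=mean_errors.get)
```

Under this rule the BP experiment selects magnitude 1, in agreement with the stated conclusion.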

The experimental data above demonstrate the generality of this data preprocessing method across different commodities of the same kind. To demonstrate its generality across different kinds of commodities, the weekly average price data of 10 different agricultural products, extracted from web pages and covering 59 weeks from January 2011 to February 2012, were used as the raw data. The orders of magnitude of the raw data were calculated to be 1 and 10, and magnitude-normalization preprocessing experiments were carried out on the raw data.

In the experimental environment of the improved RBF neural network, the results were as follows. Without normalized-magnitude preprocessing, the average errors on the raw data were: beef 3149934%, soybean oil 17.96%, eggs 1.61%, peanut oil 2.89%, flour 0.11%, pork 542574.4%, rice 0.34%, white sugar 0.44%, blended oil 6.61% and mutton 325260%, for an overall average error of 401779.9%. With the data magnitude normalized to 10, the average errors were: beef 3149934%, soybean oil 17.96%, eggs 1.61%, peanut oil 2.89%, flour 0.12%, pork 542574.4%, rice 0.34%, white sugar 0.44%, blended oil 6.61% and mutton 325260%, for an overall average error of 401779.9%. With the data magnitude normalized to 1, the average errors were: beef 2.44%, soybean oil 17.96%, eggs 1.61%, peanut oil 0.91%, flour 0.11%, pork 7.35%, rice 0.34%, white sugar 0.44%, blended oil 0.13% and mutton 0.41%, for an overall average error of 3.17%. The conclusion is that normalizing the data magnitude to 1 gives the best prediction results, with an average prediction accuracy of 96.83%.

The invention can be combined with a computer system to complete commodity price prediction automatically.

The present invention creatively proposes a neural-network-based data preprocessing method for multi-variety commodity price prediction and applies it to the preprocessing of commodity price data such as RMB exchange rates and agricultural product prices. Using the improved RBF neural network and BP neural network to predict commodity prices on the preprocessed price data improves the generality of the prediction method and yields higher prediction accuracy, giving the method high practical value.

The neural-network-based data preprocessing method for multi-variety commodity price prediction proposed by the present invention can be used not only for data preprocessing in price prediction for RMB exchange rates and for agricultural product production and sales, but also for data preprocessing in price prediction for other consumer commodities.

Claims

1. A data preprocessing method for multi-variety commodity price prediction based on neural networks, characterized in that: an improved RBF neural network and an improved BP neural network are used to calculate the optimal order of magnitude of commodity price data mined from web pages, and the optimal order of magnitude so obtained is used to normalize the magnitude of the commodity price data, thereby improving the prediction accuracy of the RBF neural network and the BP neural network as well as their generality for price prediction of different commodities, the method specifically comprising the following steps:

Step 1. Extract the name, model, type and price data of commodities from web pages, and establish a data set X = {A₁, A₂, ..., A_h} of h commodities. Let n price data be extracted for the i-th commodity, A_i = {x₁, x₂, ..., x_n}, where i ∈ [1, h] and x₁, x₂, ..., x_n are the n price data extracted for commodity A_i;

Step 2. Calculate the price magnitude of each of the different commodities, obtaining the price magnitudes M = {b₁, b₂, ..., b_h};

Step 3. Define the number z of data points contained in a prediction sample and the total number D of prices to be predicted;

Step 4. Select the prediction model;

Step 5. When the selected prediction model is the RBF neural network, perform steps 6 to 12; when the selected prediction model is the BP neural network, perform steps 14 to 21;

Step 6. Set the model training function to the newrbe(P, T, SPREAD) function of the technical computing language MATLAB, which designs an exact radial basis network, where P is the input vector, T is the target vector and SPREAD is the spread of the radial basis functions; the model prediction function is the sim('MODEL', PARAMETERS) function of MATLAB, which simulates a neural network, where MODEL is the trained network model and PARAMETERS is the input vector. Set j different radial basis function spread values Spreads = {spread₁, spread₂, ..., spread_j};

Step 9. The j predicted values of commodity A_i on day n+1 are Y_ij = sim(net_ij, Test); let the best predicted value of commodity A_i on day n+1 be y_i, y_i ∈ Y_ij;

Step 10. Define the coupling weight W = (w₁, w₂, w₃). Let the three best-predicting spread values of the radial basis functions for commodity A_i on day n+1 be Bspread_i1 ∈ Spreads, Bspread_i2 ∈ Spreads and Bspread_i3 ∈ Spreads, and compute the optimal spread value

$Bspread = \frac{Bspread_{i1} \cdot w_{1} + Bspread_{i2} \cdot w_{2} + Bspread_{i3} \cdot w_{3}}{w_{1} + w_{2} + w_{3}};$

Step 11. Train the fixed network net = newrbe(P, T, Bspread);

Step 12. Feed the best predicted value y_i back into the prediction sample for the next prediction, as follows: in the new prediction sample [t₁, t₂, ..., t_z], t₁ takes the value of t₂ in the previous prediction sample [t₁, t₂, ..., t_z], t₂ takes the value of the previous t₃, ..., t_{z-1} takes the value of the previous t_z, and t_z = y_i. This gives the new prediction sample Test = [t₁, t₂, ..., t_z], and the predicted value of the commodity on day n+2 is y_i = sim(net, Test);

Step 13. Repeat step 12 to obtain all predicted values of commodity A_i; repeat steps 7 to 12 to obtain the predicted values of all commodities in data set X at the different orders of magnitude, and obtain the best prediction order of magnitude O, O ∈ M;

Step 14. Set the model training functions to the NET = newff(P, T, NEURON) function and the NET' = train(NET, P, T) function of the technical computing language MATLAB, where newff() creates a feed-forward BP network, P is the input vector, T is the target vector, NEURON is the number of neurons in the hidden layer, train() trains a neural network and NET is the created feed-forward BP network; the model prediction function is NET'(Test), where Test is a prediction sample. Set j different hidden-layer neuron counts Neurons = {neuron₁, neuron₂, ..., neuron_j};

Step 17. The j predicted values of commodity A_i on day n+1 are Y_ij = net_i(Test); let the best predicted value of commodity A_i on day n+1 be y_i, y_i ∈ Y_ij;

Step 18. Define the coupling weight W = (w₁, w₂, w₃). Let the three best-predicting hidden-layer neuron counts for commodity A_i on day n+1 be Bneuron_i1 ∈ Neurons, Bneuron_i2 ∈ Neurons and Bneuron_i3 ∈ Neurons, and compute the optimal number of hidden-layer neurons

$Bneuron = \frac{Bneuron_{i1} \cdot w_{1} + Bneuron_{i2} \cdot w_{2} + Bneuron_{i3} \cdot w_{3}}{w_{1} + w_{2} + w_{3}};$

Step 19. Train the fixed network net = newff(P, T, Bneuron), net = train(net, P, T);

Step 20. Feed the best predicted value y_i back into the prediction sample for the next prediction, as follows: in the new prediction sample [t₁, t₂, ..., t_z], t₁ takes the value of t₂ in the previous prediction sample [t₁, t₂, ..., t_z], t₂ takes the value of the previous t₃, ..., t_{z-1} takes the value of the previous t_z, and t_z = y_i. This gives the new prediction sample Test = [t₁, t₂, ..., t_z], and the predicted value of the commodity on day n+2 is y_i = net(Test);

Step 21. Repeat step 20 to obtain all predicted values of commodity A_i; repeat steps 15 to 20 to obtain the predicted values of all commodities in data set X at the different orders of magnitude, and obtain the best prediction order of magnitude O, O ∈ M.

2. The data preprocessing method for multi-variety commodity price prediction based on neural networks according to claim 1, characterized in that: extracting the name, model, type and price data of commodities from web pages in step 1 means using any web data extraction algorithm to extract the name, model, type and price data of the commodities displayed on the web pages; x₁, x₂, ..., x_n may be n price data of the i-th commodity A_i extracted from a single web page, or n average price data extracted from multiple web pages.

3. The data preprocessing method for multi-variety commodity price prediction based on neural networks according to claim 1, characterized in that: step 2 performs a calculation on the price data of any commodity to obtain the order of magnitude of that commodity's price data.

4. The data preprocessing method for multi-variety commodity price prediction based on neural networks according to claim 1, characterized in that: steps 3 to 5 set the parameters and select the prediction model for the price prediction of any commodity, where z is generally 3, 5 or 7 and D is generally 3 or 7.

5. The data preprocessing method for multi-variety commodity price prediction based on neural networks according to claim 1, characterized in that: the technical computing language MATLAB in steps 6 and 14 is a product of MathWorks, version R2011b.

6. The data preprocessing method for multi-variety commodity price prediction based on neural networks according to claim 1, characterized in that: steps 6 to 12 give the predicted value, under the improved RBF neural network, of the price data of any one commodity on different dates on a single web page, or of the average price data on different dates across multiple web pages.

7. The data preprocessing method for multi-variety commodity price prediction based on neural networks according to claim 1, characterized in that: steps 14 to 20 give the predicted value, under the improved BP neural network, of the price data of any one commodity on different dates on a single web page, or of the average price data on different dates across multiple web pages.

8. The data preprocessing method for multi-variety commodity price prediction based on neural networks according to claim 1, characterized in that: the input vector P in steps 6, 8, 14 and 16 is the training sample set, and the target vector T is the data set used to train and test the predicted values.

9. The data preprocessing method for multi-variety commodity price prediction based on neural networks according to claim 1, characterized in that: the preset value of j in step 6 is generally 40, and the preset value of j in step 14 is generally 10.

10. The data preprocessing method for multi-variety commodity price prediction based on neural networks according to claim 1, characterized in that: in steps 7 and 15, the order of magnitude of the price data of any commodity is normalized to a unified order of magnitude; when the magnitude of the commodity's price data is the same as the normalized magnitude, no normalized-magnitude preprocessing is applied to it; when the magnitude of the commodity's price data differs from the normalized magnitude, normalized-magnitude preprocessing is applied to it; the order of magnitude is generally 1, 10, 100 or 1000.

11. The data preprocessing method for multi-variety commodity price prediction based on neural networks according to claim 1, characterized in that: the coupling weight defined in steps 10 and 18 is W = [2, 4, 2].
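The coupling formula of steps 10 and 18 is a weighted mean of the three best parameter values. A minimal Python sketch follows, assuming the weights W = [2, 4, 2] are applied to the three values in the order they are given; the function name `couple` and the sample values are hypothetical.

```python
def couple(best_three, weights=(2, 4, 2)):
    """Weighted mean of the three best parameter values (spreads or neuron counts)."""
    if len(best_three) != len(weights):
        raise ValueError("expected one weight per value")
    weighted_sum = sum(b * w for b, w in zip(best_three, weights))
    return weighted_sum / sum(weights)


# Hypothetical best spreads for the RBF case.
bspread = couple((1.0, 2.0, 3.0))   # (1*2 + 2*4 + 3*2) / 8

# Hypothetical best hidden-layer sizes for the BP case.
bneuron = couple((8, 10, 13))       # (8*2 + 10*4 + 13*2) / 8
```

For the BP case the result can be fractional, as with the sample values here. The claims do not say how it is converted to a whole number of hidden-layer neurons, so the sketch leaves it unrounded.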