CN116703003A

CN116703003A - A Prediction Method of Residential Water Consumption

Info

Publication number: CN116703003A
Application number: CN202310783082.0A
Authority: CN
Inventors: 林涛; 刘康乐; 张雪; 孙军益; 沈月生
Original assignee: China And Korea Dooch Pump Manufacturing Shanghai Co ltd; Jiangsu Province Urban Water Supply Security Support Center; Suzhou Water Supply Co ltd; Hohai University HHU
Current assignee: China And Korea Dooch Pump Manufacturing Shanghai Co ltd; Jiangsu Province Urban Water Supply Security Support Center; Suzhou Water Supply Co ltd; Hohai University HHU
Priority date: 2023-06-29
Filing date: 2023-06-29
Publication date: 2023-09-05

Abstract

The invention discloses a method for predicting residential water consumption, comprising the following steps: S1, acquiring water consumption data of a study area over a study period through smart water meters; S2, cleaning the data with the Z-Score method to remove and repair abnormal values in the raw data; S3, decomposing the residential water consumption data, by time-varying filter empirical mode decomposition (TVF-EMD), into a plurality of subsequences ordered from high to low frequency; S4, dividing the subsequences into high-frequency and low-frequency subsequences according to frequency, and reconstructing the water consumption data into one high-frequency subsequence and one low-frequency subsequence; S5, predicting each subsequence with a Transformer model; and S6, superposing the predicted values of the subsequences to obtain the predicted residential water consumption. By decomposing the raw data with TVF-EMD, the invention reduces the nonlinearity and non-stationarity of the data and makes it easier to predict; the Transformer model improves prediction accuracy and running speed.

Description

A Prediction Method of Residential Water Consumption

Technical Field

The invention relates to water quantity prediction methods, and in particular to a method for predicting residential water consumption.

Background Art

In recent years, energy conservation and emission reduction in the water supply industry have become a focus of attention. Accurate prediction of user water demand is the basis for precise pumping at water delivery pump stations, which in turn lowers the energy consumption of water plants and the carbon emissions of water supply systems. User water demand forecasting is divided into long-term, medium-term and short-term forecasting; short-term forecasting mainly refers to forecasting at the daily, hourly and minute scales. For the scientific scheduling of user water supply, short-term forecasting of water demand has become an urgent problem.

Traditional water quantity prediction methods suffer from low accuracy and coarse granularity. In recent years, a growing number of researchers have proposed using machine learning to predict user water consumption. Conventional machine learning models, however, cannot capture the time-series information of water consumption when predicting residential demand, so their predictions are unsatisfactory. Recurrent neural networks, developed more recently, take the time-series information of the data into account and make short-term prediction of user water consumption possible. Most existing water consumption prediction methods use recurrent neural networks; however, because of their sequential structure, recurrent neural networks cannot be computed in parallel, so prediction is inefficient and memory usage is high. In addition, existing water consumption forecasts are almost all at the daily and hourly scales, and minute-level forecasts are lacking.

Summary of the Invention

Purpose of the invention: to overcome the deficiencies of the prior art, the invention provides a method for predicting residential water consumption with high prediction accuracy, fine granularity and short running time.

Technical solution: the method for predicting residential water consumption according to the invention comprises the following steps:

S1. Obtaining water consumption data of the study area over the study period through smart water meters;

S2. Cleaning the data with the Z-Score method to remove and repair abnormal values in the raw data;

S3. Decomposing the residential water consumption data with time-varying filter empirical mode decomposition into multiple subsequences ordered from high to low frequency;

S4. Dividing the subsequences into high-frequency and low-frequency subsequences according to frequency, and reconstructing the water consumption data into two subsequences, one high-frequency and one low-frequency;

S5. Predicting each of the two subsequences with a Transformer model;

S6. Superposing the predicted values of the subsequences to obtain the predicted residential water consumption.

Further, step S2 specifically comprises the following steps:

The Z-score of the water consumption data is calculated as:

$$Z_i = \frac{x_i - \bar{x}}{\sigma}$$

where Z_i is the Z-score of the i-th data point, x_i is the value of the i-th data point, x̄ is the mean of the data set, and σ is the standard deviation of the data set;

A critical Z-score Z_t is then set for the data set. If |Z_i| > Z_t, the data point is considered an outlier and is replaced by random interpolation; Z_t is chosen according to the specific conditions of the data set.

Further, step S3 specifically comprises the following steps:

S3-1. Calculating the local cut-off frequency as:

where φ'_bis(t) is the local cut-off frequency, and η₁(t), η₂(t), a₁(t) and a₂(t) are constructed functions whose construction is detailed in the Detailed Description;

The local cut-off frequency is then rearranged to obtain the final local cut-off frequency;

S3-2. Filtering the input signal with a time-varying filter to obtain an approximation result;

After the local cut-off frequency is obtained, the signal h(t) is obtained as:

Taking the time points of the extrema of h(t) as the knots m, a B-spline approximation time-varying filter is constructed with its cut-off frequency set to φ'_bis(t); B-spline approximation time-varying filtering is then applied to the input signal, and the approximation result is recorded as m(t);

S3-3. Judging whether m(t) satisfies the narrowband signal condition: if it does, the decomposition of the input signal stops; if not, let x(t) = x(t) - m(t) and repeat steps S3-1 and S3-2 to continue decomposing the input signal until the stop condition is met.

Further, whether m(t) satisfies the narrowband signal condition is judged as follows: if θ(t) ≤ Δ, the signal can be regarded as a narrowband signal, where θ(t) is calculated as:

where B_Loughlin(t) is the Loughlin instantaneous bandwidth and φ'_avg(t) is the weighted average instantaneous frequency;

S3-4. Each m(t) obtained through S3-1 and S3-2 is one subsequence of the initial signal decomposed by the TVF-EMD method. The m(t) are recorded in the order in which they are obtained and named C1, C2, …, Cn, giving n subsequences ordered from high to low frequency.

Further, step S4 specifically comprises the following steps:

S4-1. C1 is denoted index 1, C1 + C2 index 2, and so on, so that the sum of the first i C sequences is index i. The mean of each of indices 1 to n is calculated, and a t-test is performed on whether each mean differs significantly from 0. If the mean first differs significantly from 0 at index k, C1 to C(k-1) are classified as high-frequency components and Ck to Cn as low-frequency components;

S4-2. The subsequences are reconstructed according to signal frequency to reduce the number of models to be built. C1 to C(k-1) represent the high-frequency components, so the reconstructed high-frequency component Cnew1 is:

Cnew1 = C1 + C2 + … + C(k-1)

Ck to Cn represent the low-frequency components, so the reconstructed low-frequency component Cnew2 is:

Cnew2 = Ck + C(k+1) + … + Cn.

Further, step S5 specifically comprises the following steps:

S5-1. Converting the water consumption data of the input sequence into embedding vectors, and then applying positional encoding to each vector;

S5-2. Adding the embedding vectors and positional encodings of the water consumption data to obtain the vector representation matrix of the input sequence, and passing this matrix into the encoder. The encoder consists of multiple encoder layers, each comprising a multi-head attention mechanism, a residual connection with layer normalization, a feed-forward neural network, and a second residual connection with layer normalization;

S5-3. The decoder contains multiple decoder layers corresponding to the encoder layers. Each decoder layer comprises a masked multi-head attention mechanism, a residual connection with layer normalization, a multi-head attention mechanism, a residual connection with layer normalization, a feed-forward neural network, and a residual connection with layer normalization;

S5-4. With the first 80% of the data as the training set and the last 20% as the test set, the decomposed and reconstructed water consumption sequences are input into the Transformer model for training and prediction, and the model outputs the predicted sequence.

Further, the positional encoding in step S5-1 is given by:

where PE_2i(p) is the encoding for the even dimensions, PE_2i+1(p) is the encoding for the odd dimensions, p is the position of the water consumption at this time step among the n time steps, and d is the dimension of PE.

Further, in step S5-2, the output of the multi-head attention mechanism passes through a residual connection and layer normalization to accelerate convergence, and a feed-forward neural network consisting of multiple fully connected layers then produces the final encoded output.

Further, in S5-3, the masked multi-head attention mechanism adds an upper triangular matrix to the multi-head attention mechanism to mask information from future positions of the sequence; in the decoder's multi-head attention mechanism, Q and K come from the output of the corresponding encoder layer, and V comes from the decoder's input features.

Further, the superposition in step S6 is:

W_pred = W_new1,p + W_new2,p

where W_pred is the predicted user water consumption, and W_new1,p and W_new2,p are the predicted values of the high-frequency and low-frequency component subsequences, respectively.

Beneficial effects: compared with the prior art, the invention has the following notable features:

1. TVF-EMD is used to decompose the raw data, which substantially reduces the nonlinearity and non-stationarity of the data and makes it easier to predict;

2. The decomposed subsequences are reconstructed, which reduces the number of models to be built and thus the hardware requirements and running time;

3. The Transformer model is used for prediction; it is not restricted in relating data at different positions, offers strong modeling capability, generality and scalability, and is better suited to parallel computation;

4. The TVF-EMD-Transformer hybrid model better predicts minute-level user water consumption, achieving predictions with both fine granularity and high accuracy.

Description of Drawings

Fig. 1 is a flow chart of the invention;

Fig. 2 is a structural diagram of the Transformer model of the invention.

Detailed Description

As shown in Fig. 1, a method for predicting residential water consumption comprises the following steps:

S1. Water consumption data of the study area over the study period are obtained through smart water meters at a time interval of 15 min.

S2. The data are cleaned with the Z-Score method to remove and repair abnormal values in the raw data, specifically:

$$Z_i = \frac{x_i - \bar{x}}{\sigma} \quad (1)$$

where Z_i is the Z-score of the i-th data point, x_i is its value, x̄ is the mean of the data set, and σ is the standard deviation of the data set; the mean and standard deviation are calculated by formulas (2)-(3), respectively.

A critical Z-score Z_t is then set for the data set. If |Z_i| > Z_t, the data point is considered an outlier and is replaced by random interpolation. Z_t is usually 2.5, 3.0 or 3.5 and is chosen according to the specific conditions of the data set.

Formulas (2)-(3):

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad (2)$$

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \quad (3)$$

where n is the number of data points in the data set, and the other symbols are as in formula (1).
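The cleaning rule of S2 can be sketched in Python with NumPy. This is an illustrative sketch rather than the patented implementation: the function name `zscore_clean` is hypothetical, σ is assumed to be the population form (dividing by n), and outliers are replaced by linear interpolation between the nearest retained neighbours, which is swapped in here for the unspecified "random interpolation".

```python
import numpy as np


def zscore_clean(x, z_t=3.0):
    """Flag |Z_i| > Z_t as outliers (formula (1)) and replace them.

    Linear interpolation between the nearest retained neighbours stands in
    for the unspecified "random interpolation" of step S2 (assumption).
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()                    # formula (2)
    sigma = x.std()                    # formula (3), population form (divide by n) assumed
    z = (x - mean) / sigma             # formula (1)
    outlier = np.abs(z) > z_t
    idx = np.arange(len(x))
    cleaned = x.copy()
    # interpolate each outlier from the surrounding non-outlier points
    cleaned[outlier] = np.interp(idx[outlier], idx[~outlier], x[~outlier])
    return cleaned, outlier
```

With the population σ, no point can reach |Z_i| > √(n-1), so a window of n ≤ 10 points can never flag anything at Z_t = 3.0; the rule is meant for reasonably long series.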

S3. Given the complexity of water consumption data, time-varying filter empirical mode decomposition (TVF-EMD) is used to decompose the residential water consumption data into multiple subsequences ordered from high to low frequency, specifically:

S3-1. Calculating the local cut-off frequency.

The Hilbert transform is used to calculate the instantaneous amplitude and instantaneous frequency of the input signal x(t), namely:

$$A(t) = \sqrt{x^2(t) + \hat{x}^2(t)} \quad (4)$$

$$\varphi'(t) = \frac{d}{dt}\arctan\frac{\hat{x}(t)}{x(t)} \quad (5)$$

where A(t) and φ'(t) are the instantaneous amplitude and instantaneous frequency of the input signal, respectively, and x(t) and x̂(t) denote the original input signal (the time series of residential water consumption) and its Hilbert transform, respectively.
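The instantaneous amplitude A(t) and instantaneous frequency φ'(t) of S3-1 can be computed from the analytic signal x(t) + j·x̂(t). The sketch below is an illustration under stated assumptions, not the patent's code: it builds the analytic signal with the standard FFT construction (the same one used by SciPy's `scipy.signal.hilbert`) so that only NumPy is needed, the function names are hypothetical, and the frequency is returned in cycles per unit time rather than radians.

```python
import numpy as np


def analytic_signal(x):
    """Return x(t) + j*x_hat(t), where x_hat is the Hilbert transform (FFT method)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                       # keep the DC term
    if n % 2 == 0:
        gain[n // 2] = 1.0              # keep the Nyquist term
        gain[1:n // 2] = 2.0            # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)  # negative frequencies are zeroed


def instantaneous_amplitude_frequency(x, fs=1.0):
    """A(t) and phi'(t) of the input signal; frequency in cycles per unit time."""
    z = analytic_signal(x)
    amplitude = np.abs(z)                         # sqrt(x^2 + x_hat^2)
    phase = np.unwrap(np.angle(z))                # continuous phase phi(t)
    freq = np.gradient(phase) * fs / (2 * np.pi)  # d(phi)/dt converted to cycles
    return amplitude, freq
```

For a pure cosine with a whole number of periods in the window, the sketch returns a constant amplitude of 1 and a constant frequency equal to the cosine's frequency.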

The local maxima and local minima of the instantaneous amplitude A(t) are then determined, and their time points are denoted {t_max} and {t_min}. The analytic signal z(t) can be expressed as:

$$z(t) = a_1(t)e^{j\varphi_1(t)} + a_2(t)e^{j\varphi_2(t)} \quad (6)$$

where a_i(t) and φ_i(t) are the amplitude and phase of the i-th component, respectively.

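Processing formula (6) gives:

$$A^2(t) = a_1^2(t) + a_2^2(t) + 2a_1(t)a_2(t)\cos[\varphi_1(t) - \varphi_2(t)] \quad (7)$$

$$\varphi'(t)A^2(t) = a_1^2\varphi_1' + a_2^2\varphi_2' + a_1a_2(\varphi_1' + \varphi_2')\cos(\varphi_1 - \varphi_2) + (a_1'a_2 - a_1a_2')\sin(\varphi_1 - \varphi_2) \quad (8)$$

When t = t_min, formula (7) gives:

$$\cos[\varphi_1(t_{min}) - \varphi_2(t_{min})] = -1 \quad (9)$$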

Substituting formula (9) into formulas (7) and (8) gives:

A(t_min) = |a₁(t_min) - a₂(t_min)|  (10)

Since A(t_min) is a local minimum, A'(t_min) = 0, and therefore:

a′₁(t_min) - a′₂(t_min) = 0  (12)

Solving formulas (9)-(12) yields a₁(t_min), a₂(t_min) and the remaining unknowns at t_min. Similarly, formulas (13)-(16) yield a₁(t_max), a₂(t_max) and the remaining unknowns at t_max:

A(t_max) = a₁(t_max) + a₂(t_max)  (14)

a′₁(t_max) + a′₂(t_max) = 0  (16)

Let:

β₁(t) = |a₁(t) - a₂(t)|  (17)

β₂(t) = a₁(t) + a₂(t)  (18)

Substituting t = t_min and t = t_max into formulas (17) and (18), respectively, gives:

β₁(t_min) = |a₁(t_min) - a₂(t_min)| = A(t_min)  (19)

β₂(t_max) = a₁(t_max) + a₂(t_max) = A(t_max)  (20)

Because a₁(t) and a₂(t) vary slowly, β₁(t) and β₂(t) can be obtained by interpolating the sequences A(t_min) and A(t_max), respectively; therefore:

$$a_1(t) = \frac{\beta_1(t) + \beta_2(t)}{2} \quad (21)$$

$$a_2(t) = \frac{\beta_2(t) - \beta_1(t)}{2} \quad (22)$$
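Formulas (17)-(20) amount to fitting one envelope through the minima of A(t) and another through its maxima, then solving for the two component amplitudes. A minimal NumPy sketch, assuming linear interpolation (`np.interp`, which holds the end values outside the extrema) because the text does not name an interpolation scheme, strict interior extrema only, and a₁(t) ≥ a₂(t) so that β₁ = a₁ - a₂; the helper names are hypothetical.

```python
import numpy as np


def local_extrema(y):
    """Indices of strict interior local minima and maxima of y."""
    d = np.diff(y)
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    return minima, maxima


def component_amplitudes(A):
    """Estimate a1(t) and a2(t) from the instantaneous amplitude A(t)."""
    A = np.asarray(A, dtype=float)
    t = np.arange(len(A))
    t_min, t_max = local_extrema(A)
    beta1 = np.interp(t, t_min, A[t_min])   # formula (19): beta1(t_min) = A(t_min)
    beta2 = np.interp(t, t_max, A[t_max])   # formula (20): beta2(t_max) = A(t_max)
    # invert formulas (17)-(18), assuming a1 >= a2 so that beta1 = a1 - a2
    a1 = (beta1 + beta2) / 2
    a2 = (beta2 - beta1) / 2
    return a1, a2
```

For a two-component envelope with a₁ = 1.0 and a₂ = 0.3, the maxima of A(t) lie at 1.3 and the minima at 0.7, and the sketch recovers the two constant amplitudes.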

Similarly, the following functions are constructed:

Substituting t = t_min and t = t_max into formulas (23) and (24), respectively, gives:

Similarly, solving formulas (23) and (24) gives:

Substituting the parameters obtained in the above steps into the formula for the local cut-off frequency φ'_bis(t) gives:

To prevent the cut-off frequency from being distorted by noise, it is rearranged according to the following rules:

Find the local maxima of the signal x(t), denoted u_i (i = 1, 2, 3, …); if u_i satisfies the following formula, it is recorded as e_j = u_i (j = 1, 2, 3, …).

where ρ is a threshold parameter, taken as 0.25. If the corresponding condition holds, e_j is a rising edge; otherwise, it is a falling edge. Each e_j is then examined: if it lies on a rising edge, the corresponding segment of the cut-off frequency is taken as a bottom; if it lies on a falling edge, the corresponding segment is likewise taken as a bottom; the remaining parts are peaks. Interpolating between the peaks gives the final cut-off frequency.

S3-2. Filtering the input signal with a time-varying filter to obtain an approximation result.

Taking the time points of the extrema of the signal h(t) as the knots m, a B-spline approximation time-varying filter is constructed with its cut-off frequency set to φ'_bis(t). B-spline approximation time-varying filtering is then applied to the input signal, and the approximation result is recorded as m(t).
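The knots of the B-spline filter in S3-2 are the extrema of h(t). Because the formula for h(t) is not reproduced in this text, the sketch below assumes the definition used in the TVF-EMD literature, h(t) = cos(∫φ'_bis(t)dt), where φ'_bis(t) is the local cut-off frequency, here expressed in cycles per sample. It only locates the knots and does not perform the B-spline approximation itself; the function name is hypothetical.

```python
import numpy as np


def spline_knots(cutoff_freq, fs=1.0):
    """Knot indices for the B-spline time-varying filter.

    Assumes h(t) = cos(integral of the cut-off frequency), with the
    cut-off frequency given in cycles per unit time (assumption).
    """
    cutoff_freq = np.asarray(cutoff_freq, dtype=float)
    phase = 2 * np.pi * np.cumsum(cutoff_freq) / fs   # discrete running integral
    h = np.cos(phase)
    d = np.diff(h)
    # interior extrema of h(t): the first difference changes sign
    knots = np.where(np.sign(d[:-1]) != np.sign(d[1:]))[0] + 1
    return h, knots
```

A constant cut-off of 0.05 cycles per sample gives h(t) a period of 20 samples, so the knots fall every 10 samples; a time-varying cut-off frequency spaces the knots more tightly where it is higher.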

S3-3. Judging the stop condition.

Judge whether m(t) satisfies the narrowband signal condition: if it does, stop decomposing the input signal; if not, let x(t) = x(t) - m(t) and repeat steps S3-1 and S3-2 to continue decomposing the input signal until the stop condition is reached.

Whether a signal satisfies the narrowband signal condition is judged as follows: if θ(t) ≤ Δ, the signal can be regarded as a narrowband signal. The criterion value θ(t) is calculated by formula (32); the weighted average instantaneous frequency φ'_avg(t) is calculated by formula (33); and the Loughlin instantaneous bandwidth B_Loughlin(t) is calculated by formula (34).

S3-4. Obtaining and storing the subsequences.

The m(t) obtained through S3-1 and S3-2 at each step is one subsequence of the initial signal decomposed by the TVF-EMD method. The m(t) are recorded in the order in which they are obtained and named C1, C2, …, Cn, giving n subsequences ordered from high to low frequency.
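The outer loop of S3-3 and S3-4, as worded above, can be sketched independently of how each m(t) is produced. In this sketch the filtering of S3-1/S3-2 and the θ(t) ≤ Δ test are passed in as callables (`extract_m` and `is_narrowband` are hypothetical placeholders), and `max_components` is a safety cap added here that the text does not mention.

```python
from typing import Callable, List

import numpy as np


def decompose(x, extract_m: Callable[[np.ndarray], np.ndarray],
              is_narrowband: Callable[[np.ndarray], bool],
              max_components: int = 10) -> List[np.ndarray]:
    """Collect C1, C2, ... following the S3-3 / S3-4 loop."""
    x = np.asarray(x, dtype=float).copy()
    components = []
    for _ in range(max_components):
        m = extract_m(x)           # S3-1 + S3-2: cut-off frequency and B-spline filtering
        components.append(m)       # S3-4: each m(t) is stored as the next C
        if is_narrowband(m):       # S3-3: stop once the narrowband condition holds
            break
        x = x - m                  # otherwise x(t) = x(t) - m(t) and repeat
    return components
```

With a toy `extract_m` that halves its input and a narrowband test of max|m| < 0.1, a constant input of 1 yields four components (0.5, 0.25, 0.125, 0.0625) before the loop stops.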

S4. The subsequences are divided into high-frequency and low-frequency subsequences according to frequency, and the water consumption data are reconstructed into one high-frequency and one low-frequency subsequence, specifically:

S4-1. Determining the signal frequency of the subsequences.

C1 is denoted index 1, C1 + C2 index 2, and so on, so that the sum of the first i C sequences is index i. The mean of each of indices 1 to n is calculated, and a t-test is performed on whether each mean differs significantly from 0. If the mean first differs significantly from 0 at index k, C1 to C(k-1) are classified as high-frequency components and Ck to Cn as low-frequency components.

S4-2. Reconstructing the components according to signal frequency.

Decomposing the original signal yields multiple subsequences. Modeling each subsequence separately would require many models and increase the required computer memory and computing time, so the subsequences are reconstructed according to signal frequency to reduce the number of models and improve prediction efficiency.

C1 to C(k-1) represent the high-frequency components, so the reconstructed high-frequency component Cnew1 is:

Cnew1 = C1 + C2 + … + C(k-1)  (35)

Ck to Cn represent the low-frequency components, so the reconstructed low-frequency component Cnew2 is:

Cnew2 = Ck + C(k+1) + … + Cn  (36)

S5. As shown in Fig. 2, the Transformer model is used to predict each of the two subsequences, specifically:

S5-1. Word embedding and positional encoding.

The water consumption data of the input sequence are converted into embedding vectors through Input Embedding, and positional encoding is then applied to each vector, as given by formulas (37)-(38). The same embedding and positional encoding operations are applied to the water consumption data of the target sequence.

where PE_2i(p) is the encoding for the even dimensions, PE_2i+1(p) is the encoding for the odd dimensions, p is the position of the water consumption at this time step among the n time steps, and d is the dimension of PE.
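Formulas (37)-(38) are not reproduced in this text; assuming they take the standard sinusoidal form of the original Transformer, PE_2i(p) = sin(p / 10000^(2i/d)) and PE_2i+1(p) = cos(p / 10000^(2i/d)), the encoding can be sketched in NumPy as follows (function name hypothetical; d is assumed even).

```python
import numpy as np


def positional_encoding(n, d):
    """n x d sinusoidal encoding: sin on even columns, cos on odd columns."""
    assert d % 2 == 0, "d is assumed to be even"
    p = np.arange(n)[:, None]                 # position of each time step
    two_i = np.arange(0, d, 2)[None, :]       # even dimension indices 2i
    angle = p / np.power(10000.0, two_i / d)
    pe = np.zeros((n, d))
    pe[:, 0::2] = np.sin(angle)               # PE_2i(p)
    pe[:, 1::2] = np.cos(angle)               # PE_2i+1(p)
    return pe
```

The result has the same shape as the embedding matrix, so the two can be added element-wise as described in S5-2.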

S5-2. Encoder operation.

The embedding vectors and positional encodings of the water consumption data are added to obtain the vector representation matrix X_{n×d} of the input sequence, where n is the window size, i.e., the (n+1)-th value of the time series is predicted from the n preceding values, and d is the embedding dimension, generally 512. The resulting matrix is passed into the encoder, which consists of multiple encoder layers; six encoder layers are usually chosen, and the invention uses six. Each encoder layer consists of four parts: a multi-head attention mechanism, a residual connection with layer normalization, a feed-forward neural network, and a residual connection with layer normalization. The principle of the multi-head attention mechanism and the input and output of the encoder layer are described below.

(1) Multi-head attention mechanism

Building on self-attention, Q, K and V are first split according to the number of heads. Q (Query), K (Key) and V (Value) all derive from the input features themselves, as vectors generated from them. The Q, K and V carrying head index 1 (e.g., Q¹, K¹, V¹) are assigned to Head1, and likewise for Head2, …, Headh. The output of each head is:

$$\mathrm{head}_i = \mathrm{softmax}\left(\frac{Q^i (K^i)^{\top}}{\sqrt{d_k}}\right)V^i$$

The outputs of all heads are concatenated, and the concatenated output is multiplied by W^O to obtain the output of the multi-head attention mechanism. Here d_k = d_v = d_model/h, where h is the number of heads; the Transformer uses 8 heads, i.e., h = 8.
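The split, attend and concatenate procedure can be sketched in NumPy. Assumptions: a single sequence without batching, the per-head Q^i, K^i and V^i taken as consecutive column blocks of X·W_Q, X·W_K and X·W_V, scaled dot-product attention within each head, and randomly initialised weights in the usage example; the function names are hypothetical.

```python
import numpy as np


def softmax(s):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def multi_head_attention(X, W_q, W_k, W_v, W_o, h=8):
    """Self-attention over X (n x d_model) with h heads of width d_k = d_model / h."""
    _, d_model = X.shape
    assert d_model % h == 0, "d_model must be divisible by h"
    d_k = d_model // h
    Q, K, V = X @ W_q, X @ W_k, X @ W_v        # all three derived from the input itself
    heads = []
    for i in range(h):
        cols = slice(i * d_k, (i + 1) * d_k)   # the block of columns used by head i+1
        scores = Q[:, cols] @ K[:, cols].T / np.sqrt(d_k)
        heads.append(softmax(scores) @ V[:, cols])
    return np.concatenate(heads, axis=1) @ W_o  # concatenate heads, then multiply by W^O
```

Because each head works on only d_model/h columns, the total cost of the h heads is similar to that of one full-width head, while each head can attend to different time steps.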

(2) Input and output of the encoder layer

The output of the multi-head attention mechanism passes through a residual connection and layer normalization to accelerate convergence, and a feed-forward neural network consisting of multiple fully connected layers then produces the final encoded output.

The input and output of the encoder layer are given by formulas (39)-(40).

e₀ = Embedding(inputs) + pos_Enc(inputs_position)  (39)

e_i = EncoderLayer(e_{i-1})  (40)

where e₀ is the encoder input; e_{i-1} and e_i are the outputs of the (i-1)-th and i-th encoder layers, respectively; EncoderLayer(·) denotes the encoder layer operation; and i ∈ [1, N], where N is the number of encoder layers.
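Formulas (39)-(40) describe stacking N encoder layers. The NumPy sketch below shows that stacking with residual connections followed by layer normalization; to stay short, it uses single-head attention in each layer, a two-layer ReLU feed-forward network, layer normalization without learnable gain and bias, and random weights. These are simplifying assumptions, not details from the text, and the function names are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learnable gain/bias)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (simplification)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    s = q @ k.T / np.sqrt(q.shape[-1])
    s = np.exp(s - s.max(axis=-1, keepdims=True))
    return (s / s.sum(axis=-1, keepdims=True)) @ v


def encoder_layer(x, p):
    """Attention -> residual + layer norm -> feed-forward -> residual + layer norm."""
    y = layer_norm(x + self_attention(x, p["w_q"], p["w_k"], p["w_v"]))
    ffn = np.maximum(0.0, y @ p["w_1"]) @ p["w_2"]   # two fully connected layers, ReLU between
    return layer_norm(y + ffn)


def encoder(e0, layers):
    """Formula (40): e_i = EncoderLayer(e_{i-1}) for i = 1..N."""
    e = e0
    for p in layers:
        e = encoder_layer(e, p)
    return e


def init_layer(d, rng, scale=0.1):
    """Random weights for one layer (illustration only)."""
    shapes = {"w_q": (d, d), "w_k": (d, d), "w_v": (d, d),
              "w_1": (d, 4 * d), "w_2": (4 * d, d)}
    return {name: rng.normal(scale=scale, size=s) for name, s in shapes.items()}
```

With six layers, as in the text, `encoder(e0, [init_layer(512, rng) for _ in range(6)])` maps an n × 512 input to an n × 512 output, so layers can be stacked without changing shape.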

S5-3. Decoder operation.

The decoder likewise contains multiple decoder layers corresponding to the encoder; because the invention uses six encoder layers, six decoder layers form the corresponding decoder. Each decoder layer consists of six parts: a masked multi-head attention mechanism, a residual connection with layer normalization, a multi-head attention mechanism, a residual connection with layer normalization, a feed-forward neural network, and a residual connection with layer normalization. The masked multi-head attention mechanism adds an upper triangular matrix to the multi-head attention mechanism to mask information from future positions of the sequence. The decoder's multi-head attention mechanism is similar to the encoder's, except that Q and K come not from the decoder's input features but from the output of the corresponding encoder layer, while V comes from the decoder's input features. The input and output of the decoder layer are as follows:
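The upper-triangular masking in the decoder's first attention sub-layer can be illustrated in NumPy: scores above the diagonal are set to -∞ before the softmax, so each position attends only to itself and earlier positions. This sketch, with hypothetical function names, shows only the masking step, not a full decoder layer.

```python
import numpy as np


def causal_mask(n):
    """Upper-triangular mask: True where position j lies in the future of position i."""
    return np.triu(np.ones((n, n), dtype=bool), k=1)


def masked_attention_weights(scores):
    """Set future scores to -inf before the softmax so they receive zero weight."""
    scores = np.where(causal_mask(scores.shape[0]), -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)   # diagonal keeps each row finite
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)
```

The first row can only attend to itself, so its weight on position 0 is exactly 1; every row still sums to 1.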

e₀ = Embedding(outputs) + pos_Enc(outputs_position)  (41)

e_i = DecoderLayer(e_{i-1})  (42)

where e₀ is the decoder input; e_{i-1} and e_i are the outputs of the (i-1)-th and i-th decoder layers, respectively; DecoderLayer(·) denotes the decoder layer operation; and i ∈ [1, N], where N is the number of decoder layers.

S5-4. Predicting the two reconstructed subsequences.

The first 80% of the data are used as the training set and the last 20% as the test set. The decomposed and reconstructed water consumption sequences are input into the Transformer model for training and prediction, and the model outputs the predicted sequence.
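Step S5-4, together with the window definition in S5-2 (n preceding values predict the (n+1)-th), can be sketched as a sliding-window dataset followed by a chronological 80/20 split. Assumptions: one-step-ahead targets, no shuffling, and the split applied to the windowed samples; the function names are hypothetical.

```python
import numpy as np


def make_windows(series, n):
    """Pair each run of n consecutive values with the value that follows it."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + n] for i in range(len(series) - n)])
    y = series[n:]
    return X, y


def chronological_split(X, y, train_frac=0.8):
    """First 80% of samples for training, last 20% for testing, without shuffling."""
    cut = int(len(X) * train_frac)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])
```

Keeping the split chronological prevents the model from seeing future water use during training. At the 15-min sampling interval of S1 there are 96 samples per day, so, for example, n = 96 would let each prediction see the previous 24 hours.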

S6. The predicted values of the subsequences are superposed to obtain the predicted residential water consumption, specifically:

The predicted values of the two subsequences from step S5 are added to obtain the predicted user water consumption, namely:

W_pred = W_new1,p + W_new2,p  (43)

Claims

1. A method for forecasting residential water consumption, comprising the following steps:

S1. Obtain the water consumption data of the research area during the research period through the smart water meter;

S2. Use the Z-Score method to clean up the data, remove and repair the abnormal values in the original data;

S3. Using the time-varying filter empirical mode decomposition to decompose the residential water consumption data into multiple subsequences with frequencies ranging from high to low;

S4. Divide the subsequences into high frequency subsequences and low frequency subsequences according to frequency, and reconstruct the water consumption data into high frequency subsequences and low frequency subsequences;

S5. Using the Transformer model to predict the two subsequences respectively;

S6. Superimpose the predicted values of the subsequences to obtain the predicted residential water consumption.

2. A method for predicting residential water consumption according to claim 1, wherein said step S2 specifically comprises the following steps:

The Z-score of the water consumption data is calculated as:

$$Z_i = \frac{x_i - \bar{x}}{\sigma}$$

where Z_i is the Z-score of the i-th data point, x_i is the value of the i-th data point, x̄ is the mean of the data set, and σ is the standard deviation of the data set;

then a critical Z-score Z_t is set for the data set; if |Z_i| > Z_t, the data point is considered an outlier and is replaced by random interpolation, Z_t being selected according to the specific conditions of the data set.

3. A method for predicting residential water consumption according to claim 1, characterized in that: said step S3 specifically comprises the following steps:

S3-1. Calculate the local cut-off frequency using the following formula:

In the formula, the local cut-off frequency is computed from η₁(t), η₂(t), a₁(t) and a₂(t), which are auxiliary functions constructed in the calculation;

Rearrange the local cut-off frequency to obtain the final local cut-off frequency;

S3-2. Filter the input signal with a time-varying filter to obtain an approximation result:

After the local cut-off frequency is obtained, the signal h(t) can be constructed:

Take the time points of the extrema of h(t) as the knots m and construct a B-spline approximation time-varying filter whose cut-off frequency is set to the local cut-off frequency obtained in S3-1. Then apply this B-spline approximation time-varying filter to the input signal and record the approximation result as m(t);

S3-3. Judge whether m(t) satisfies the narrowband signal condition. If it does, stop decomposing the input signal; if not, set x(t) = x(t) − m(t), repeat steps S3-1 and S3-2, and continue decomposing the input signal until the stopping condition is reached;

S3-4. Each m(t) obtained through S3-1 and S3-2 is a subsequence of the initial signal decomposed by the TVF-EMD method. Record the m(t) in the order in which they are obtained and name them C1, C2, ..., Cn, which gives a total of n subsequences ordered from high to low frequency.

4. A method for predicting residential water consumption according to claim 3, characterized in that: in S3-3, whether m(t) satisfies the narrowband signal condition is judged as follows: if θ(t) ≤ Δ, the signal is regarded as a narrowband signal, where θ(t) is calculated as:

In the formula, B_Loughlin(t) is the Loughlin instantaneous bandwidth, and the other quantity in the formula is the weighted average instantaneous frequency.

5. A method for predicting residential water consumption according to claim 1, characterized in that: said step S4 specifically comprises the following steps:

S4-1. Denote C1 as index 1, C1 + C2 as index 2, and so on, so that index i is the sum of the first i subsequences C1 to Ci. Calculate the mean of each of index 1 to index n and use a t-test to check whether it is significantly different from 0. If the mean first becomes significantly different from 0 at index k, classify C1 to C(k-1) as high-frequency components and Ck to Cn as low-frequency components;

S4-2. Reconstruct the subsequences according to signal frequency to reduce the number of models to be built. C1 to C(k-1) are the high-frequency components, and the reconstructed high-frequency component Cnew1 is:

$$C_{\text{new1}} = C_1 + C_2 + \cdots + C_{k-1}$$

Ck to Cn are the low-frequency components, and the reconstructed low-frequency component Cnew2 is:

$$C_{\text{new2}} = C_k + C_{k+1} + \cdots + C_n$$
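The fine-to-coarse rule of S4-1 and the sums of S4-2 can be sketched in Python with SciPy's one-sample t-test. The significance level `alpha = 0.05` is an assumption (the claim does not state one), as is the choice to treat every component as high-frequency when no index is significant; `split_high_low` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def split_high_low(components, alpha=0.05):
    """Fine-to-coarse split of C1..Cn (ordered high -> low frequency).

    Index i is C1 + ... + Ci; the first index whose mean differs
    significantly from 0 (one-sample t-test at level alpha) is index k.
    Returns (s, c_new1, c_new2), where s = k - 1 is the number of
    high-frequency components.
    """
    comps = [np.asarray(c, dtype=float) for c in components]
    partial = np.cumsum(comps, axis=0)        # row i holds index i+1 = C1 + ... + C(i+1)
    s = len(comps)                            # assumption: all high-frequency if no index is significant
    for i, index_i in enumerate(partial):
        if stats.ttest_1samp(index_i, 0.0).pvalue < alpha:
            s = i                             # k = i + 1, so C1..C(k-1) are the first i components
            break
    zero = np.zeros_like(comps[0])
    c_new1 = sum(comps[:s], zero)             # Cnew1 = C1 + ... + C(k-1)
    c_new2 = sum(comps[s:], zero)             # Cnew2 = Ck + ... + Cn
    return s, c_new1, c_new2
```

With two zero-mean oscillations followed by a component that carries a constant level, the partial sums stay centred on 0 until the third index, so the first two components are reconstructed into Cnew1 and the third becomes Cnew2.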

6. A method for predicting residential water consumption according to claim 1, characterized in that: said step S5 specifically comprises the following steps:

S5-1. Convert the water consumption data of the input sequence into embedding vectors, then apply positional encoding to each vector;

S5-2. Add the embedding vectors and positional encodings of the water consumption data to obtain the representation matrix of the input sequence, and pass this matrix into the encoder. The encoder consists of multiple encoder layers; each encoder layer comprises a multi-head attention mechanism, a residual connection and layer normalization, a feed-forward neural network, and a second residual connection and layer normalization;

S5-3. The decoder consists of multiple decoder layers corresponding to the encoder layers. Each decoder layer comprises a masked multi-head attention mechanism, a residual connection and layer normalization, a multi-head attention mechanism, a residual connection and layer normalization, a feed-forward neural network, and a residual connection and layer normalization;

S5-4. Use the first 80% of the data as the training set and the last 20% as the test set; input the decomposed and reconstructed water consumption sequences into the Transformer model for training and prediction, and output the predicted sequences.
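Step S5-4 splits the data chronologically rather than randomly. The sketch below shows that split together with one common way of turning a subsequence into input/target pairs for a sequence model; the sliding-window framing and the `window` length are assumptions that the claim does not specify, and both function names are hypothetical.

```python
import numpy as np


def chronological_split(series, train_frac=0.8):
    """S5-4: first 80% for training, last 20% for testing (no shuffling)."""
    series = np.asarray(series, dtype=float)
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]


def make_windows(series, window):
    """Hypothetical supervised framing: predict x[t] from x[t-window:t]."""
    series = np.asarray(series, dtype=float)
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y
```

Applied separately to Cnew1 and Cnew2, this yields one training set per reconstructed subsequence, matching the one-model-per-subsequence design of S5.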

7. A method for predicting residential water consumption according to claim 6, characterized in that: the positional encoding in step S5-1 is calculated as:

In the formula, PE_2i(p) is the even-numbered position code, PE_2i+1(p) is the odd-numbered position code, p is the position of the current water consumption value among the n time steps, and d is the dimension of PE.
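The formula for claim 7 is missing from this text. Assuming it is the standard sinusoidal encoding of the original Transformer, PE_2i(p) = sin(p / 10000^(2i/d)) and PE_2i+1(p) = cos(p / 10000^(2i/d)), it can be computed as below; the base 10000, the 0-based positions and the function name `positional_encoding` are assumptions.

```python
import numpy as np


def positional_encoding(n_positions, d):
    """Sinusoidal position codes, assuming the standard Transformer form:
    PE_2i(p) = sin(p / 10000**(2i/d)), PE_2i+1(p) = cos(p / 10000**(2i/d)).
    """
    if d % 2:
        raise ValueError("d must be even")
    p = np.arange(n_positions)[:, None]        # positions 0..n-1 (assumed 0-based)
    two_i = np.arange(0, d, 2)[None, :]        # even dimension indices 2i
    angle = p / np.power(10000.0, two_i / d)
    pe = np.zeros((n_positions, d))
    pe[:, 0::2] = np.sin(angle)                # even-numbered codes
    pe[:, 1::2] = np.cos(angle)                # odd-numbered codes
    return pe
```

The resulting (n, d) matrix is added element-wise to the embedding matrix, as described in S5-2.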

8. A method for predicting residential water consumption according to claim 6, characterized in that: in step S5-2, the output of the multi-head attention mechanism passes through a residual connection and layer normalization to accelerate convergence, and the final encoded output is then obtained through a feed-forward neural network composed of multiple fully connected layers.
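Claim 8 describes the post-norm order of sub-layers inside each encoder layer. The NumPy sketch below shows one forward pass of such a layer under stated assumptions: the weights are plain matrices (no training), the feed-forward network is two fully connected layers with a ReLU between them, layer normalization has no learned gain or bias, and the model dimension is divisible by the number of heads. All names are illustrative.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each position's feature vector (no learned scale or shift here)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(s):
    s = s - s.max(axis=-1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)


def multi_head_attention(x, wq, wk, wv, wo, n_heads):
    """Self-attention: Q, K and V are all projections of the same input x."""
    T, d = x.shape
    dh = d // n_heads
    # Project, then split the model dimension into heads: shape (heads, T, dh).
    q, k, v = ((x @ w).reshape(T, n_heads, dh).transpose(1, 0, 2) for w in (wq, wk, wv))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)    # (heads, T, T)
    heads = softmax(scores) @ v                        # (heads, T, dh)
    # Concatenate the heads back to (T, d) and apply the output projection.
    return heads.transpose(1, 0, 2).reshape(T, d) @ wo


def encoder_layer(x, params, n_heads):
    """Post-norm encoder layer: MHA -> residual + LN -> FFN -> residual + LN."""
    h = layer_norm(x + multi_head_attention(x, *params["attn"], n_heads=n_heads))
    w1, b1, w2, b2 = params["ffn"]
    ffn = np.maximum(0.0, h @ w1 + b1) @ w2 + b2       # two fully connected layers, ReLU between
    return layer_norm(h + ffn)
```

Stacking several such layers, each feeding its output to the next, gives the multi-layer encoder of S5-2.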

9. A method for predicting residential water consumption according to claim 6, characterized in that: in step S5-3, the masked multi-head attention mechanism adds an upper triangular matrix to the multi-head attention mechanism to mask information from future positions in the sequence; Q and K of the multi-head attention mechanism in the decoder come from the output of the corresponding encoder layer, and V comes from the input features of the decoder.
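The upper triangular mask of claim 9 can be illustrated with single-head self-attention in NumPy: entries above the diagonal are set to −∞ before the softmax, so position t receives zero weight from every later position. This sketch covers the masking only; it does not model the claim's particular assignment of Q, K and V between encoder and decoder, and the function names are hypothetical.

```python
import numpy as np


def causal_mask(T):
    """True above the diagonal: position t may not attend to positions after t."""
    return np.triu(np.ones((T, T), dtype=bool), k=1)


def masked_attention(q, k, v):
    """Single-head scaled dot-product attention with the upper triangular mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(causal_mask(len(q)), -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)   # stability; the diagonal is never masked
    w = np.exp(scores)                             # exp(-inf) = 0 for masked entries
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v, w
```

Because the first position can only attend to itself, its output equals its own value vector, which is a quick way to confirm the mask is applied.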

10. A method for predicting residential water consumption according to claim 1, characterized in that: the superposition formula of step S6 is:

$$W_{\text{pred}} = W_{\text{new1},p} + W_{\text{new2},p}$$

In the formula, W_pred is the predicted water consumption of the user, and W_new1,p and W_new2,p are the predicted values of the high-frequency component subsequence and the low-frequency component subsequence, respectively.