WO2022151829A1 - Time series data trend feature extraction method based on dynamic grid division - Google Patents
Time series data trend feature extraction method based on dynamic grid division
- Publication number
- WO2022151829A1 (PCT/CN2021/130798)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- time series
- points
- trend
- series data
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
Definitions
- The invention relates to the technical field of time series data, and in particular to a method for extracting trend features of time series data based on dynamic grid division.
- Time series data, an ordered collection of time-varying data, is widely used in industry, agriculture, finance, science and engineering, the social sciences and other fields, and is mostly high-dimensional and multivariate. In recent years the volume of time series data has grown explosively, which poses challenges for data storage and for mining the value of the data. In many industries in particular, time series data is stationary most of the time, so it contains a great deal of redundant information. To improve the computational efficiency and analysis accuracy of data mining models, the features of the time series are usually extracted to compress the data, typically by sampling at equal intervals. That approach, however, tends to retain only the main trend of the series and misses many key features.
- Chinese Patent Publication No. CN108804731A proposes a method for extracting trend features of time series based on dual evaluation factors of important points. On the basis of a piecewise linear representation of the time series, the method calculates a distance factor and a trend factor for each important point and uses them together to evaluate how much the point matters to the overall trend, which determines the segmentation points. The data is compressed while extraction accuracy is maintained. If the data within a segment fluctuates strongly, however, the method tends to ignore feature points of secondary importance, and its weights and thresholds have to be identified for each specific data set, so its applicability and flexibility are limited.
- In the prior art, sampling at equal intervals preserves only the main trend and easily misses the key features of the time series. The trend feature extraction method based on dual evaluation factors of important points, which builds on piecewise linearity supplemented by a distance factor and a trend factor, can extract the main trend and the key features, but it tends to ignore secondary key features, and its thresholds and weights must be identified for each specific data set, which limits it.
- A method for extracting trend features of time series data is therefore needed that retains both the overall trend and the local feature information of the series, so that the efficiency and accuracy of subsequent data mining models are preserved.
- The invention addresses the low efficiency and large error that the volume and complexity of time series data cause when the original data is mined and modeled. It proposes a method for extracting trend features of time series data based on dynamic grid division that retains the key feature points and trend information of the series with a small number of data points, improving the efficiency and accuracy of subsequent data modeling and analysis.
- A method for extracting trend features of time series data based on dynamic grid division, comprising the following steps:
- Step A: Set the target number N, dynamically divide the grid according to the density distribution of the time series data, and divide the time and the values of the time series data into m segments and n segments respectively;
- Step B: Traverse the local data in each grid and obtain a priority queue of key feature points by linear segmentation and distance calculation;
- Step C: Aggregate the priority queues of key feature points extracted from the grids to obtain a one-dimensional feature subsequence S1 of the original data;
- Step D: According to the target number N, sample data points at equal intervals in the time series to obtain a one-dimensional trend subsequence S2 of the original data;
- Step E: Integrate the feature subsequence S1 and the trend subsequence S2 to obtain a new sequence S for data mining.
- The invention extracts the feature information and the trend information of the original data by constructing a feature subsequence and a trend subsequence respectively.
- To keep the length of the sample data consistent in the data mining model, the target number of each subsequence is set to N.
- The grid is divided dynamically according to the density distribution of the data. Key feature points, including local extreme points and inflection points, are obtained by linearly segmenting the local data and calculating distances, and they form a priority queue of key feature points.
- The local data of all grids is then aggregated. The resulting feature subsequence is an equally spaced one-dimensional array that amplifies feature-dense local data and weakens stationary, redundant local data.
- The trend subsequence, also a one-dimensional array, is obtained by sampling the data at equal intervals according to the target number N.
- Finally, a new sequence is constructed from the feature subsequence and the trend subsequence as the basis for data mining.
- Because the grid follows the density distribution of the data, densely distributed regions are divided into as few segments as possible, and regions with a relatively sparse distribution and large changes in value are divided into more segments.
- For the local data within a grid, linear segmentation with distance calculation yields the priority queue of key feature points, so secondary feature points are retained along with the key feature points.
- The extracted feature points are aggregated and converted into an equally spaced one-dimensional array, the feature subsequence of the original data. This sequence retains the key features of the original data while amplifying the local features of feature-dense data and weakening the local features of stationary, redundant data.
- The one-dimensional trend subsequence of the original data is obtained by sampling the target number of points at equal intervals, which retains the overall trend information of the original data.
- The new sequence built by concatenating the feature subsequence and the trend subsequence shortens the original data and removes redundant information while retaining its key feature points and trend information, improving the efficiency and accuracy of data mining and modeling.
- The parameters are simple to set, and the method has a certain general applicability.
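- Step A specifically includes the following steps:
- Step A1: Take the time of the time series data as the x-axis and the value as the y-axis, and divide the values into n segments at equal intervals within the range of the time series data, where the value range of n is [3N/4, N/4];
- Step A2: Check whether the following condition is satisfied: any n/2 segments contain more than 80% of the data points of the original data. If the condition is not satisfied, adjust n and repeat step A1 until the condition is satisfied or n = N/4;
- Step A3: Divide the time into m segments at equal intervals within the range of the time series data, where m = N - n, so that the original data is finally divided into m*n grids.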
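The choice of n and m in step A can be sketched as follows. This is only an illustration under stated assumptions, not the applicant's implementation: the function names are hypothetical, n is searched downward from 3N/4 to N/4 (the text says only that n is adjusted), fractional bounds are rounded down, and "any n/2 segments" is read as "the n/2 most populated value segments".

```python
import numpy as np

def choose_value_segments(values, N, coverage=0.8):
    """Pick n, the number of equal-width value segments (steps A1 and A2).

    Assumption: n is tried from 3N/4 down to N/4, and the condition holds
    when the n//2 most populated segments contain more than `coverage`
    of all data points.
    """
    values = np.asarray(values, dtype=float)
    n_hi, n_lo = (3 * N) // 4, max(N // 4, 1)
    for n in range(n_hi, n_lo - 1, -1):
        counts, _ = np.histogram(values, bins=n)   # equal-width value bins
        top_half = np.sort(counts)[::-1][: n // 2]
        if top_half.sum() > coverage * values.size:
            return n
    return n_lo  # stop at n = N/4 even if the condition never holds

def grid_shape(values, N):
    """Return (m, n): m time segments and n value segments, with m = N - n."""
    n = choose_value_segments(values, N)
    return N - n, n
```

With mostly flat data the condition is met immediately at the upper bound; with evenly spread values it is never met and the search ends at n = N/4.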
- Step B specifically includes the following steps:
- Step B1: Set the scale coefficient γ and traverse the local data in each grid in turn;
- Step B2: Let a and b be the endpoints of the data in the region. Calculate in turn the vertical distance d_i from each data point to the line connecting a and b, take the maximum vertical distance as d_max and the mean vertical distance as d_mean. If d_max is greater than or equal to γ*d_mean, the corresponding data point is recorded as an important point P_i;
- Step B3: Using the important point P_i as the dividing point, split the data in the region into two parts and perform step B2 on each part;
- Step B4: Repeat steps B2 and B3 until no further important points appear, then arrange the important points P_i obtained in steps B2 and B3 by vertical distance into a priority queue for the data in the grid.
- Step B4 further includes: if the priority queue is empty, the point corresponding to the median of the data in the grid is taken as an important point and added to the priority queue.
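A minimal sketch of steps B2 to B4 for the data of one grid is given below. Several details are assumptions, because the text leaves them open: the function names are hypothetical, the "vertical distance" is read as the perpendicular distance to the line through a and b (as the coordinate transformation in the embodiment suggests), d_mean is taken over interior points only, a segment whose distances are all zero is not split, and for an even number of points the lower median is used in the fallback.

```python
import numpy as np

def perpendicular_distances(t, y):
    """Distance of every point to the line through the first and last point."""
    dx, dy = t[-1] - t[0], y[-1] - y[0]
    norm = np.hypot(dx, dy)
    if norm == 0:  # degenerate chord: fall back to distance from endpoint a
        return np.hypot(t - t[0], y - y[0])
    return np.abs(dx * (y - y[0]) - dy * (t - t[0])) / norm

def important_points(t, y, gamma=1.5):
    """Return [(distance, index), ...] sorted by decreasing distance."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    found = []
    stack = [(0, len(t) - 1)]           # segments still to be examined
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2:                 # no interior point left
            continue
        d = perpendicular_distances(t[lo:hi + 1], y[lo:hi + 1])[1:-1]
        k = int(np.argmax(d))
        d_max, d_mean = d[k], d.mean()
        if d_max > 0 and d_max >= gamma * d_mean:   # step B2 criterion
            idx = lo + 1 + k
            found.append((float(d_max), idx))
            stack.append((lo, idx))     # step B3: split at the important point
            stack.append((idx, hi))
    if not found:                       # step B4 fallback: median point
        order = np.argsort(y, kind="stable")
        found.append((0.0, int(order[(len(y) - 1) // 2])))
    found.sort(key=lambda p: -p[0])     # largest distance first
    return found
```

A single spike in otherwise flat data ends up at the head of the queue, while a perfectly flat grid yields only the median fallback point. Note that a perpendicular distance depends on the relative scaling of the time and value axes, so in practice both axes would probably be normalized first.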
- Step C specifically includes the following steps:
- Step C1: Aggregate the important points of all grids. If the number of important points is less than N, reduce the scale coefficient γ and repeat steps B1 to B4; if the number of important points is greater than N, remove the redundant data points;
- Step C2: Arrange the extracted important points in time order and delete the time information to obtain an equally spaced one-dimensional array, which is the feature subsequence S1 of the original data.
- Step C1 removes redundant data points according to the following principles: (1) at least one important point is retained in each grid; (2) important points are deleted in ascending order of vertical distance.
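The removal of redundant points in step C1 follows two principles given in the description: keep at least one important point per grid, and delete important points in ascending order of vertical distance. A small sketch of that pruning follows; the function name and the dictionary layout of the per-grid queues are hypothetical, and the branch that lowers γ when too few points are found is not shown.

```python
def prune_important_points(queues, N):
    """Reduce the important points of all grids to at most N.

    queues: dict mapping a grid id to its priority queue, a list of
            (distance, index) pairs sorted by decreasing distance.
    Returns the surviving point indices in time order.
    Assumption: the highest-priority point of each grid is never removed,
    so more than N points remain if more than N grids are non-empty.
    """
    kept = {g: list(q) for g, q in queues.items()}
    total = sum(len(q) for q in kept.values())
    # Removal candidates: everything except the head of each queue,
    # ordered by ascending distance (principle 2).
    candidates = sorted(
        (d, idx, g) for g, q in kept.items() for d, idx in q[1:]
    )
    for d, idx, g in candidates:
        if total <= N:
            break
        kept[g].remove((d, idx))
        total -= 1
    return sorted(idx for q in kept.values() for _, idx in q)
```

For example, with two grids holding two and three points and N = 3, the two points with the smallest distances are dropped and each grid still keeps its head point.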
- Step E specifically includes the following steps:
- Step E1: Reverse the order of the trend subsequence S2 to obtain a new subsequence S2';
- Step E2: Concatenate the subsequences S1 and S2' to obtain an equally spaced one-dimensional array S with a data length of 2N.
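Steps E1 and E2 amount to a reversal followed by a concatenation. A minimal sketch, with a hypothetical function name and plain Python lists standing in for the two subsequences:

```python
def build_sequence(s1, s2):
    """Concatenate S1 with the reversed S2 (steps E1 and E2)."""
    if len(s1) != len(s2):
        raise ValueError("S1 and S2 are both expected to have length N")
    s2_reversed = list(reversed(s2))      # step E1: S2'
    return list(s1) + s2_reversed         # step E2: S = S1 followed by S2'
```

For instance, `build_sequence([1, 2, 3], [10, 20, 30])` gives `[1, 2, 3, 30, 20, 10]`, a sequence of length 2N with N = 3.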
- The beneficial effects of the invention are as follows. Linear segmentation of the local data with distance calculation yields the priority queue of key feature points, and aggregating these queues gives the one-dimensional feature subsequence of the original time series data.
- The one-dimensional trend subsequence of the original data is obtained by sampling the target number of points at equal intervals. A new, reduced sequence of the original data is then constructed from the feature subsequence and the trend subsequence.
- The invention reduces the data length according to the target number while retaining the key feature points and trend information of the original data, improving the efficiency and accuracy of data mining and modeling analysis.
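- FIG. 1 is a schematic diagram of the algorithm flow;
- FIG. 2 shows the original time series data of the embodiment;
- FIG. 3 is a schematic diagram of uniform downsampling of the original time series data at equal intervals;
- FIG. 4 shows the uniform downsampling at equal intervals in the embodiment;
- FIG. 5 is a schematic diagram of non-uniform downsampling based on dynamic grid division;
- FIG. 6 shows the non-uniform downsampling based on dynamic grid division in the embodiment;
- FIG. 7 shows the new sequence constructed in the embodiment from the feature subsequence and the trend subsequence.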
- Embodiment: Take a temperature signal from a piece of equipment as an example. The sampling interval is 1 s and the data length is 30000. FIG. 2 shows this time series data and its uniform downsampling at equal intervals. As the figure shows, even with 2000 sampling points the features of the initial stage of the data cannot be extracted.
- This embodiment proposes a method for extracting trend features of time series data based on dynamic grid division.
- Referring to FIG. 1, the method includes the following steps:
- Step A: Set the target number to 120, dynamically divide the grid according to the density distribution of the time series data, and divide the time and the values of the time series data into 62 segments and 58 segments respectively;
- Step A specifically includes the following steps:
- Step A1: Take the time of the time series data as the x-axis and the value as the y-axis, and divide the values into n segments at equal intervals within the range of the time series data, where the value range of n is [75, 25];
- Step A2: Check whether the following condition is satisfied: any n/2 segments contain more than 80% of the data points of the original data. If the condition is not satisfied, adjust n and repeat step A1 until the condition is satisfied or n = 25;
- Step A3: After steps A1 and A2, the condition is satisfied when n is 58. Within the range of the time series data, the time is divided into 62 segments at equal intervals, and the original data is finally divided into 62*58 grids.
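As a quick check of the embodiment's figures, the arithmetic below uses only values stated in the text (target number N = 120, n = 58, data length 30000) together with the relation m = N - n from step A3 and the length 2N from step E2. The average number of points per grid is a derived illustration, not a figure given in the source.

```latex
\begin{aligned}
m &= N - n = 120 - 58 = 62,\\
m \times n &= 62 \times 58 = 3596 \ \text{grids},\\
\frac{30000}{3596} &\approx 8.3 \ \text{data points per grid on average},\\
|S| &= 2N = 2 \times 120 = 240 .
\end{aligned}
```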
- Step B: Traverse the local data in each grid and obtain a priority queue of key feature points by linear segmentation and distance calculation;
- Step B specifically includes the following steps:
- Step B1: Set the initial value of the scale coefficient γ to 1.5 and traverse the local data in each grid in turn;
- Step B2: Refer to FIG. 5, a schematic diagram of non-uniform downsampling based on dynamic grid division. The data framed in the figure is used as an example to describe how important points are extracted. Let a and b be the endpoints of the data in the region. Transform the coordinates so that the line connecting a and b is the x-axis, and calculate in turn the vertical distance d_i from each data point to that line. Take the maximum vertical distance as d_max and the mean vertical distance as d_mean. If d_max is greater than or equal to γ*d_mean, the corresponding data point is recorded as the important point P1;
- Step B3: Using the important point P1 as the dividing point, split the data in the region into two parts and perform step B2 on each part to obtain the second important point P2 that satisfies the condition;
- Step B4: Repeat steps B2 and B3 until no further important points appear, then arrange the important points P1 and P2 obtained in steps B2 and B3 by vertical distance into the priority queue [P1, P2] for the data in the grid. If the priority queue is empty, the point corresponding to the median of the data in the grid is taken as an important point and added to the priority queue.
- Step C: Aggregate the priority queues of key feature points extracted from the grids to obtain a one-dimensional feature subsequence S1 of the original data;
- Step C specifically includes the following steps:
- Step C1: Aggregate the important points of all grids. If the number of important points is less than 120, reduce the scale coefficient γ and repeat steps B1 to B4; if the number of important points is greater than 120, remove the redundant data points according to the following principles: (1) at least one important point is retained in each grid; (2) important points are deleted in ascending order of vertical distance.
- Step C2: Arrange the extracted important points in time order and delete the time information to obtain an equally spaced one-dimensional array, which is the feature subsequence S1 of the original data.
- FIG. 6 shows the non-uniform downsampling of the embodiment to 120 points based on dynamic grid division.
- The subsequence S1 not only retains the feature information of the original data but also amplifies the local features of feature-dense data and weakens the local features of stationary, redundant data;
- Step D: According to the target number of 120, sample data points at equal intervals in the time series to obtain a one-dimensional trend subsequence S2 of the original data.
- FIG. 3 is a schematic diagram of uniform downsampling of the original time series data at equal intervals.
- FIG. 4 shows the uniform downsampling of the embodiment at equal intervals to 120 points;
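Step D can be illustrated with a short sketch that picks N evenly spaced samples. The function name is hypothetical, and taking rounded `np.linspace` positions that include the first and last sample is an assumption, since the text does not say how the sampling positions are chosen.

```python
import numpy as np

def trend_subsequence(values, N):
    """Sample N points at (approximately) equal intervals (step D)."""
    values = np.asarray(values, dtype=float)
    if N < 1 or N > values.size:
        raise ValueError("N must be between 1 and the data length")
    # Evenly spaced positions from the first to the last sample, rounded
    # to the nearest integer index.
    idx = np.linspace(0, values.size - 1, num=N).round().astype(int)
    return values[idx]
```

For a series of 30000 samples, as in the embodiment, and N = 120, this keeps roughly one sample out of every 250.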
- Step E: Integrate the feature subsequence S1 and the trend subsequence S2 to obtain a new sequence S for data mining.
- Step E specifically includes the following steps:
- Step E1: Reverse the order of the trend subsequence S2 to obtain a new subsequence S2';
- Step E2: Concatenate the subsequences S1 and S2' to obtain an equally spaced one-dimensional array S with a data length of 2*120. FIG. 7 shows the new sequence S that this embodiment constructs from the feature subsequence and the trend subsequence.
- The invention extracts the feature information and the trend information of the original data by constructing a feature subsequence and a trend subsequence respectively.
- To keep the length of the sample data consistent in the data mining model, the target number of each subsequence is set to N.
- The grid is divided dynamically according to the density distribution of the data. Key feature points, including local extreme points and inflection points, are obtained by linearly segmenting the local data and calculating distances, and they form a priority queue of key feature points.
- The local data of all grids is aggregated, and the resulting feature subsequence is an equally spaced one-dimensional array that amplifies feature-dense local data and weakens stationary, redundant local data.
- The trend subsequence, also a one-dimensional array, is obtained by sampling the data at equal intervals according to the target number N.
- A new, reduced sequence of the original data is constructed from the feature subsequence and the trend subsequence as the basis for data mining.
- The invention reduces the data length according to the target number while retaining the key feature points and trend information of the original data, improving the efficiency and accuracy of data mining and modeling analysis.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Databases & Information Systems (AREA)
- Operations Research (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Algebra (AREA)
- Evolutionary Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Complex Calculations (AREA)
Abstract
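A time series data trend feature extraction method based on dynamic grid division, comprising the following steps: setting a target number N, dynamically dividing a grid according to the density distribution of the time series data, and dividing the time and the values of the time series data into m segments and n segments respectively; traversing the local data in each grid and obtaining a priority queue of key feature points by linear segmentation and distance calculation; aggregating the priority queues of key feature points extracted from the grids to obtain a one-dimensional feature subsequence S1 of the original data; sampling data points at equal intervals in the time series according to the target number N to obtain a one-dimensional trend subsequence S2 of the original data; and integrating the feature subsequence S1 and the trend subsequence S2 to obtain a new sequence S for data mining. The method retains the key feature points and trend information of the time series data with a small number of data points and improves the efficiency and accuracy of subsequent data modeling and analysis.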
Description
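The invention relates to the technical field of time series data, and in particular to a time series data trend feature extraction method based on dynamic grid division.
Time series data, an ordered collection of time-varying data, is widely used in industry, agriculture, finance, science and engineering, the social sciences and other fields, and is mostly high-dimensional and multivariate. In recent years the volume of time series data has grown explosively, which poses challenges for data storage and for mining the value of the data. In many industries in particular, time series data is stationary most of the time, so it contains a great deal of redundant information. To improve the computational efficiency and analysis accuracy of data mining models, the features of the time series are usually extracted to compress the data, typically by sampling at equal intervals. That approach, however, tends to retain only the main trend of the series and misses many key features.
Chinese Patent Publication No. CN108804731A, a time series trend feature extraction method based on dual evaluation factors of important points, calculates a distance factor and a trend factor for each important point on the basis of a piecewise linear representation of the time series, and uses them together to evaluate how much the point matters to the overall trend in order to determine the segmentation points. The data is compressed while extraction accuracy is maintained. If the data within a segment fluctuates strongly, the method tends to ignore feature points of secondary importance, and its weights and thresholds have to be identified for each specific data set, so its applicability and flexibility are insufficient.
In the prior art, sampling at equal intervals preserves only the main trend and easily misses the key features of the time series. The trend feature extraction method based on dual evaluation factors of important points, which builds on piecewise linearity supplemented by a distance factor and a trend factor, can extract the main trend and the key features of the time series, but it tends to ignore secondary key features, and its thresholds and weights must be identified for each specific data set, which limits it.
A method for extracting trend features of time series data is therefore needed that retains the overall trend and the local feature information of the series and preserves the efficiency and accuracy of subsequent data mining models.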
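Summary of the Invention
The invention addresses the low efficiency and large error that the volume and complexity of time series data cause when the original data is mined and modeled. It proposes a time series data trend feature extraction method based on dynamic grid division that retains the key feature points and trend information of the series with a small number of data points, improving the efficiency and accuracy of subsequent data modeling and analysis.
To achieve the above object, the following technical solution is proposed:
A time series data trend feature extraction method based on dynamic grid division, comprising the following steps:
Step A: Set the target number N, dynamically divide the grid according to the density distribution of the time series data, and divide the time and the values of the time series data into m segments and n segments respectively;
Step B: Traverse the local data in each grid and obtain a priority queue of key feature points by linear segmentation and distance calculation;
Step C: Aggregate the priority queues of key feature points extracted from the grids to obtain a one-dimensional feature subsequence S1 of the original data;
Step D: According to the target number N, sample data points at equal intervals in the time series to obtain a one-dimensional trend subsequence S2 of the original data;
Step E: Integrate the feature subsequence S1 and the trend subsequence S2 to obtain a new sequence S for data mining.
The invention extracts the feature information and the trend information of the original data by constructing a feature subsequence and a trend subsequence respectively. To keep the length of the sample data consistent in the data mining model, the target number of each subsequence is set to N. The grid is divided dynamically according to the density distribution of the data, and key feature points, including local extreme points and inflection points, are obtained by linearly segmenting the local data and calculating distances, which yields a priority queue of key feature points. The local data of all grids is aggregated, and the resulting feature subsequence is an equally spaced one-dimensional array that amplifies feature-dense local data and weakens stationary, redundant local data. The trend subsequence, also a one-dimensional array, is obtained by sampling the data at equal intervals according to the target number N. Finally, a new sequence is constructed from the feature subsequence and the trend subsequence as the basis for data mining.
The invention divides the grid dynamically according to the density distribution of the data: densely distributed regions are divided into as few segments as possible, and regions with a relatively sparse distribution and large changes in value are divided into more segments. For the local data within a grid, linear segmentation with distance calculation yields the priority queue of key feature points, so secondary feature points are retained along with the key feature points. The extracted feature points are aggregated and converted into an equally spaced one-dimensional array, the feature subsequence of the original data. This sequence retains the key features of the original data while amplifying the local features of feature-dense data and weakening the local features of stationary, redundant data. The invention also obtains the one-dimensional trend subsequence of the original data by sampling the target number of points at equal intervals, which retains the overall trend information. The new sequence built by concatenating the feature subsequence and the trend subsequence shortens the original data and removes redundant information while retaining its key feature points and trend information, improving the efficiency and accuracy of data mining and modeling. The parameters are simple to set, and the method has a certain general applicability.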
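Preferably, step A specifically includes the following steps:
Step A1: Take the time of the time series data as the x-axis and the value as the y-axis, and divide the values into n segments at equal intervals within the range of the time series data, where the value range of n is [3N/4, N/4];
Step A2: Check whether the following condition is satisfied: any n/2 segments contain more than 80% of the data points of the original data. If the condition is not satisfied, adjust n and repeat step A1 until the condition is satisfied or n = N/4;
Step A3: Divide the time into m segments at equal intervals within the range of the time series data, where m = N - n, so that the original data is finally divided into m*n grids.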
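Preferably, step B specifically includes the following steps:
Step B1: Set the scale coefficient γ and traverse the local data in each grid in turn;
Step B2: Let a and b be the endpoints of the data in the region. Calculate in turn the vertical distance d_i from each data point to the line connecting a and b, take the maximum vertical distance as d_max and the mean vertical distance as d_mean. If d_max is greater than or equal to γ*d_mean, the corresponding data point is recorded as an important point P_i;
Step B3: Using the important point P_i as the dividing point, split the data in the region into two parts and perform step B2 on each part;
Step B4: Repeat steps B2 and B3 until no further important points appear, then arrange the important points P_i obtained in steps B2 and B3 by vertical distance into a priority queue for the data in the grid.
Preferably, step B4 further includes: if the priority queue is empty, the point corresponding to the median of the data in the grid is taken as an important point and added to the priority queue.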
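Preferably, step C specifically includes the following steps:
Step C1: Aggregate the important points of all grids. If the number of important points is less than N, reduce the scale coefficient γ and repeat steps B1 to B4; if the number of important points is greater than N, remove the redundant data points;
Step C2: Arrange the extracted important points in time order and delete the time information to obtain an equally spaced one-dimensional array, which is the feature subsequence S1 of the original data.
Preferably, step C1 removes redundant data points according to the following principles:
(1) at least one important point is retained in each grid;
(2) important points are deleted in ascending order of vertical distance.
Preferably, step E specifically includes the following steps:
Step E1: Reverse the order of the trend subsequence S2 to obtain a new subsequence S2′;
Step E2: Concatenate the subsequences S1 and S2′ to obtain an equally spaced one-dimensional array S with a data length of 2N.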
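The beneficial effects of the invention are as follows. Linear segmentation of the local data with distance calculation yields the priority queue of key feature points, and aggregating these queues gives the one-dimensional feature subsequence of the original time series data. The one-dimensional trend subsequence of the original data is obtained by sampling the target number of points at equal intervals. A new, reduced sequence of the original data is then constructed from the feature subsequence and the trend subsequence. The invention reduces the data length according to the target number while retaining the key feature points and trend information of the original data, improving the efficiency and accuracy of data mining and modeling analysis.
FIG. 1 is a schematic diagram of the algorithm flow;
FIG. 2 shows the original time series data of the embodiment;
FIG. 3 is a schematic diagram of uniform downsampling of the original time series data at equal intervals;
FIG. 4 shows the uniform downsampling at equal intervals in the embodiment;
FIG. 5 is a schematic diagram of non-uniform downsampling based on dynamic grid division;
FIG. 6 shows the non-uniform downsampling based on dynamic grid division in the embodiment;
FIG. 7 shows the new sequence constructed in the embodiment from the feature subsequence and the trend subsequence.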
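Embodiment:
Take a temperature signal from a piece of equipment as an example. The sampling interval is 1 s and the data length is 30000. FIG. 2 shows this time series data and its uniform downsampling at equal intervals. As the figure shows, even with 2000 sampling points the features of the initial stage of the data cannot be extracted.
This embodiment proposes a time series data trend feature extraction method based on dynamic grid division which, referring to FIG. 1, includes the following steps:
Step A: Set the target number to 120, dynamically divide the grid according to the density distribution of the time series data, and divide the time and the values of the time series data into 62 segments and 58 segments respectively;
Step A specifically includes the following steps:
Step A1: Take the time of the time series data as the x-axis and the value as the y-axis, and divide the values into n segments at equal intervals within the range of the time series data, where the value range of n is [75, 25];
Step A2: Check whether the following condition is satisfied: any n/2 segments contain more than 80% of the data points of the original data. If the condition is not satisfied, adjust n and repeat step A1 until the condition is satisfied or n = 25;
Step A3: After steps A1 and A2, the condition is satisfied when n is 58. Within the range of the time series data, the time is divided into 62 segments at equal intervals, and the original data is finally divided into 62*58 grids.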
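Step B: Traverse the local data in each grid and obtain a priority queue of key feature points by linear segmentation and distance calculation;
Step B specifically includes the following steps:
Step B1: Set the initial value of the scale coefficient γ to 1.5 and traverse the local data in each grid in turn;
Step B2: Refer to FIG. 5, a schematic diagram of non-uniform downsampling based on dynamic grid division. The data framed in the figure is used as an example to describe how important points are extracted. Let a and b be the endpoints of the data in the region. Transform the coordinates so that the line connecting a and b is the x-axis, and calculate in turn the vertical distance d_i from each data point to that line. Take the maximum vertical distance as d_max and the mean vertical distance as d_mean. If d_max is greater than or equal to γ*d_mean, the corresponding data point is recorded as the important point P1;
Step B3: Using the important point P1 as the dividing point, split the data in the region into two parts and perform step B2 on each part to obtain the second important point P2 that satisfies the condition;
Step B4: Repeat steps B2 and B3 until no further important points appear, then arrange the important points P1 and P2 obtained in steps B2 and B3 by vertical distance into the priority queue [P1, P2] for the data in the grid. If the priority queue is empty, the point corresponding to the median of the data in the grid is taken as an important point and added to the priority queue.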
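Step C: Aggregate the priority queues of key feature points extracted from the grids to obtain a one-dimensional feature subsequence S1 of the original data;
Step C specifically includes the following steps:
Step C1: Aggregate the important points of all grids. If the number of important points is less than 120, reduce the scale coefficient γ and repeat steps B1 to B4; if the number of important points is greater than 120, remove the redundant data points according to the following principles:
(1) at least one important point is retained in each grid;
(2) important points are deleted in ascending order of vertical distance.
Step C2: Arrange the extracted important points in time order and delete the time information to obtain an equally spaced one-dimensional array, which is the feature subsequence S1 of the original data. FIG. 6 shows the non-uniform downsampling of the embodiment to 120 points based on dynamic grid division. The subsequence S1 not only retains the feature information of the original data but also amplifies the local features of feature-dense data and weakens the local features of stationary, redundant data;
Step D: According to the target number of 120, sample data points at equal intervals in the time series to obtain a one-dimensional trend subsequence S2 of the original data. FIG. 3 is a schematic diagram of uniform downsampling of the original time series data at equal intervals, and FIG. 4 shows the uniform downsampling of the embodiment at equal intervals to 120 points;
Step E: Integrate the feature subsequence S1 and the trend subsequence S2 to obtain a new sequence S for data mining.
Step E specifically includes the following steps:
Step E1: Reverse the order of the trend subsequence S2 to obtain a new subsequence S2′;
Step E2: Concatenate the subsequences S1 and S2′ to obtain an equally spaced one-dimensional array S with a data length of 2*120. FIG. 7 shows the new sequence S that this embodiment constructs from the feature subsequence and the trend subsequence.
The invention extracts the feature information and the trend information of the original data by constructing a feature subsequence and a trend subsequence respectively. To keep the length of the sample data consistent in the data mining model, the target number of each subsequence is set to N. The grid is divided dynamically according to the density distribution of the data, and key feature points, including local extreme points and inflection points, are obtained by linearly segmenting the local data and calculating distances, which yields a priority queue of key feature points. The local data of all grids is aggregated, and the resulting feature subsequence is an equally spaced one-dimensional array that amplifies feature-dense local data and weakens stationary, redundant local data. The trend subsequence, also a one-dimensional array, is obtained by sampling the data at equal intervals according to the target number N. A new, reduced sequence of the original data is constructed from the feature subsequence and the trend subsequence as the basis for data mining. The invention reduces the data length according to the target number while retaining the key feature points and trend information of the original data, improving the efficiency and accuracy of data mining and modeling analysis.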
Claims (7)
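- A time series data trend feature extraction method based on dynamic grid division, characterized by comprising the following steps: Step A: setting a target number N, dynamically dividing a grid according to the density distribution of the time series data, and dividing the time and the values of the time series data into m segments and n segments respectively; Step B: traversing the local data in each grid and obtaining a priority queue of key feature points by linear segmentation and distance calculation; Step C: aggregating the priority queues of key feature points extracted from the grids to obtain a one-dimensional feature subsequence S1 of the original data; Step D: sampling data points at equal intervals in the time series according to the target number N to obtain a one-dimensional trend subsequence S2 of the original data; Step E: integrating the feature subsequence S1 and the trend subsequence S2 to obtain a new sequence S for data mining.
- The time series data trend feature extraction method based on dynamic grid division according to claim 1, characterized in that step A specifically comprises the following steps: Step A1: taking the time of the time series data as the x-axis and the value as the y-axis, and dividing the values into n segments at equal intervals within the range of the time series data, the value range of n being [3N/4, N/4]; Step A2: checking whether the following condition is satisfied: any n/2 segments contain more than 80% of the data points of the original data, and, if the condition is not satisfied, adjusting n and repeating step A1 until the condition is satisfied or n = N/4; Step A3: dividing the time into m segments at equal intervals within the range of the time series data, where m = N - n, so that the original data is finally divided into m*n grids.
- The time series data trend feature extraction method based on dynamic grid division according to claim 1, characterized in that step B specifically comprises the following steps: Step B1: setting a scale coefficient γ and traversing the local data in each grid in turn; Step B2: with a and b being the endpoints of the data in the region, calculating in turn the vertical distance d_i from each data point to the line connecting a and b, taking the maximum vertical distance as d_max and the mean vertical distance as d_mean, and, if d_max is greater than or equal to γ*d_mean, recording the corresponding data point as an important point P_i; Step B3: using the important point P_i as the dividing point, splitting the data in the region into two parts and performing step B2 on each part; Step B4: repeating steps B2 and B3 until no further important points appear, and arranging the important points P_i obtained in steps B2 and B3 by vertical distance into a priority queue for the data in the grid.
- The time series data trend feature extraction method based on dynamic grid division according to claim 3, characterized in that step B4 further comprises: if the priority queue is empty, taking the point corresponding to the median of the data in the grid as an important point and adding it to the priority queue.
- The time series data trend feature extraction method based on dynamic grid division according to claim 3, characterized in that step C specifically comprises the following steps: Step C1: aggregating the important points of all grids and, if the number of important points is less than N, reducing the scale coefficient γ and repeating steps B1 to B4, or, if the number of important points is greater than N, removing the redundant data points; Step C2: arranging the extracted important points in time order and deleting the time information to obtain an equally spaced one-dimensional array, which is the feature subsequence S1 of the original data.
- The time series data trend feature extraction method based on dynamic grid division according to claim 5, characterized in that step C1 removes redundant data points according to the following principles: (1) at least one important point is retained in each grid; (2) important points are deleted in ascending order of vertical distance.
- The time series data trend feature extraction method based on dynamic grid division according to claim 1, characterized in that step E specifically comprises the following steps: Step E1: reversing the order of the trend subsequence S2 to obtain a new subsequence S2′; Step E2: concatenating the subsequences S1 and S2′ to obtain an equally spaced one-dimensional array S with a data length of 2N.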
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21919020.4A EP4280088A1 (en) | 2021-01-15 | 2021-11-16 | Time series data trend feature extraction method based on dynamic grid division |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110054385.XA CN112765562B (zh) | 2021-01-15 | 2021-01-15 | Time series data trend feature extraction method based on dynamic grid division |
CN202110054385.X | 2021-01-15 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022151829A1 (zh) | 2022-07-21 |
Family
ID=75701809
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/130798 WO2022151829A1 (zh) | Time series data trend feature extraction method based on dynamic grid division | 2021-01-15 | 2021-11-16 |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP4280088A1 (zh) |
CN (1) | CN112765562B (zh) |
WO (1) | WO2022151829A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112765562B (zh) * | 2021-01-15 | 2022-07-01 | 杭州安脉盛智能技术有限公司 | Time series data trend feature extraction method based on dynamic grid division |
CN116955932B (zh) * | 2023-09-18 | 2024-01-12 | 北京天泽智云科技有限公司 | Trend-based time series segmentation method and device |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030009399A1 (en) * | 2001-03-22 | 2003-01-09 | Boerner Sean T. | Method and system to identify discrete trends in time series |
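CN104820779A (zh) * | 2015-04-28 | 2015-08-05 | 电子科技大学 | Time series dimensionality reduction method based on extreme points and turning points |
CN108804731A (zh) | 2017-09-12 | 2018-11-13 | 中南大学 | Time series trend feature extraction method based on dual evaluation factors of important points |
US20190026351A1 (en) * | 2016-01-08 | 2019-01-24 | Entit Sofware Llc | Time series trends |
CN110489810A (zh) * | 2019-07-24 | 2019-11-22 | 西安理工大学 | Automatic trend extraction method based on data blocks |
CN111143442A (zh) * | 2019-12-31 | 2020-05-12 | 河海大学 | Symbolic aggregate approximation representation method for time series incorporating trend features |
CN112765562A (zh) * | 2021-01-15 | 2021-05-07 | 杭州安脉盛智能技术有限公司 | Time series data trend feature extraction method based on dynamic grid division |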
-
2021
- 2021-01-15 CN CN202110054385.XA patent/CN112765562B/zh active Active
- 2021-11-16 EP EP21919020.4A patent/EP4280088A1/en active Pending
- 2021-11-16 WO PCT/CN2021/130798 patent/WO2022151829A1/zh unknown
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116823492A (zh) * | 2023-05-05 | 2023-09-29 | 陕西长瑞安驰信息技术集团有限公司 | Data storage method and system |
CN116823492B (zh) * | 2023-05-05 | 2024-04-02 | 上海原力枫林信息技术有限公司 | Data storage method and system |
CN116883059A (zh) * | 2023-09-06 | 2023-10-13 | 山东德源电力科技股份有限公司 | Power distribution terminal management method and system |
CN116883059B (zh) * | 2023-09-06 | 2023-11-28 | 山东德源电力科技股份有限公司 | Power distribution terminal management method and system |
Also Published As
Publication number | Publication date |
---|---|
EP4280088A1 (en) | 2023-11-22 |
CN112765562A (zh) | 2021-05-07 |
CN112765562B (zh) | 2022-07-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21919020 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2021919020 Country of ref document: EP Effective date: 20230816 |