WO2011079706A1 - Method and device for data query - Google Patents

Info

Publication number
WO2011079706A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
sequence
subsequence
queried
matching
Prior art date
Application number
PCT/CN2010/079728
Other languages
French (fr)
Chinese (zh)
Inventor
申小次
李建军
贾学力
庄明亮
付新刚
Original Assignee
北京世纪高通科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京世纪高通科技有限公司

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474: Sequence data queries, e.g. querying versioned data

Definitions

  • The present invention relates to the field of intelligent transportation systems, and in particular to a data query method and apparatus.
  • The Advanced Traffic Information System (ATIS) is built on a well-established information network. It acquires various types of traffic information through sensors or data transmission equipment installed on roads, in vehicles, and at transfer stations, parking lots and weather centers, and processes the acquired data comprehensively.
  • The system can provide the public with comprehensive and accurate road traffic congestion information in real time.
  • However, the data acquired by this equipment cannot cover every road, so real-time data must be filled in through similarity queries over historical data, and the historical data can also be analyzed to make predictions.
  • The historical data is an ordered list of data formed over time, that is, a time series.
  • A similarity query over time series finds similar patterns of change in a time-series data set, which is of great significance for time-series prediction, classification and knowledge discovery.
  • Similarity querying over large-scale time-series databases is one of the active topics in time-series data mining.
  • By running a similarity query for the real-time data against the historical database, the real-time data can be filled in and predicted quickly.
  • Time series are often given a high-level data representation by means of the discrete Fourier transform (DFT).
  • In the course of implementing the above data processing, the inventors found at least the following problems in the prior art: the discrete Fourier transform method smooths away much of the original sequence information and cannot represent the original sequence precisely, and the time complexity of the method is O(…). As a result, data queries are prone to large errors, query complexity is high, and a large amount of system resources is consumed.
  • Embodiments of the present invention provide a data query method and apparatus.
  • A data query method, including:
  • A data query device, including:
  • an information acquiring unit, configured to acquire a subsequence to be queried and its corresponding time parameters;
  • a historical subsequence obtaining unit, configured to acquire from the historical data, according to the time parameters of the subsequence to be queried, the set of subsequences at the corresponding time parameters;
  • a sequence processing unit, configured to perform dimensionality reduction on the subsequence to be queried and on the subsequences in the acquired subsequence set;
  • a matching query unit, configured to match the dimension-reduced subsequence to be queried against the subsequences in the dimension-reduced subsequence set;
  • a matching sequence obtaining unit, configured to acquire the subsequences that match the subsequence to be queried.
  • The data query method and device provided by the embodiments of the present invention acquire a subsequence to be queried and its corresponding time parameters; acquire from the historical data, according to those time parameters, the set of subsequences at the corresponding time parameters; perform dimensionality reduction on the subsequence to be queried and on the subsequences in the acquired set; match the dimension-reduced subsequence to be queried against the subsequences in the dimension-reduced set; and acquire the subsequences that match the subsequence to be queried.
  • Compared with the prior art, the embodiments apply dimensionality reduction both to the subsequence to be queried and to the subsequences retrieved from the historical data at the corresponding time parameters, which lowers the query time complexity of the whole system and improves the utilization of system resources.
  • FIG. 1 is a flowchart of a data query method according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of the step, in the data query method, of acquiring from the historical data the subsequence set at the corresponding time parameters according to the time parameters of the subsequence to be queried.
  • FIG. 3 is a flowchart of the step, in the data query method, of performing dimensionality reduction on the subsequence to be queried and on the subsequences in the acquired subsequence set.
  • FIG. 4 is a flowchart of the step, in the data query method, of matching the dimension-reduced subsequence to be queried against the subsequences in the dimension-reduced subsequence set.
  • FIG. 5 is a flowchart of the step, in the data query method, of acquiring the subsequences that match the subsequence to be queried.
  • FIG. 6 is a schematic structural diagram of a data query device according to an embodiment of the present invention.
  • A data query method includes:
  • 105: Acquire the subsequences that match the subsequence to be queried.
  • The process of obtaining from the first database DB the subsequence set corresponding to the times of the subsequence to be queried, that is, of obtaining the reconstructed phase space, is as follows:
  • Obtain the time parameter of each element of the subsequence to be queried. For example, the time point corresponding to $x_i$ is t1, the time point corresponding to $x_{i+\tau}$ is t2, the time point corresponding to $x_{i+2\tau}$ is t3, and so on.
  • According to the time parameters, look up in the first database DB (the database of historical time series) the subsequence set corresponding to those time parameters. For example, let the time parameters be t1, t2, t3, …;
  • DB is queried in turn for the subsequence at times t1, t2, t3, … on the previous day, then for the subsequence at those times two days earlier, and so on until every subsequence at times t1, t2, t3, … in DB has been retrieved. The retrieved subsequences together form one subsequence set.
  • The specific flow for performing dimensionality reduction on the subsequence to be queried and on the subsequences in the acquired subsequence set is as follows.
  • As an example, suppose the traffic history database stores vehicle speed values at 5-minute intervals and each time series holds one day of speed values, so that the length of each time series is at most 288 (24 × 60 / 5).
  • 301: Obtain the preset error parameter $e$, the initial dimension value $p$ ($p < m$), and the subsequence to be queried.
  • 302: Map the subsequence to be queried into its corresponding piecewise polynomial feature space. The mapping proceeds as follows:
  • For every subsequence $X$ of length $|X| = m$, $X$ is approximated in the minimum mean-square-error sense by the polynomial function $f(t) = w_0 + w_1 t + w_2 t^2 + \cdots + w_{p-1} t^{p-1}$.
  • $e$ is the residual sequence, which follows a zero-mean normal distribution $N(0, \sigma^2)$.
  • This transformation realizes a mapping $\mathbb{R}^m \to \mathbb{R}^p$. Since generally $m > p$, the mapping reduces the dimensionality of the time-series data.
  • Because the preset dimension value $p$ may introduce a large error, that is, the error between the dimension-reduced subsequence and the original subsequence may exceed the preset error $e$, the embodiment can further guarantee the accuracy of the dimension-reduced subsequence through the following steps.
  • 304: Obtain the actual error $\lVert X - X' \rVert$ between the original subsequence $X$ and its reconstruction $X'$, and determine whether it is less than the preset error $e$. If it is, perform step 306; otherwise, perform step 305.
  • 306: Output the result: $\omega$ is the polynomial representation of $X$. This converts the $m$-dimensional time-series space into the $p$-dimensional space and completes the dimensionality reduction.
  • The specific flow of the step of matching the dimension-reduced subsequence to be queried against the subsequences in the dimension-reduced subsequence set includes:
  • The MBR (minimum bounding rectangle) is the smallest rectangle that encloses a primitive and whose sides are parallel to the X and Y axes.
  • The trajectory of the original time series in the feature space is divided by MBRs into multiple sub-trajectories so that the number of disk accesses is minimized.
  • each node in the R*-tree (that is, each MBR)
  • Each MBR stores the following data: the unique identification number of its time series; the start offset and end offset of the MBR within that time series; and the vertex coordinates of the MBR.
  • The specific flow of the step of acquiring the subsequences that match the subsequence to be queried includes:
  • The flow may further include:
  • Obtain, according to the Euclidean distance threshold, the set of subsequences that match the subsequence to be queried. Specifically, the subsequences whose Euclidean distance is less than or equal to the threshold, that is, 0.001, are obtained. For each MBR in the candidate set, the distance between the corresponding points in the phase space is computed; if that distance does not exceed the threshold, the subsequence is a similar subsequence.
  • The query process may be referred to as a PQ query, in which the set of subsequences satisfying the distance condition is returned; if instead only the subsequence with the smallest distance is output as the result, the query returns a single closest match.
  • A data query device includes:
  • an information obtaining unit 601, configured to acquire a subsequence to be queried and its corresponding time parameters;
  • a historical subsequence obtaining unit 602, configured to acquire from the historical data, according to the time parameters of the subsequence to be queried, the set of subsequences at the corresponding time parameters;
  • a sequence processing unit 603, configured to perform dimensionality reduction on the subsequence to be queried and on the subsequences in the acquired subsequence set;
  • a matching query unit 604, configured to match the dimension-reduced subsequence to be queried against the subsequences in the dimension-reduced subsequence set;
  • a matching sequence obtaining unit 605, configured to acquire the subsequences that match the subsequence to be queried.
  • The sequence processing unit 603 includes:
  • a to-be-queried subsequence processing subunit, configured to map the subsequence to be queried into its corresponding piecewise polynomial feature space;
  • a historical subsequence processing subunit, configured to map each subsequence in the acquired subsequence set into its corresponding piecewise polynomial feature space.
  • The matching query unit 604 includes:
  • a segmentation subunit, configured to perform MBR segmentation on the dimension-reduced subsequence set;
  • a matching query subunit, configured to match the dimension-reduced subsequence to be queried against the MBR-segmented, dimension-reduced subsequence set.
  • The matching sequence obtaining unit 605 includes:
  • a distance obtaining subunit, configured to obtain the Euclidean distance between each subsequence in the dimension-reduced subsequence set and the dimension-reduced subsequence to be queried;
  • a matching sequence acquisition subunit, configured to acquire, according to the Euclidean distance, the subsequences that match the subsequence to be queried.
  • The device further includes:
  • a threshold acquisition unit, configured to obtain a Euclidean distance threshold;
  • a matching subsequence obtaining unit, configured to obtain, according to the Euclidean distance threshold, the set of subsequences that match the subsequence to be queried.
  • The data query method and device acquire a subsequence to be queried and its corresponding time parameters; acquire from the historical data, according to those time parameters, the set of subsequences at the corresponding time parameters; perform dimensionality reduction on the subsequence to be queried and on the subsequences in the acquired set; match the dimension-reduced subsequence to be queried against the subsequences in the dimension-reduced set; and acquire the subsequences that match the subsequence to be queried.
  • Compared with the prior art, the embodiments apply dimensionality reduction both to the subsequence to be queried and to the subsequences retrieved from the historical data at the corresponding time parameters, which lowers the query time complexity of the whole system and improves the utilization of system resources. Representing the time series with piecewise polynomials also reduces the error introduced during the query.
  • The steps of the foregoing method embodiments can be implemented by a program that instructs the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium may be, for example, FLASH, ROM/RAM, a magnetic disk or an optical disk.
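The Euclidean-distance matching described in the entries above (threshold 0.001 in the example) can be illustrated with a minimal Python sketch. The function names `threshold_query` and `closest_match`, the dictionary layout and the sample vectors are assumptions made for illustration and do not come from the patent text.

```python
import math
from typing import Dict, Sequence


def threshold_query(query: Sequence[float],
                    candidates: Dict[str, Sequence[float]],
                    threshold: float) -> Dict[str, float]:
    """Return every candidate whose Euclidean distance to the query
    is less than or equal to the threshold, with that distance."""
    matches = {}
    for key, vector in candidates.items():
        distance = math.dist(query, vector)  # Euclidean distance (Python 3.8+)
        if distance <= threshold:
            matches[key] = distance
    return matches


def closest_match(query: Sequence[float],
                  candidates: Dict[str, Sequence[float]]) -> str:
    """Return the key of the candidate with the smallest Euclidean distance."""
    return min(candidates, key=lambda key: math.dist(query, candidates[key]))


# Hypothetical dimension-reduced vectors keyed by a time-series identifier.
reduced_history = {"day-1": (0.0, 0.0005), "day-2": (1.0, 1.0)}
similar = threshold_query((0.0, 0.0), reduced_history, threshold=0.001)
best = closest_match((0.0, 0.0), reduced_history)
```

The first function corresponds to returning the whole set of similar subsequences, the second to returning only the one at the smallest distance.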

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method and a device for data query are provided, relating to the field of intelligent transportation systems. The method includes: acquiring a subsequence to be queried and its corresponding time parameters (101); acquiring from historical data, according to those time parameters, the set of subsequences at the corresponding time parameters (102); performing dimensionality reduction on the subsequence to be queried and on the subsequences in the acquired set (103); matching the dimension-reduced subsequence to be queried against the dimension-reduced subsequences in the set (104); and acquiring the subsequences that match the subsequence to be queried (105). The method and device reduce the time complexity of data queries and improve the utilization of system resources, addressing the prior-art problems that data queries are prone to large errors, have high query complexity and consume substantial system resources.

Description

Data query method and device
This application claims priority to Chinese application No. 200910244152.5, filed on December 30, 2009 and entitled "Data query method and device", the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of intelligent transportation systems, and in particular to a data query method and device.
Background
The Advanced Traffic Information System (ATIS) is built on a well-established information network. It acquires various types of traffic information through sensors or data transmission equipment installed on roads, in vehicles, and at transfer stations, parking lots and weather centers, and processes the acquired data comprehensively. The system can provide the public with comprehensive and accurate road traffic congestion information in real time. However, the data acquired by this equipment cannot cover every road, so real-time data must be filled in through similarity queries over historical data, and the historical data can also be analyzed to make predictions.
The historical data is an ordered list of data formed over time, that is, a time series. A similarity query over time series finds similar patterns of change in a time-series data set, which is of great significance for time-series prediction, classification and knowledge discovery. Similarity querying over large-scale time-series databases is one of the active topics in time-series data mining. By running a similarity query for the real-time data against the historical database, the real-time data can be filled in and predicted quickly. However, because historical time series are massive and high-dimensional, computing distances directly on the original sequences to find subsequences similar to the query sequence consumes a large amount of system resources. Time series are therefore often given a high-level data representation by means of the discrete Fourier transform (DFT).
In the course of implementing the above data processing, the inventors found at least the following problems in the prior art: the discrete Fourier transform method smooths away much of the original sequence information and cannot represent the original sequence precisely, and the time complexity of the method is O(…). As a result, data queries are prone to large errors, query complexity is high, and a large amount of system resources is consumed.
Summary of the invention
Embodiments of the present invention provide a data query method and device.
To achieve the above object, the embodiments of the present invention adopt the following technical solutions:
A data query method, including:
acquiring a subsequence to be queried and its corresponding time parameters;
acquiring from historical data, according to the time parameters of the subsequence to be queried, the set of subsequences at the corresponding time parameters;
performing dimensionality reduction on the subsequence to be queried and on the subsequences in the acquired subsequence set;
matching the dimension-reduced subsequence to be queried against the subsequences in the dimension-reduced subsequence set;
acquiring the subsequences that match the subsequence to be queried.
A data query device, including:
an information acquiring unit, configured to acquire a subsequence to be queried and its corresponding time parameters;
a historical subsequence obtaining unit, configured to acquire from historical data, according to the time parameters of the subsequence to be queried, the set of subsequences at the corresponding time parameters;
a sequence processing unit, configured to perform dimensionality reduction on the subsequence to be queried and on the subsequences in the acquired subsequence set;
a matching query unit, configured to match the dimension-reduced subsequence to be queried against the subsequences in the dimension-reduced subsequence set;
a matching sequence obtaining unit, configured to acquire the subsequences that match the subsequence to be queried.
The data query method and device provided by the embodiments of the present invention acquire a subsequence to be queried and its corresponding time parameters; acquire from the historical data, according to those time parameters, the set of subsequences at the corresponding time parameters; perform dimensionality reduction on the subsequence to be queried and on the subsequences in the acquired set; match the dimension-reduced subsequence to be queried against the subsequences in the dimension-reduced set; and acquire the subsequences that match the subsequence to be queried. Compared with the prior art, the embodiments apply dimensionality reduction both to the subsequence to be queried and to the subsequences retrieved from the historical data at the corresponding time parameters, which lowers the query time complexity of the whole system and improves the utilization of system resources.
Brief description of the drawings
FIG. 1 is a flowchart of a data query method according to an embodiment of the present invention;
FIG. 2 is a flowchart of the step, in the data query method, of acquiring from the historical data the subsequence set at the corresponding time parameters according to the time parameters of the subsequence to be queried;
FIG. 3 is a flowchart of the step, in the data query method, of performing dimensionality reduction on the subsequence to be queried and on the subsequences in the acquired subsequence set;
FIG. 4 is a flowchart of the step, in the data query method, of matching the dimension-reduced subsequence to be queried against the subsequences in the dimension-reduced subsequence set;
FIG. 5 is a flowchart of the step, in the data query method, of acquiring the subsequences that match the subsequence to be queried;
FIG. 6 is a schematic structural diagram of a data query device according to an embodiment of the present invention.
Detailed description
The data query method and device of the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
As shown in FIG. 1, a data query method provided by an embodiment of the present invention includes:
101: Acquire a subsequence to be queried and its corresponding time parameters;
102: Acquire from historical data, according to the time parameters of the subsequence to be queried, the set of subsequences at the corresponding time parameters;
103: Perform dimensionality reduction on the subsequence to be queried and on the subsequences in the acquired subsequence set;
104: Match the dimension-reduced subsequence to be queried against the subsequences in the dimension-reduced subsequence set;
105: Acquire the subsequences that match the subsequence to be queried.
FIG. 2 shows the flow of the step, in the data query method provided by the embodiment of the present invention, of acquiring from the historical data the subsequence set at the corresponding time parameters according to the time parameters of the subsequence to be queried.
Let the database of historical time series be the first database DB, which stores N time series of differing lengths. The current subsequence to be queried is $X_i = (x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau})$, $i = 1, 2, \ldots, n-(m-1)\tau$, where $m$ is the embedding dimension, $\tau$ is the delay time ($\tau = 1, 2, \ldots$), and $X_i$ is a point in the phase space. The process of obtaining from the first database DB the subsequence set corresponding to the times of the subsequence to be queried, that is, of obtaining the reconstructed phase space, is as follows:
201: Obtain the time parameter of each element of the subsequence to be queried. For example, the time point corresponding to $x_i$ is t1, the time point corresponding to $x_{i+\tau}$ is t2, the time point corresponding to $x_{i+2\tau}$ is t3, and so on.
202: According to the time parameters, look up in the first database DB (the database of historical time series) the subsequence set corresponding to those time parameters. For example, let the time parameters be t1, t2, t3, …. DB is queried in turn for the subsequence at times t1, t2, t3, … on the previous day, then for the subsequence at those times two days earlier, and so on until every subsequence at times t1, t2, t3, … in DB has been retrieved. The retrieved subsequences together form one subsequence set.
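Steps 201 and 202 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the historical database is modeled as an in-memory dictionary mapping a day label to that day's values, time parameters are modeled as zero-based slot indices, and the names `delay_embed` and `historical_subsequences` are hypothetical.

```python
from typing import Dict, List, Sequence

# 24 h of 5-minute samples, as in the traffic example (at most 288 values per day).
SLOTS_PER_DAY = 24 * 60 // 5


def delay_embed(series: Sequence[float], m: int, tau: int) -> List[List[float]]:
    """Build the vectors X_i = (x_i, x_{i+tau}, ..., x_{i+(m-1)tau}).

    Indices are zero-based here, whereas the text counts from 1.
    The number of vectors is n - (m - 1) * tau for a series of length n.
    """
    count = max(len(series) - (m - 1) * tau, 0)
    return [[series[i + k * tau] for k in range(m)] for i in range(count)]


def historical_subsequences(history: Dict[str, Sequence[float]],
                            slots: Sequence[int]) -> Dict[str, List[float]]:
    """For each stored day, collect the values at the query's time slots.

    Days that do not cover every requested slot are skipped, since the
    stored series may have different lengths.
    """
    result = {}
    for day, series in history.items():
        if all(0 <= s < len(series) for s in slots):
            result[day] = [series[s] for s in slots]
    return result


# Hypothetical data: two stored days, the second one too short to be used.
history = {"yesterday": [10.0, 20.0, 30.0, 40.0], "two-days-ago": [1.0, 2.0]}
subsequence_set = historical_subsequences(history, slots=[1, 3])
```

A real deployment would replace the dictionary with queries against the historical database; the slot-by-slot lookup across earlier days is the part that mirrors step 202.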
FIG. 3 shows the specific flow, in the data query method provided by the embodiment of the present invention, for performing dimensionality reduction on the subsequence to be queried and on the subsequences in the acquired subsequence set. The flow is as follows.
As an example, suppose the traffic history database stores vehicle speed values at 5-minute intervals and each time series holds one day of speed values, so that the length of each time series is at most 288 (24 × 60 / 5). Let the subsequence to be queried be $X = \{x_1, x_2, \ldots, x_m\}$ with length $m < 288$, let the preset error be $e$, and let the initial dimension value be $p < m$.
301: Obtain the preset error parameter $e$, the initial dimension value $p$ ($p < m$), and the subsequence to be queried.
302: Map the subsequence to be queried into its corresponding piecewise polynomial feature space. The mapping proceeds as follows.
For every subsequence $X$ of length $|X| = m$, $X$ is approximated in the minimum mean-square-error sense by the polynomial function
$$f(t) = w_0 + w_1 t + w_2 t^2 + \cdots + w_{p-1} t^{p-1}.$$
That is, $X$ is mapped to the point $\omega = (w_0, w_1, \ldots, w_{p-1})$ in the $p$-dimensional feature space spanned by the polynomial basis $\{1, t, t^2, \ldots, t^{p-1}\}$. The mapping $\omega = F(X)$ is given by an equation that appears only as an image in the original publication (imgf000007_0001), in which $Q$ is the $m \times p$ matrix whose row $i$ is $(1, i, i^2, \ldots, i^{p-1})$, $i = 1, 2, \ldots, m$.
The inverse transform is
$$X' = F^{-1}(\omega) = Q\omega,$$
and $X$ and $X'$ satisfy
$$X = X' + e,$$
where $e$ is the residual sequence, which follows a zero-mean normal distribution $N(0, \sigma^2)$.
This transformation realizes a mapping $\mathbb{R}^m \to \mathbb{R}^p$. Since generally $m > p$, the mapping reduces the dimensionality of the time-series data.
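The polynomial mapping of step 302 can be sketched with NumPy. The forward equation is only available as an image in the source, so this sketch assumes the ordinary least-squares solution, which is the standard way to fit in the minimum mean-square-error sense. It also fits a single polynomial over the whole subsequence rather than a piecewise one. The function names are hypothetical.

```python
import numpy as np


def design_matrix(m: int, p: int) -> np.ndarray:
    """Return the m x p matrix Q whose row i is (1, i, i^2, ..., i^(p-1)), i = 1..m."""
    t = np.arange(1, m + 1, dtype=float)
    return np.vander(t, N=p, increasing=True)


def poly_reduce(x, p: int) -> np.ndarray:
    """Map an m-point subsequence to p polynomial coefficients (w_0, ..., w_{p-1})."""
    x = np.asarray(x, dtype=float)
    q = design_matrix(len(x), p)
    # Least-squares solution of Q w ~ x (assumed form of the forward mapping).
    w, *_ = np.linalg.lstsq(q, x, rcond=None)
    return w


def poly_reconstruct(w, m: int) -> np.ndarray:
    """Inverse transform X' = Q w, evaluated at t = 1..m."""
    w = np.asarray(w, dtype=float)
    return design_matrix(m, len(w)) @ w


# Hypothetical speed values lying exactly on the line 2 + 3t for t = 1..6.
speeds = [2.0 + 3.0 * t for t in range(1, 7)]
omega = poly_reduce(speeds, p=2)          # six values reduced to two coefficients
restored = poly_reconstruct(omega, m=6)   # X' = Q * omega
```

For long subsequences and larger $p$, powers of $t$ up to 288 make $Q$ poorly conditioned; rescaling $t$ to a small interval before fitting is a common remedy, though the source does not mention it.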
由于上述预设维度值 p可能会带来较大的误差,即降维后的子序 列与降维前子序列之间的误差超出预设误差 e ,所以本发明实施例还 可以通过如下步骤保证降维后子序列的精确性。 The above-mentioned preset dimension value p may bring a large error, that is, the error between the sub-sequence after the dimension reduction and the pre-dimension sub-sequence exceeds the preset error e , so the embodiment of the present invention can also ensure the following steps The accuracy of the subsequence after dimensionality reduction.
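303: Approximate the subsequence to be queried with the polynomial function to obtain the approximated sequence (the resulting expression is given in Figure imgf000007_0002 of the original publication).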
304: Obtain the actual error $\lVert x - x' \rVert$ and determine whether it is less than the preset error e. If it is less than the preset error e, perform step 306; otherwise, perform step 305.
305: Update the value of p (for example, increase p), then return to step 302.
306: Output the result. $\omega$ is the polynomial representation of $x$. The sequence has thereby been converted from the m-dimensional time-series space to the p-dimensional space, which completes the dimensionality reduction.
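Steps 301 to 306 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes an ordinary least-squares fit over t = 1, ..., m solved through the normal equations, measures the actual error as the Euclidean norm of x - x', increases p by one in step 305, and stops at p = m - 1 so that p < m still holds. The function names (`solve_linear`, `fit_polynomial`, `reconstruct`, `reduce_dimension`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - tail) / aug[r][r]
    return w


def fit_polynomial(x: Sequence[float], p: int) -> List[float]:
    """Least-squares coefficients w_0..w_{p-1} for samples taken at t = 1..m."""
    m = len(x)
    q = [[float(t) ** j for j in range(p)] for t in range(1, m + 1)]
    qtq = [[sum(row[j] * row[k] for row in q) for k in range(p)] for j in range(p)]
    qtx = [sum(row[j] * xi for row, xi in zip(q, x)) for j in range(p)]
    return solve_linear(qtq, qtx)


def reconstruct(w: Sequence[float], m: int) -> List[float]:
    """Inverse mapping x' = Q * w, evaluated at t = 1..m."""
    return [sum(wj * float(t) ** j for j, wj in enumerate(w)) for t in range(1, m + 1)]


def reduce_dimension(x: Sequence[float], e: float, p: int) -> Tuple[List[float], float]:
    """Return (w, actual_error) following steps 301-306 (sketch)."""
    m = len(x)
    if not 1 <= p < m:
        raise ValueError("p must satisfy 1 <= p < m")
    while True:
        w = fit_polynomial(x, p)                       # steps 302/303: map and approximate
        x_approx = reconstruct(w, m)
        actual = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_approx)))  # step 304
        if actual < e or p == m - 1:                   # assumed cap keeps p < m
            return w, actual                           # step 306: output w
        p += 1                                         # step 305: update p
```

For example, `reduce_dimension([float(t * t) for t in range(1, 9)], e=1e-6, p=2)` starts with a straight line, finds the error too large, and stops at p = 3 because a quadratic reproduces the samples up to rounding. The normal equations become ill-conditioned as p grows, so a production version would likely use a numerically more stable solver.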
Note that the subsequences in the obtained subsequence set are mapped into their corresponding piecewise polynomial feature spaces by the same dimensionality reduction process described above, so the details are not repeated here.
FIG. 4 shows the specific implementation flow, in the data query method provided by an embodiment of the present invention, of the step of matching the dimension-reduced subsequence to be queried against the subsequences in the dimension-reduced subsequence set. The flow includes:
401: Perform MBR (minimum bounding rectangle) segmentation on the dimension-reduced subsequence set. The MBR segmentation is implemented as follows:
An MBR, or minimum bounding rectangle, is the smallest circumscribed rectangle that encloses a primitive and whose sides are parallel to the X and Y axes. The trajectory of the original time series in the feature space is divided by MBRs into multiple sub-trajectories so that the number of disk accesses is minimized.
In the MBR indexing method, an R*-tree is built. Each node of the R*-tree (that is, each MBR) needs to store the following data: the unique identification number of the time series; the start offset and end offset of the MBR within its corresponding time series; and the vertex coordinate values of the MBR (the minimum and maximum value along each feature dimension).
402: Match the dimension-reduced subsequence to be queried against the MBR-segmented, dimension-reduced subsequence set. Specifically, the index file is searched with $w_q$ for all MBRs that satisfy the following condition, and these form the candidate set: $w_q \subset \mathrm{MBR}$.
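The node layout and the candidate search of steps 401 and 402 can be illustrated with the following Python sketch. Several things here are assumptions made for illustration: a plain linear scan stands in for the R*-tree lookup, each MBR covers a fixed number of consecutive feature-space points (the source does not state how the trajectory is cut), and the names `MBR`, `split_into_mbrs` and `candidate_mbrs` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class MBR:
    series_id: int   # unique identification number of the time series
    start: int       # start offset within the series
    end: int         # end offset within the series (inclusive)
    low: Point       # minimum coordinate per feature dimension
    high: Point      # maximum coordinate per feature dimension

    def contains(self, point: Sequence[float]) -> bool:
        """True when the point lies inside the rectangle on every dimension."""
        return all(lo <= v <= hi for lo, v, hi in zip(self.low, point, self.high))


def split_into_mbrs(series_id: int, trail: Sequence[Point], size: int) -> List[MBR]:
    """Cover each run of `size` consecutive feature points with one rectangle."""
    if size < 1:
        raise ValueError("size must be at least 1")
    boxes: List[MBR] = []
    for start in range(0, len(trail), size):
        chunk = trail[start:start + size]
        dims = range(len(chunk[0]))
        low = tuple(min(pt[d] for pt in chunk) for d in dims)
        high = tuple(max(pt[d] for pt in chunk) for d in dims)
        boxes.append(MBR(series_id, start, start + len(chunk) - 1, low, high))
    return boxes


def candidate_mbrs(index: Sequence[MBR], w_q: Sequence[float]) -> List[MBR]:
    """Linear scan standing in for the R*-tree search: keep boxes containing w_q."""
    return [box for box in index if box.contains(w_q)]
```

Here `trail` is assumed to be the sequence of feature-space points of one time series, for example the coefficient vectors of successive windows. A real R*-tree would prune whole subtrees instead of testing every rectangle, which is where the saving in disk accesses would come from.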
FIG. 5 shows the specific implementation flow, in the data query method provided by an embodiment of the present invention, of the step of obtaining the subsequences that match the subsequence to be queried. The flow includes:
501: Obtain the Euclidean distance between each subsequence in the dimension-reduced subsequence set and the dimension-reduced subsequence to be queried.
The Euclidean distance is obtained as follows:
For all $x \in X$ and all $y \in X$, the distance expression is given in Figure imgf000008_0001 of the original publication; in that expression, the distance term denotes the actual Euclidean distance between $x$ and $y$.
Note that when the dimension-reduced subsequence to be queried is matched against the MBR-segmented, dimension-reduced subsequence set, the Euclidean distance to the dimension-reduced subsequence to be queried is obtained only for the subsequences belonging to the MBRs in the candidate set.
502: According to the obtained Euclidean distances, obtain the subsequences that match the subsequence to be queried q.
Note that the flow may also include:
Obtaining a Euclidean distance threshold; for example, setting the Euclidean distance threshold ε to 0.001.
According to the Euclidean distance threshold, obtain the set of subsequences that match the subsequence to be queried. Specifically, obtain the subsequences whose Euclidean distance is less than or equal to ε, that is, 0.001. Note that when the distances are computed between the points $w_i$, $i = 1, 2, \ldots$, in the phase space corresponding to the MBRs in the candidate set and the point $w_q$, a point $w_i$ whose distance to $w_q$ is at most ε corresponds to a subsequence similar to q.
Note also that if all subsequences satisfying the above condition are output as the result, the query process may be referred to as a PQ query. Alternatively, given the set of subsequences that satisfy the condition, if only the subsequence in that set with the smallest distance to q is output as the result, the query process is of a second kind that returns the single closest match.
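The threshold filter and the two ways of reporting results can be sketched in Python as below. This is an illustrative sketch only: the candidates are assumed to be already-reduced feature vectors keyed by a label, the distance is the standard Euclidean distance, and the function names (`euclidean`, `threshold_query`, `closest_match`) are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def threshold_query(
    candidates: Dict[str, Sequence[float]], w_q: Sequence[float], eps: float
) -> List[Tuple[str, float]]:
    """Return every candidate whose distance to w_q is at most eps, nearest first."""
    scored = [(name, euclidean(w, w_q)) for name, w in candidates.items()]
    return sorted((hit for hit in scored if hit[1] <= eps), key=lambda hit: hit[1])


def closest_match(
    candidates: Dict[str, Sequence[float]], w_q: Sequence[float], eps: float
) -> Optional[Tuple[str, float]]:
    """Return only the nearest candidate within eps, or None when there is none."""
    hits = threshold_query(candidates, w_q, eps)
    return hits[0] if hits else None
```

`threshold_query` corresponds to outputting all subsequences within the threshold, and `closest_match` to outputting only the one with the smallest distance. With ε = 0.001 as in the example above, a candidate at distance 0.0005 from the query is kept and a candidate at distance 1.4 is discarded.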
FIG. 6 shows a data query apparatus provided by an embodiment of the present invention. The apparatus includes:
an information acquisition unit 601, configured to obtain the subsequence to be queried and its corresponding time parameter;
a historical subsequence acquisition unit 602, configured to obtain, from the historical data and according to the corresponding time parameter of the subsequence to be queried, the subsequence set for the corresponding time parameter;
a sequence processing unit 603, configured to perform dimensionality reduction on the subsequence to be queried and on the subsequences in the obtained subsequence set;
a matching query unit 604, configured to perform a matching query between the dimension-reduced subsequence to be queried and the subsequences in the dimension-reduced subsequence set;
a matching sequence acquisition unit 605, configured to obtain the subsequences that match the subsequence to be queried.
Note that the sequence processing unit 603 includes:
a to-be-queried subsequence processing subunit, configured to map the subsequence to be queried into its corresponding piecewise polynomial feature space;
a historical subsequence processing subunit, configured to map each subsequence in the obtained subsequence set into its corresponding piecewise polynomial feature space.
Note also that the matching query unit 604 includes:
a segmentation subunit, configured to perform MBR segmentation on the dimension-reduced subsequence set;
a matching query subunit, configured to perform a matching query between the dimension-reduced subsequence to be queried and the MBR-segmented, dimension-reduced subsequence set.
Note also that the matching sequence acquisition unit 605 includes:
a distance acquisition subunit, configured to obtain the Euclidean distance between each subsequence in the dimension-reduced subsequence set and the dimension-reduced subsequence to be queried;
a matching sequence acquisition subunit, configured to obtain, according to the obtained Euclidean distances, the subsequences that match the subsequence to be queried.
Note also that the apparatus further includes:
a threshold acquisition unit, configured to obtain the Euclidean distance threshold;
a matching subsequence acquisition unit, configured to obtain, according to the Euclidean distance threshold, the set of subsequences that match the subsequence to be queried.
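How the units hand data to one another can be pictured with the following Python sketch. It is only an illustration of the data flow: each unit is modelled as a plain callable, the time parameter is assumed to be a start offset into each day's series, and the class name `DataQueryDevice` and its field names are hypothetical rather than taken from the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Series = Sequence[float]
Features = Sequence[float]


@dataclass
class DataQueryDevice:
    # unit 602: (start offset, length) -> historical subsequences keyed by label
    fetch_history: Callable[[int, int], Dict[str, Series]]
    # unit 603: dimensionality reduction of one subsequence
    reduce: Callable[[Series], Features]
    # unit 604: matching query, returns labels of candidate subsequences
    match: Callable[[Features, Dict[str, Features]], List[str]]
    # unit 605: final selection among the candidates
    select: Callable[[Features, Dict[str, Features], List[str]], List[str]]

    def query(self, subsequence: Series, start: int) -> List[str]:
        # unit 601: the query subsequence and its time parameter arrive as arguments
        history = self.fetch_history(start, len(subsequence))
        w_q = self.reduce(subsequence)
        reduced = {label: self.reduce(s) for label, s in history.items()}
        candidates = self.match(w_q, reduced)
        return self.select(w_q, reduced, candidates)
```

Keeping the units as injected callables means the polynomial mapping, the MBR search and the distance filter can each be replaced or tested separately. For instance, a toy `reduce` that returns the mean of a subsequence is enough to exercise the flow end to end.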
The data query method and apparatus provided by the embodiments of the present invention obtain the subsequence to be queried and its corresponding time parameter; obtain, from the historical data and according to the corresponding time parameter of the subsequence to be queried, the subsequence set for the corresponding time parameter; perform dimensionality reduction on the subsequence to be queried and on the subsequences in the obtained subsequence set; perform a matching query between the dimension-reduced subsequence to be queried and the subsequences in the dimension-reduced subsequence set; and obtain the subsequences that match the subsequence to be queried. Compared with the prior art, the embodiments of the present invention perform dimensionality reduction on both the subsequence to be queried and the subsequences in the set obtained from the historical data for the corresponding time parameter, which lowers the query time complexity of the whole system and improves the utilization of system resources. They also use a piecewise polynomial to represent the time series, which reduces the error in the query process.
From the description of the above embodiments, a person of ordinary skill in the art can understand that all or some of the steps of the above method embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium may be, for example, FLASH, ROM/RAM, a magnetic disk, or an optical disc.
The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims

1. A data query method, characterized by comprising:
obtaining a subsequence to be queried and its corresponding time parameter;
obtaining, from historical data and according to the corresponding time parameter of the subsequence to be queried, a subsequence set for the corresponding time parameter;
performing dimensionality reduction on the subsequence to be queried and on the subsequences in the obtained subsequence set;
performing a matching query between the dimension-reduced subsequence to be queried and the subsequences in the dimension-reduced subsequence set;
obtaining the subsequences that match the subsequence to be queried.
2. The data query method according to claim 1, characterized in that the step of performing dimensionality reduction on the subsequence to be queried and on the subsequences in the obtained subsequence set comprises:
mapping the subsequence to be queried into its corresponding piecewise polynomial feature space;
mapping each subsequence in the obtained subsequence set into its corresponding piecewise polynomial feature space.
3. The data query method according to claim 1 or 2, characterized in that the step of performing a matching query between the dimension-reduced subsequence to be queried and the subsequences in the dimension-reduced subsequence set comprises:
performing minimum bounding rectangle segmentation on the dimension-reduced subsequence set;
performing a matching query between the dimension-reduced subsequence to be queried and the dimension-reduced subsequence set after the minimum bounding rectangle segmentation.
4. The data query method according to claim 1 or 2, characterized in that the step of obtaining the subsequences that match the subsequence to be queried comprises:
obtaining the Euclidean distance between each subsequence in the dimension-reduced subsequence set and the dimension-reduced subsequence to be queried;
obtaining, according to the obtained Euclidean distances, the subsequences that match the subsequence to be queried.
5. The data query method according to claim 4, characterized in that the method further comprises:
obtaining a Euclidean distance threshold;
obtaining, according to the Euclidean distance threshold, the set of subsequences that match the subsequence to be queried.
6. A data query apparatus, characterized by comprising:
an information acquisition unit, configured to obtain a subsequence to be queried and its corresponding time parameter;
a historical subsequence acquisition unit, configured to obtain, from historical data and according to the corresponding time parameter of the subsequence to be queried, a subsequence set for the corresponding time parameter;
a sequence processing unit, configured to perform dimensionality reduction on the subsequence to be queried and on the subsequences in the obtained subsequence set;
a matching query unit, configured to perform a matching query between the dimension-reduced subsequence to be queried and the subsequences in the dimension-reduced subsequence set;
a matching sequence acquisition unit, configured to obtain the subsequences that match the subsequence to be queried.
7. The data query apparatus according to claim 6, characterized in that the sequence processing unit comprises:
a to-be-queried subsequence processing subunit, configured to map the subsequence to be queried into its corresponding piecewise polynomial feature space;
a historical subsequence processing subunit, configured to map each subsequence in the obtained subsequence set into its corresponding piecewise polynomial feature space.
8. The data query apparatus according to claim 6 or 7, characterized in that the matching query unit comprises:
a segmentation subunit, configured to perform minimum bounding rectangle segmentation on the dimension-reduced subsequence set;
a matching query subunit, configured to perform a matching query between the dimension-reduced subsequence to be queried and the dimension-reduced subsequence set after the minimum bounding rectangle segmentation.
9. The data query apparatus according to claim 6 or 7, characterized in that the matching sequence acquisition unit comprises:
a distance acquisition subunit, configured to obtain the Euclidean distance between each subsequence in the dimension-reduced subsequence set and the dimension-reduced subsequence to be queried;
a matching sequence acquisition subunit, configured to obtain, according to the obtained Euclidean distances, the subsequences that match the subsequence to be queried.
10. The data query apparatus according to claim 9, characterized in that the apparatus further comprises:
a threshold acquisition unit, configured to obtain a Euclidean distance threshold;
a matching subsequence acquisition unit, configured to obtain, according to the Euclidean distance threshold, the set of subsequences that match the subsequence to be queried.
PCT/CN2010/079728 2009-12-30 2010-12-13 Method and device for data query WO2011079706A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200910244152.5 2009-12-30
CN200910244152A CN101763417A (en) 2009-12-30 2009-12-30 Data query method and device

Publications (1)

Publication Number Publication Date
WO2011079706A1 true WO2011079706A1 (en) 2011-07-07

Family

ID=42494581

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/079728 WO2011079706A1 (en) 2009-12-30 2010-12-13 Method and device for data query

Country Status (2)

Country Link
CN (1) CN101763417A (en)
WO (1) WO2011079706A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101763417A (en) * 2009-12-30 2010-06-30 北京世纪高通科技有限公司 Data query method and device
CN104077309B (en) * 2013-03-28 2018-05-08 日电(中国)有限公司 A kind of method and apparatus that dimension-reduction treatment is carried out to time series
CN103235822B (en) * 2013-05-03 2016-05-25 富景天策(北京)气象科技有限公司 The generation of database and querying method
CN106294348B (en) * 2015-05-13 2019-07-09 深圳市智美达科技有限公司 For the real-time sort method and device of real-time report data
CN107832347B (en) * 2017-10-16 2021-12-31 北京京东尚科信息技术有限公司 Data dimension reduction method and system and electronic equipment
CN107908593B (en) * 2017-12-12 2018-10-30 清华大学 A kind of subsequence search method and system based on frequency domain character
CN109033289A (en) * 2018-07-13 2018-12-18 天津瑞能电气有限公司 A kind of banking procedure of the high frequency real time data for micro-capacitance sensor

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070239753A1 (en) * 2006-04-06 2007-10-11 Leonard Michael J Systems And Methods For Mining Transactional And Time Series Data
CN101763417A (en) * 2009-12-30 2010-06-30 北京世纪高通科技有限公司 Data query method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI AI GUO ET AL.: "Dimensionality Reduction and Similarity Search in Large Time Series Databases", CHINESE JOURNAL OF COMPUTERS, vol. 28, no. 9, September 2005 (2005-09-01), pages 1468 - 1471 *

Also Published As

Publication number Publication date
CN101763417A (en) 2010-06-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10840495

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10840495

Country of ref document: EP

Kind code of ref document: A1