CN109726225A - Storm-based distributed stream data storage and query method - Google Patents

Storm-based distributed stream data storage and query method

Info

Publication number
CN109726225A
Authority
CN
China
Prior art keywords
data
subquery
query
server
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910026601.2A
Other languages
Chinese (zh)
Other versions
CN109726225B (en)
Inventor
蔡瑞初
林峰极
郝志峰
王立
黄泽林
陈炳丰
温雯
王丽娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN201910026601.2A
Publication of CN109726225A
Application granted
Publication of CN109726225B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present invention provides a Storm-based method for storing and querying distributed stream data. Built on the Storm stream-computing framework, with CephFS as the underlying storage system, the method analyzes the characteristics of distributed stream data, partitions and indexes the data in real time, and compresses the partitioned data blocks into CephFS. At query time, a query is decomposed into subqueries along the two dimensions of the data blocks, key and time; a Bloom filter (bloomFilter) ensures that only files that may contain the required data are read; qualifying data are selected by a predicate; and the subquery results are merged, aggregated, and returned to the user. Computing resources are thereby fully used to improve the efficiency of data storage and querying. The invention has wide application scenarios, low latency, and load balancing, and achieves high-speed storage.

Description

Storm-based distributed stream data storage and query method
Technical field
The present invention relates to the technical field of data processing, and in particular to a Storm-based method for storing and querying distributed stream data.
Background
With the rapid development of network technology, the volume of real-time stream data produced by social networks, location-service platforms, and similar sources is growing quickly, and more and more fields require massive stream data to be processed and answered in real time. High-speed insertion and real-time search have therefore become essential data-processing capabilities, so that users can retrieve both historical and newly arrived data in real time. Location-service platforms such as Baidu Map and Amap generate massive amounts of position and trajectory data every second. To meet user demand and improve business returns, such a platform must support real-time insertion of stream data at the scale of millions of records together with low-latency queries. For example, a client may need the GPS information of all vehicles within 5 km at the current time, or the trajectory of a given vehicle over the past hour.
Common open-source key-value stores such as HBase use an LSM-Tree to reduce the time cost of updating leaf nodes, but every insertion of new data requires an update against historical data, so query latency over a time range is too high. Common time-series databases such as the open-source Druid support only inverted indexes and are inefficient for key-range queries. To solve this problem, a distributed database technique is needed that stores massive stream data at high speed, answers queries in real time, and supports efficient queries over both key ranges and time ranges. This requires that newly arrived data be kept separate from historical data, that queries avoid traversing data in irrelevant ranges as far as possible, and that load be balanced across the nodes of the system so as to maximize resource utilization.
Summary of the invention
In view of the deficiencies of the prior art, the present invention provides a Storm-based method for storing and querying distributed stream data. The invention builds on two observations: in real environments, stream data arrive in approximately sequential order and their distribution is stable; and existing database techniques cannot query efficiently over key ranges and time ranges at the same time. It therefore provides an efficient indexing method for massive stream data and a method for processing real-time queries over time ranges. Incoming stream data are partitioned by range, indexed in parallel on different machine nodes, and stored in a distributed file system. At query time, the query is decomposed, the subqueries are executed in parallel, and the results are filtered, aggregated, merged, and returned.
The technical solution of the present invention is a Storm-based method for storing and querying distributed stream data. As distributed stream data are received, B+Tree indexes are built in real time over several disjoint ranges; once an index reaches a threshold, it is stored in the distributed file system. At query time, the query is decomposed, the subqueries for the different ranges are processed in parallel with balanced load, and the results are merged and returned, which achieves real-time storage with high-throughput stream data insertion and querying. The method comprises the following steps:
S1) receive source data and distribute them to downstream units, which build the index structure;
S2) compress the index structure into data blocks and write them to the distributed file storage system CephFS;
S3) decompose the query into several independent subqueries based on the query conditions and the data block information;
S4) distribute the subqueries to independent downstream query processing units, which access the distributed file storage system CephFS;
S5) receive the returned subquery results, merge them, and return the result to the user.
Further, in step S1), each source datum received by the stream data storage system is a data tuple, defined as $d = \{d_k, d_t, d_r\}$, where $d_k$ is the key of the tuple, $d_t$ is its time attribute, and $d_r$ holds its other attribute values. K and T define a two-dimensional space $D = (K, T)$ over the key domain and the time domain. The key range is fixed, while the time range grows continuously. A key interval is written $K = (k^-, k^+)$ and a time interval $T = (t^-, t^+)$; the two intervals determine a unique rectangle $r\langle K, T\rangle = \{(k, t) \in R \mid k \in K,\ t \in T\}$.
Further, the data tuples that fall within the rectangle $r\langle K, T\rangle = \{(k, t) \in R \mid k \in K,\ t \in T\}$ are written into the unique template B+Tree corresponding to that rectangle, with the key serving as the index. When a template B+Tree in memory reaches the threshold size chunkSize, it is stored in the distributed file system in the form of a chunk. A chunk consists of a key array and a data array; the key array stores the key values in order, each with an offset pointing into the data array.
Further, based on the two-dimensional space $D = (K, T)$, a query condition of the stream data storage system can be defined as a triple $q = \{K_q, T_q, f_q\}$, where $K_q$ and $T_q$ are the selection ranges on the key domain and the time domain, so that the query range is cut out as a rectangle $r\langle K, T\rangle = \{(k, t) \in R \mid k \in K_q,\ t \in T_q\}$, and $f_q: t \to \{\text{true}, \text{false}\}$ is a user-defined filter function that decides whether a tuple satisfies the user's selection.
Further, because different subquery servers (Subquery Server) store different file blocks on their nodes and cache different template B+Tree leaf nodes, a query decomposition scheduling algorithm is implemented. It computes, for each subquery server, a priority queue over the unprocessed subqueries and assigns queries from these queues until the set of unprocessed subqueries is empty; recently queried leaf node data are written to a cache. Query assignment thereby achieves cache locality, data block locality, and load balancing. The algorithm proceeds as follows:
Shuffle $S(q_i)$ and the array of the remaining subquery servers (its symbol is missing from the source text) separately, then concatenate the two, with $S(q_i)$ first, into a new array (symbol likewise missing), in which a smaller index means a higher priority. For each subquery server in the new array, add $q_i$, together with the priority given by the server's position, to that server's subquery priority queue. After all $q_i$ have been processed in this way, take from the priority queue of each subquery server in turn the highest-priority $q_i$ that has not yet been processed and assign it, until all $q_i$ have been assigned. Here $S(q_i)$ denotes the array of subquery servers that hold data in the range of $q_i$, the second array holds the remaining subquery servers, and $q_i \in q$ denotes a subquery obtained by decomposing one query.
Further, in step S2), the index structure is a tree index. Once the size of the tree index exceeds a specified threshold, the data tuples in the leaf nodes are compressed with the Snappy algorithm and written as data blocks to the distributed file storage system CephFS for permanent storage, and the metadata of each data block, such as its tuple key range and time range, are recorded in the metadata manager (metadata keeper). Because the key domain of stream data varies only within a certain range while the time domain keeps growing, the non-leaf part of the tree index is retained as a template. The template is reused directly the next time an index is built, which avoids the node splits of ordinary B+ tree construction and their large time overhead.
Further, in step S3), decomposing the query into several independent subqueries based on the query conditions and the data block information comprises the following steps:
S301) the query dispatcher compares the key and time ranges in the user's query conditions with the data block metadata read from the metadata manager (metadata keeper), and divides the query region into a series of two-dimensional index regions;
S302) based on the equality conditions provided by the user, a Bloom filter (bloomFilter) is used to filter out subquery regions that definitely do not contain the target data tuples;
S303) only the subqueries that may contain target data tuples are distributed to the independent downstream subquery servers (Subquery Server).
Further, in step S4), distributing the subqueries to independent downstream query processing units that access the distributed file storage system CephFS comprises the following steps:
S401) the subquery servers (Subquery Server) read in parallel the data blocks in CephFS that correspond to their subqueries. Each server first reads the template part of the index structure in the data block, obtains the relative offset of each leaf node among all leaf nodes and its offset after packing and compression, and computes the offsets of the series of leaf nodes that may cover the target key range;
S402) based on these offsets, the server reads the leaf node part of the index structure from the data block file, decompresses the packed leaf node bytes with the Snappy algorithm, deserializes them into leaf nodes, and filters them by time range and equality conditions;
S403) the server aggregates the filtered data tuples, serializes the result, and sends it to the query dispatcher.
The beneficial effects of the invention are as follows:
1. The invention has wide application scenarios. It supports real-time transmission and processing of massive data in distributed stream processing applications such as network traffic monitoring and analysis by telecom operators, real-time vehicle flow and trajectory changes on location-service platforms, and holiday transaction indicators on e-commerce platforms.
2. The invention achieves high-speed storage. An efficient partitioning scheme separates newly arrived data from historical data, and, exploiting the stability of the data range, B+Tree indexes are built from retained index templates, which avoids the large time cost of tree node splits.
3. The invention has low latency. After the query conditions are cut into ranges, only the files whose metadata may match the query range are accessed; key operations such as filtering and aggregation run in parallel; and cache locality and file locality are exploited to improve query efficiency.
4. The invention balances load. The designed query scheduling algorithm assigns the decomposed subqueries to different nodes, making full use of system resources.
Brief description of the drawings
Fig. 1 is a flow diagram of the invention;
Fig. 2 is a structure diagram of a stored data block of distributed stream data according to the invention;
Fig. 3 is a diagram of the internal structure of a leaf node in a stored data block of distributed stream data according to the invention;
Fig. 4 is a diagram of query decomposition and scheduling for distributed stream data according to the invention.
Detailed description of the embodiments
Specific embodiments of the invention are further described below with reference to the accompanying drawings.
As shown in Fig. 1, in a Storm-based method for storing and querying distributed stream data, B+Tree indexes are built in real time over several disjoint ranges as distributed stream data are received, and each index is stored in the distributed file system once it reaches a threshold. At query time, the query is decomposed, the subqueries for the different ranges are processed in parallel with balanced load, and the results are merged and returned, which achieves real-time storage with high-throughput stream data insertion and querying. The method comprises the following steps:
S1) receive source data and distribute them to downstream units, which build the index structure;
Each source datum received by the stream data storage system is called a data tuple and is defined as $d = \{d_k, d_t, d_r\}$, where $d_k$ is the key of the tuple, $d_t$ is its time attribute, and $d_r$ holds its other attribute values. K and T define a two-dimensional space $D = (K, T)$ over the key domain and the time domain. The key range is fixed, while the time range grows continuously. A key interval is written $K = (k^-, k^+)$ and a time interval $T = (t^-, t^+)$; the two intervals determine a unique rectangle:
$r\langle K, T\rangle = \{(k, t) \in R \mid k \in K,\ t \in T\}$;
The data tuples that fall within the rectangle $r\langle K, T\rangle = \{(k, t) \in R \mid k \in K,\ t \in T\}$ are written into the unique template B+Tree corresponding to that rectangle, with the key serving as the index. When a template B+Tree in memory reaches the threshold size chunkSize, it is stored in the distributed file system in the form of a data block (chunk). A chunk consists of a key array and a data array; the key array stores the key values in order, each with an offset pointing into the data array.
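The partition-and-flush behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `DataTuple`, `Region`, and `RegionBuffer` are hypothetical names, the intervals are treated as half-open, the threshold is counted in tuples, and a plain list stands in for the template B+Tree.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass(frozen=True)
class DataTuple:
    key: int    # d_k, the key of the tuple
    time: int   # d_t, the time attribute
    rest: Any   # d_r, all other attribute values


@dataclass
class Region:
    """Rectangle r<K, T>; half-open bounds are an assumption of this sketch."""
    k_lo: int
    k_hi: int
    t_lo: int
    t_hi: int

    def contains(self, d: DataTuple) -> bool:
        return self.k_lo <= d.key < self.k_hi and self.t_lo <= d.time < self.t_hi


@dataclass
class RegionBuffer:
    """In-memory stand-in for one template B+Tree with a fixed key interval."""
    k_lo: int
    k_hi: int
    chunk_size: int                       # plays the role of chunkSize
    tuples: List[DataTuple] = field(default_factory=list)

    def insert(self, d: DataTuple) -> Optional[Tuple[Region, List[DataTuple]]]:
        """Buffer a tuple; return (region, sorted chunk) once the threshold is hit."""
        self.tuples.append(d)
        if len(self.tuples) < self.chunk_size:
            return None
        times = [x.time for x in self.tuples]
        region = Region(self.k_lo, self.k_hi, min(times), max(times) + 1)
        chunk = sorted(self.tuples, key=lambda x: (x.key, x.time))
        self.tuples = []  # key interval stays fixed; only the time interval moves on
        return region, chunk


buf = RegionBuffer(k_lo=0, k_hi=100, chunk_size=3)
flushed = None
for d in [DataTuple(42, 10, "a"), DataTuple(7, 11, "b"), DataTuple(99, 12, "c")]:
    flushed = buf.insert(d) or flushed
region, chunk = flushed
print(region, [d.key for d in chunk])
```

The flushed region is what a metadata manager would record for the chunk: a fixed key interval plus the time interval actually covered by the buffered tuples.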
Based on the two-dimensional space $D = (K, T)$, a query condition of the stream data storage system can be defined as a triple $q = \{K_q, T_q, f_q\}$, where $K_q$ and $T_q$ are the selection ranges on the key domain and the time domain, so that the query range is cut out as a rectangle $r\langle K, T\rangle = \{(k, t) \in R \mid k \in K_q,\ t \in T_q\}$, and $f_q: t \to \{\text{true}, \text{false}\}$ is a user-defined filter function that decides whether a tuple satisfies the user's selection.
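As a small illustration of the query triple, the following Python sketch models a query as a key range, a time range, and a user-defined predicate. The class name `Query`, the record layout, and the inclusive bounds are assumptions made for the example, not details given in the patent.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# (key, time, other attributes); this record layout is an assumption.
Record = Tuple[int, int, dict]


@dataclass
class Query:
    key_range: Tuple[int, int]            # K_q, inclusive bounds assumed
    time_range: Tuple[int, int]           # T_q, inclusive bounds assumed
    predicate: Callable[[Record], bool]   # f_q, the user-defined filter

    def matches(self, rec: Record) -> bool:
        key, time, _ = rec
        return (self.key_range[0] <= key <= self.key_range[1]
                and self.time_range[0] <= time <= self.time_range[1]
                and self.predicate(rec))


records = [
    (5, 100, {"speed": 30}),   # in range, but fails the predicate
    (8, 120, {"speed": 80}),   # in range and passes the predicate
    (50, 110, {"speed": 90}),  # outside the key range
]
q = Query(key_range=(0, 10), time_range=(90, 130),
          predicate=lambda r: r[2]["speed"] > 60)
hits = [r for r in records if q.matches(r)]
print(hits)
```

The range test and the predicate are kept separate on purpose: the ranges can be checked against chunk metadata before any file is read, while the predicate can only run on decoded tuples.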
Because different subquery servers (Subquery Server) store different file blocks on their nodes and cache different template B+Tree leaf nodes, a query decomposition scheduling algorithm is implemented. It computes, for each subquery server, a priority queue over the unprocessed subqueries and assigns queries from these queues until the set of unprocessed subqueries is empty; recently queried leaf node data are written to a cache. Query assignment thereby achieves cache locality, data block locality, and load balancing. The algorithm proceeds as follows:
Shuffle $S(q_i)$ and the array of the remaining subquery servers (its symbol is missing from the source text) separately, then concatenate the two, with $S(q_i)$ first, into a new array (symbol likewise missing), in which a smaller index means a higher priority. For each subquery server in the new array, add $q_i$, together with the priority given by the server's position, to that server's subquery priority queue. After all $q_i$ have been processed in this way, take from the priority queue of each subquery server in turn the highest-priority $q_i$ that has not yet been processed and assign it, until all $q_i$ have been assigned. Here $S(q_i)$ denotes the array of subquery servers (Subquery Server) that hold data in the range of $q_i$, the second array holds the remaining subquery servers, and $q_i \in q$ denotes a subquery obtained by decomposing one query.
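One way to read the scheduling procedure is sketched below in Python. It is an interpretation, not the patent's implementation: the function name `schedule`, the `holders` mapping (which servers hold data for which subquery), the fixed random seed, and the one-subquery-per-turn round are all assumptions of this example.

```python
import random
from typing import Dict, List, Set, Tuple


def schedule(subqueries: List[str], servers: List[str],
             holders: Dict[str, Set[str]], seed: int = 0) -> Dict[str, str]:
    """Assign every subquery to one server, preferring servers that hold its data."""
    if not servers:
        raise ValueError("at least one subquery server is required")
    rng = random.Random(seed)
    # Per-server queue of (priority, subquery); a smaller number is a higher priority.
    queues: Dict[str, List[Tuple[int, str]]] = {s: [] for s in servers}
    for q in subqueries:
        held_by = holders.get(q, set())
        local = [s for s in servers if s in held_by]        # S(q_i)
        others = [s for s in servers if s not in held_by]   # remaining servers
        rng.shuffle(local)
        rng.shuffle(others)
        # Concatenate with S(q_i) first; the position becomes the priority.
        for priority, s in enumerate(local + others):
            queues[s].append((priority, q))
    for s in servers:
        queues[s].sort()

    assignment: Dict[str, str] = {}
    pending = set(subqueries)
    while pending:
        for s in servers:  # visit servers in order, one subquery per turn
            choice = next((q for _, q in queues[s] if q in pending), None)
            if choice is not None:
                assignment[choice] = s
                pending.discard(choice)
    return assignment


servers = ["s1", "s2"]
subqueries = ["q1", "q2", "q3", "q4"]
local_plan = schedule(subqueries, servers,
                      {"q1": {"s1"}, "q2": {"s1"}, "q3": {"s2"}, "q4": {"s2"}})
skewed_plan = schedule(subqueries, servers, {q: {"s1"} for q in subqueries})
print(local_plan)
print(skewed_plan)
```

In the first call every subquery lands on the server that holds its data. In the second call all data sit on `s1`, yet the round-robin turn-taking still hands half of the subqueries to `s2`, which is the load-balancing effect the text describes.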
S2) compress the index structure into data blocks and write them to the distributed file storage system CephFS, wherein:
The index structure is a tree index. Once the size of the tree index exceeds a specified threshold, the data tuples in the leaf nodes are compressed with the Snappy algorithm and written as data blocks to the distributed file storage system CephFS for permanent storage, and the metadata of each data block, such as its tuple key range and time range, are recorded in the metadata manager (metadata keeper). Because the key domain of stream data varies only within a certain range while the time domain keeps growing, the non-leaf part of the tree index is retained as a template. The template is reused directly the next time an index is built, which avoids the node splits of ordinary B+ tree construction and their large time overhead.
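The idea of keeping the non-leaf part as a template can be illustrated with a deliberately flattened Python sketch. The assumptions are significant: `TemplateIndex` is a hypothetical class, the whole inner tree is reduced to one sorted list of separator keys, and leaves are plain lists. A real B+Tree would have several inner levels.

```python
from bisect import bisect_right
from typing import List, Tuple


class TemplateIndex:
    """Flat stand-in for a template B+Tree: fixed separators, replaceable leaves."""

    def __init__(self, separators: List[int]):
        self.separators = sorted(separators)  # the reusable "template"
        self.leaves: List[List[Tuple[int, object]]] = self._empty_leaves()

    def _empty_leaves(self) -> List[List[Tuple[int, object]]]:
        return [[] for _ in range(len(self.separators) + 1)]

    def insert(self, key: int, value: object) -> None:
        # Route by the fixed separators; no node is ever split.
        self.leaves[bisect_right(self.separators, key)].append((key, value))

    def size(self) -> int:
        return sum(len(leaf) for leaf in self.leaves)

    def flush(self) -> List[List[Tuple[int, object]]]:
        """Return the key-sorted leaves and start over with the same separators."""
        full = [sorted(leaf, key=lambda kv: kv[0]) for leaf in self.leaves]
        self.leaves = self._empty_leaves()
        return full


idx = TemplateIndex([10, 20])
for key, value in [(5, "a"), (15, "b"), (25, "c"), (12, "d")]:
    idx.insert(key, value)
first = idx.flush()
idx.insert(11, "e")   # the next chunk reuses the same separators
second = idx.flush()
print(first, second)
```

Because the separators survive each flush, the second chunk is built without recomputing any inner structure, which mirrors the stated motivation of avoiding node splits.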
S3) decompose the query into several independent subqueries based on the query conditions and the data block information, which comprises the following steps:
S301) the query dispatcher compares the key and time ranges in the user's query conditions with the data block metadata read from the metadata manager (metadata keeper), and divides the query region into a series of two-dimensional index regions;
S302) based on the equality conditions provided by the user, a Bloom filter (bloomFilter) is used to filter out subquery regions that definitely do not contain the target data tuples;
S303) only the subqueries that may contain target data tuples are distributed to the independent downstream subquery servers (Subquery Server).
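Steps S301 to S303 can be sketched as follows. This is a hedged illustration: `BloomFilter`, `ChunkMeta`, `decompose`, and `make_meta` are hypothetical names, the Bloom filter is a toy built on SHA-256 rather than whatever the patent's implementation uses, and range bounds are treated as inclusive.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Range = Tuple[int, int]


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str) -> Iterable[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p] for p in self._positions(item))


@dataclass
class ChunkMeta:
    path: str
    key_range: Range     # inclusive
    time_range: Range    # inclusive
    bloom: BloomFilter   # built over the attribute used in equality conditions


def overlaps(a: Range, b: Range) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def decompose(metas: List[ChunkMeta], key_range: Range, time_range: Range,
              equals: Optional[str] = None) -> List[Tuple[str, Range, Range]]:
    """Return one (path, clipped key range, clipped time range) per candidate chunk."""
    subqueries = []
    for m in metas:
        if not (overlaps(m.key_range, key_range) and overlaps(m.time_range, time_range)):
            continue  # S301: the chunk lies outside the query rectangle
        if equals is not None and not m.bloom.might_contain(equals):
            continue  # S302: the chunk definitely lacks the requested value
        clip_k = (max(m.key_range[0], key_range[0]), min(m.key_range[1], key_range[1]))
        clip_t = (max(m.time_range[0], time_range[0]), min(m.time_range[1], time_range[1]))
        subqueries.append((m.path, clip_k, clip_t))  # S303: keep as a subquery
    return subqueries


def make_meta(path: str, key_range: Range, time_range: Range, values: List[str]) -> ChunkMeta:
    bloom = BloomFilter()
    for v in values:
        bloom.add(v)
    return ChunkMeta(path, key_range, time_range, bloom)


metas = [
    make_meta("chunk-a", (0, 99), (0, 59), ["car-7", "car-8"]),
    make_meta("chunk-b", (0, 99), (60, 119), []),           # no values recorded
    make_meta("chunk-c", (100, 199), (0, 59), ["car-7"]),   # outside the key range
]
filtered = decompose(metas, key_range=(10, 50), time_range=(30, 90), equals="car-7")
unfiltered = decompose(metas, key_range=(10, 50), time_range=(30, 90))
print(filtered, unfiltered)
```

A Bloom filter can only rule chunks out, never confirm a match, which is why the text speaks of subqueries that "may" contain the target tuples.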
S4) distribute the subqueries to independent downstream query processing units, which access the distributed file storage system CephFS; this comprises the following steps:
S401) the subquery servers (Subquery Server) read in parallel the data blocks in CephFS that correspond to their subqueries. Each server first reads the template part of the index structure in the data block, obtains the relative offset of each leaf node among all leaf nodes and its offset after packing and compression, and computes the offsets of the series of leaf nodes that may cover the target key range;
S402) based on these offsets, the server reads the leaf node part of the index structure from the data block file, decompresses the packed leaf node bytes with the Snappy algorithm, deserializes them into leaf nodes, and filters them by time range and equality conditions;
S403) the server aggregates the filtered data tuples, serializes the result, and sends it to the query dispatcher.
S5) receive the returned subquery results, merge them, and return the result to the user.
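The decompress, filter, aggregate, and merge path of steps S402, S403, and S5 might look like the Python sketch below. Several substitutions are assumptions of the example: `zlib` stands in for Snappy (which is not in the Python standard library), JSON stands in for the unspecified serialization format, and the aggregate is a simple count and sum over a hypothetical `value` field.

```python
import json
import zlib
from typing import Callable, Dict, List, Optional, Tuple

Range = Tuple[int, int]


def pack_rows(rows: List[dict]) -> bytes:
    """Serialize and compress a group of leaf rows (zlib used in place of Snappy)."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def run_subquery(blob: bytes, key_range: Range, time_range: Range,
                 predicate: Callable[[dict], bool]) -> bytes:
    """Subquery server side: decompress, deserialize, filter, pre-aggregate."""
    rows = json.loads(zlib.decompress(blob).decode("utf-8"))
    kept = [r for r in rows
            if key_range[0] <= r["key"] <= key_range[1]
            and time_range[0] <= r["time"] <= time_range[1]
            and predicate(r)]
    partial = {"count": len(kept), "sum": sum(r["value"] for r in kept)}
    return json.dumps(partial).encode("utf-8")  # serialized for the dispatcher


def merge_partials(messages: List[bytes]) -> Dict[str, Optional[float]]:
    """Dispatcher side: merge partial aggregates into the final answer."""
    partials = [json.loads(m.decode("utf-8")) for m in messages]
    count = sum(p["count"] for p in partials)
    total = sum(p["sum"] for p in partials)
    return {"count": count, "sum": total, "avg": total / count if count else None}


blob1 = pack_rows([
    {"key": 1, "time": 10, "value": 2.0},
    {"key": 2, "time": 20, "value": 4.0},
    {"key": 9, "time": 15, "value": 5.0},   # outside the key range
])
blob2 = pack_rows([{"key": 3, "time": 25, "value": 6.0}])

messages = [run_subquery(b, (1, 5), (0, 30), lambda r: r["value"] > 1.0)
            for b in (blob1, blob2)]
result = merge_partials(messages)
print(result)
```

Sending a count and a sum, rather than a per-server average, keeps the partial results mergeable: the dispatcher can compute the overall average without seeing any raw tuples.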
Fig. 2 shows the internal structure of a chunk file as stream data are written to the distributed file system. A chunk contains two parts: the B+Tree template part and the leaf nodes. In the figure, "template" denotes the B+Tree template part, "leaf node" denotes the leaf node part, and "compress chunk" denotes a data block of leaf nodes after packing and compression.
The B+Tree template part contains the root node and the internal nodes of the B+Tree. Each node records key values, child nodes, and so on. The lowest level of internal nodes also records the relative offset of each group of leaf nodes among all leaf nodes, and the offset within the chunk of each group of leaf nodes after packing and compression.
A leaf node contains a key array and a data array, and all leaf nodes are stored contiguously in left-to-right order. When the file is stored, the template part is written to the chunk as a whole, while the leaf nodes are written to the chunk in packed and compressed form. The number of leaf nodes per group, N, is set to 20, which raises the compression ratio and reduces storage space.
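The grouping of leaf nodes into compressed packs with recorded offsets can be sketched as below. The details are assumptions made for illustration: `pack_leaves` and `read_leaf` are hypothetical helpers, `zlib` again stands in for Snappy, JSON stands in for the real leaf encoding, and the offset index is a flat list rather than part of a B+Tree template.

```python
import json
import zlib
from typing import List, Tuple

GROUP_SIZE = 20  # N, the number of leaf nodes packed and compressed together


def pack_leaves(leaves: List[list]) -> Tuple[bytes, List[Tuple[int, int, int]]]:
    """Compress leaves in groups; return the body and one index entry per group.

    Each index entry is (first leaf number, byte offset, byte length).
    """
    body = bytearray()
    index = []
    for start in range(0, len(leaves), GROUP_SIZE):
        blob = zlib.compress(json.dumps(leaves[start:start + GROUP_SIZE]).encode("utf-8"))
        index.append((start, len(body), len(blob)))
        body += blob
    return bytes(body), index


def read_leaf(body: bytes, index: List[Tuple[int, int, int]], leaf_no: int) -> list:
    """Decompress only the group that holds the requested leaf."""
    start, offset, length = index[leaf_no // GROUP_SIZE]
    group = json.loads(zlib.decompress(body[offset:offset + length]).decode("utf-8"))
    return group[leaf_no - start]


leaves = [[i, i + 1] for i in range(45)]   # 45 toy leaves -> groups of 20, 20, 5
body, index = pack_leaves(leaves)
print(len(index), read_leaf(body, index, 41))
```

Grouping trades read amplification for compression ratio: a lookup decompresses up to N leaves to reach one, but compressing N leaves together usually shrinks better than compressing each leaf alone.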
Fig. 3 shows the data layout of the leaf node part of a chunk in the stream data storage structure. The layout consists of two parts: a key array and a data array. In the figure, "index array" denotes the key array and "data array" denotes the data array. The key array stores the key values in order, and each entry includes an offset pointing into the data array. A search first finds the keys in the qualifying range, together with their offsets, in the key array, and then fetches the corresponding data tuples from the data array.
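The key array plus data array layout lends itself to binary search. The following Python sketch shows one possible encoding; `build_leaf` and `range_lookup` are hypothetical names, payloads are raw bytes, and the extra sentinel offset at the end is a convenience of this example rather than something the patent specifies.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def build_leaf(records: List[Tuple[int, bytes]]) -> Tuple[List[int], List[int], bytes]:
    """Lay out records as a sorted key array, an offset array, and one data array."""
    keys: List[int] = []
    offsets: List[int] = []
    data = bytearray()
    for key, payload in sorted(records, key=lambda r: r[0]):
        keys.append(key)
        offsets.append(len(data))   # where this payload starts in the data array
        data += payload
    offsets.append(len(data))       # sentinel: end of the last payload
    return keys, offsets, bytes(data)


def range_lookup(keys: List[int], offsets: List[int], data: bytes,
                 lo: int, hi: int) -> List[Tuple[int, bytes]]:
    """Return (key, payload) for every key in the inclusive range [lo, hi]."""
    first = bisect_left(keys, lo)
    last = bisect_right(keys, hi)
    return [(keys[i], data[offsets[i]:offsets[i + 1]]) for i in range(first, last)]


keys, offsets, data = build_leaf([(30, b"ccc"), (10, b"a"), (20, b"bb")])
found = range_lookup(keys, offsets, data, 15, 30)
print(found)
```

Only the small key array is searched; the data array is touched just for the matching slice, so variable-length payloads do not slow down the lookup.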
As shown in Fig. 4, the query decomposition scheduling algorithm can be represented as a graph. In the figure, Pending Set denotes all subqueries that have not yet been assigned; $S(q_i)$ denotes the best-suited Subquery Servers for each subquery; a second symbol, which is missing from the source text, denotes the priority array of each subquery; and the preferred server queue PreferedServer Arrays denotes each subquery server's (Subquery Server) priority queue over all unprocessed subqueries. While Pending Set is not empty, each Subquery Server ranks the subqueries in the set by priority according to the data ranges it holds locally in the file system and in its cache. After ranking, the Subquery Servers are visited in ID order and each is assigned unprocessed subqueries from its preferred server queue PreferedServer Arrays, until all subqueries in Pending Set have been processed.
The above embodiments and description only illustrate the principle and preferred embodiments of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the claimed scope of the invention.

Claims (8)

1. A Storm-based distributed stream data storage and query method, characterized in that: as distributed stream data are received, B+Tree indexes are built in real time over several disjoint ranges and are stored in a distributed file system once a threshold is reached; at query time, the query is decomposed, the subqueries for the different ranges are processed in parallel with balanced load, and the results are merged and returned, achieving real-time storage with high-throughput stream data insertion and querying; the method comprises the following steps:
S1) receive source data and distribute them to downstream units, which build the index structure;
S2) compress the index structure into data blocks and write them to the distributed file storage system CephFS;
S3) decompose the query into several independent subqueries based on the query conditions and the data block information;
S4) distribute the subqueries to independent downstream query processing units, which access the distributed file storage system CephFS;
S5) receive the returned subquery results, merge them, and return the result to the user.
2. The Storm-based distributed stream data storage and query method according to claim 1, characterized in that: in step S1), each source datum received by the stream data storage system is a data tuple, defined as $d = \{d_k, d_t, d_r\}$, where $d_k$ is the key of the tuple, $d_t$ is its time attribute, and $d_r$ holds its other attribute values; K and T define a two-dimensional space $D = (K, T)$ over the key domain and the time domain; the key range is fixed and the time range grows continuously; a key interval is written $K = (k^-, k^+)$ and a time interval $T = (t^-, t^+)$; and the two intervals determine a unique rectangle $r\langle K, T\rangle = \{(k, t) \in R \mid k \in K,\ t \in T\}$.
3. The Storm-based distributed stream data storage and query method according to claim 2, characterized in that: the data tuples that fall within the rectangle $r\langle K, T\rangle = \{(k, t) \in R \mid k \in K,\ t \in T\}$ are written into the unique template B+Tree corresponding to that rectangle, with the key serving as the index; when a template B+Tree in memory reaches the threshold size chunkSize, it is stored in the distributed file system in the form of a chunk; a chunk consists of a key array and a data array; and the key array stores the key values in order, each with an offset pointing into the data array.
4. The Storm-based distributed stream data storage and query method according to claim 3, characterized in that: based on the two-dimensional space $D = (K, T)$, a query condition of the stream data storage system can be defined as a triple $q = \{K_q, T_q, f_q\}$, where $K_q$ and $T_q$ are the selection ranges on the key domain and the time domain, so that the query range is cut out as a rectangle $r\langle K, T\rangle = \{(k, t) \in R \mid k \in K_q,\ t \in T_q\}$, and $f_q: t \to \{\text{true}, \text{false}\}$ is a user-defined filter function that decides whether a tuple satisfies the user's selection.
5. The Storm-based distributed stream data storage and query method according to claim 4, characterized in that: because different subquery servers (Subquery Server) store different file blocks on their nodes and cache different template B+Tree leaf nodes, a query decomposition scheduling algorithm is implemented, which computes, for each subquery server, a priority queue over the unprocessed subqueries and assigns queries from these queues until the set of unprocessed subqueries is empty, and writes recently queried leaf node data to a cache, so that query assignment achieves cache locality, data block locality, and load balancing; the algorithm proceeds as follows:
Shuffle $S(q_i)$ and the array of the remaining subquery servers (its symbol is missing from the source text) separately, then concatenate the two, with $S(q_i)$ first, into a new array (symbol likewise missing), in which a smaller index means a higher priority. For each subquery server in the new array, add $q_i$, together with the priority given by the server's position, to that server's subquery priority queue. After all $q_i$ have been processed in this way, take from the priority queue of each subquery server in turn the highest-priority $q_i$ that has not yet been processed and assign it, until all $q_i$ have been assigned. Here $S(q_i)$ denotes the array of subquery servers that hold data in the range of $q_i$, the second array holds the remaining subquery servers, and $q_i \in q$ denotes a subquery obtained by decomposing one query.
6. The Storm-based distributed stream data storage and query method according to claim 1, characterized in that: in step S2), the index structure is a tree index; once the size of the tree index exceeds a specified threshold, the data tuples in the leaf nodes are compressed with the Snappy algorithm and written as data blocks to the distributed file storage system CephFS for permanent storage, and the metadata of each data block, such as its tuple key range and time range, are recorded in the metadata manager (metadata keeper).
7. The Storm-based distributed stream data storage and query method according to claim 1, characterized in that: in step S3), decomposing the query into several independent subqueries based on the query conditions and the data block information comprises the following steps:
S301) the query dispatcher compares the key and time ranges in the user's query conditions with the data block metadata read from the metadata manager (metadata keeper), and divides the query region into a series of two-dimensional index regions;
S302) based on the equality conditions provided by the user, a Bloom filter (bloomFilter) is used to filter out subquery regions that definitely do not contain the target data tuples;
S303) only the subqueries that may contain target data tuples are distributed to the independent downstream subquery servers (Subquery Server).
8. The Storm-based distributed stream data storage and query method according to claim 1, characterized in that: in step S4), distributing the subqueries to independent downstream query processing units that access the distributed file storage system CephFS comprises the following steps:
S401) the subquery servers (Subquery Server) read in parallel the data blocks in CephFS that correspond to their subqueries; each server first reads the template part of the index structure in the data block, obtains the relative offset of each leaf node among all leaf nodes and its offset after packing and compression, and computes the offsets of the series of leaf nodes that may cover the target key range;
S402) based on these offsets, the server reads the leaf node part of the index structure from the data block file, decompresses the packed leaf node bytes with the Snappy algorithm, deserializes them into leaf nodes, and filters them by time range and equality conditions;
S403) the server aggregates the filtered data tuples, serializes the result, and sends it to the query dispatcher.
CN201910026601.2A 2019-01-11 2019-01-11 Storm-based distributed stream data storage and query method Active CN109726225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910026601.2A CN109726225B (en) 2019-01-11 2019-01-11 Storm-based distributed stream data storage and query method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910026601.2A CN109726225B (en) 2019-01-11 2019-01-11 Storm-based distributed stream data storage and query method

Publications (2)

Publication Number Publication Date
CN109726225A true CN109726225A (en) 2019-05-07
CN109726225B CN109726225B (en) 2023-08-01

Family

ID=66299136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910026601.2A Active CN109726225B (en) 2019-01-11 2019-01-11 Storm-based distributed stream data storage and query method

Country Status (1)

Country Link
CN (1) CN109726225B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140172867A1 (en) * 2012-12-17 2014-06-19 General Electric Company Method for storage, querying, and analysis of time series data
CN103678520A (en) * 2013-11-29 2014-03-26 中国科学院计算技术研究所 Multi-dimensional interval query method and system based on cloud computing
CN105589951A (en) * 2015-12-18 2016-05-18 中国科学院计算机网络信息中心 Distributed type storage method and parallel query method for mass remote-sensing image metadata
CN107357659A (en) * 2017-07-04 2017-11-17 东北大学 Towards the group technology and querying method of Storm successive ranges inquiry GSLB

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
朱东升等: "基于Hadoop平台的地铁NCC数据中心方案研究", 《计算机测量与控制》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020248150A1 (en) * 2019-06-12 2020-12-17 Alibaba Group Holding Limited Method and system for answering multi-dimensional analytical queries under local differential privacy
CN110515990A (en) * 2019-07-23 2019-11-29 华信永道(北京)科技股份有限公司 Data query methods of exhibiting and inquiry display systems
CN111241099A (en) * 2020-01-09 2020-06-05 佛山科学技术学院 Industrial big data storage method and device
CN111310230A (en) * 2020-02-10 2020-06-19 腾讯云计算(北京)有限责任公司 Spatial data processing method, device, equipment and medium
CN111310230B (en) * 2020-02-10 2023-04-14 腾讯云计算(北京)有限责任公司 Spatial data processing method, device, equipment and medium
CN115563103A (en) * 2022-09-15 2023-01-03 河南星环众志信息科技有限公司 Multi-dimensional aggregation method, system, electronic device and storage medium
CN115563103B (en) * 2022-09-15 2023-12-08 河南星环众志信息科技有限公司 Multi-dimensional aggregation method, system, electronic equipment and storage medium
CN116244313A (en) * 2023-05-08 2023-06-09 北京四维纵横数据技术有限公司 JSON data storage and access method, device, computer equipment and medium
CN117076466A (en) * 2023-10-18 2023-11-17 河北因朵科技有限公司 Rapid data indexing method for large archive database
CN117076466B (en) * 2023-10-18 2023-12-29 河北因朵科技有限公司 Rapid data indexing method for large archive database
CN117689451A (en) * 2024-01-31 2024-03-12 浙江大学 Flink-based stream vector search method, device and system
CN117689451B (en) * 2024-01-31 2024-04-26 浙江大学 Flink-based stream vector search method, device and system

Also Published As

Publication number Publication date
CN109726225B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
CN109726225A (en) A kind of storage of distributed stream data and querying method based on Storm
US6438562B1 (en) Parallel index maintenance
CN106528773B (en) Map computing system and method based on Spark platform supporting spatial data management
CN110162528A (en) Magnanimity big data search method and system
CN104850572A (en) HBase non-primary key index building and inquiring method and system
CN106294352B (en) A kind of document handling method, device and file system
Zhang et al. Trajspark: A scalable and efficient in-memory management system for big trajectory data
CN102402617A (en) Easily compressed database index storage system using fragments and sparse bitmap, and corresponding construction, scheduling and query processing methods
CN110287391A (en) Multi-level trajectory data storage method, storage medium and terminal based on Hadoop
CN103793493B (en) A kind of method and system for handling car-mounted terminal mass data
CN108804602A (en) A kind of distributed spatial data storage computational methods based on SPARK
CN106528787A (en) Mass data multi-dimensional analysis-based query method and device
CN108920552A (en) A kind of distributed index method towards multi-source high amount of traffic
Lee et al. Efficient processing of multiple continuous skyline queries over a data stream
CN107193898A (en) The inquiry sharing method and system of log data stream based on stepped multiplexing
Du et al. Spatio-temporal data index model of moving objects on fixed networks using hbase
US20220253419A1 (en) Multi-record index structure for key-value stores
Zhang et al. Aggregate keyword nearest neighbor queries on road networks
CN107704475A (en) Multilayer distributed unstructured data storage method, querying method and device
CN110059149A (en) Electronic map spatial key Querying Distributed directory system and method
CN109726219A (en) The method and terminal device of data query
Wang et al. Waterwheel: Realtime indexing and temporal range query processing over massive data streams
CN108733781B (en) Cluster temporal data indexing method based on memory calculation
Jiang et al. MOIST: a scalable and parallel moving object indexer with school tracking
CN103020300B (en) Method and device for information retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant