CN112650756B - Time projection indexing method and system based on time sequence data - Google Patents

Info

Publication number
CN112650756B
Authority
CN
China
Prior art keywords
time
data
index
projection
statistical
Prior art date
Legal status
Active
Application number
CN202011590816.6A
Other languages
Chinese (zh)
Other versions
CN112650756A (en)
Inventor
王勇
Current Assignee
Kelai Network Technology Co ltd
Original Assignee
Kelai Network Technology Co ltd
Priority date
2020-12-29
Filing date
2020-12-29
Publication date
2023-05-02
Application filed by Kelai Network Technology Co ltd
Priority to CN202011590816.6A
Publication of CN112650756A
Application granted
Publication of CN112650756B
Active
Anticipated expiration

Classifications

    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2462 Approximate or statistical queries
    • G06F16/2477 Temporal data queries
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a time projection indexing method and system for time series data. The indexing method comprises: capturing a group of data packets, aggregating them by four-tuple to generate statistical data, and writing the statistical data to disk once per period; constructing a time projection index from the statistical data and writing the index to disk once per period; and, when the statistical data is searched, first reading the time projection index for the search time range, then deriving the effective time range from the index and the search time, then reading the statistical data only within the effective time range, and finally filtering the read statistical data to obtain the target data. The method and system skip time points at which no matching data exists, which reduces the amount of irrelevant data read and raises the proportion of read data that is actually useful.

Description

Time projection indexing method and system based on time sequence data
Technical Field
The invention belongs to the technical field of data storage and data retrieval, and particularly relates to an indexing method and an indexing system for time projection based on time sequence data.
Background
In network traffic statistics, many statistics tables are generated, and queries against these tables fall into two types: full-data queries and data retrieval. A full-data query returns all data in a statistics table at a given time point, whereas data retrieval returns only the data in the table at that time point that satisfies a given condition.
For example, a user may need to retrieve the records of a particular IP address from the IP session table over a given time range.
The current general technical scheme is as follows:
The time points are traversed first to find those within the range to be searched; the data for those time points is then read from disk, and the desired data is filtered out according to the search condition.
The critical drawback of this approach is that retrieval takes too long and is inefficient. The main reason is that most of the data read is not the data being retrieved, so the effective utilization of the read data is very low.
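The prior-art retrieval described above can be sketched as follows to make its cost explicit. This is a minimal illustration under stated assumptions: each second of statistics is modeled as an in-memory list of row dicts keyed by second (standing in for one table on disk), and the function name `full_scan` and field names `ip1`/`ip2` are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def full_scan(tables: Dict[int, List[Row]], start: int, end: int, ip: str) -> Tuple[List[Row], int]:
    """Prior-art retrieval: read every per-second table in [start, end], then filter.

    `tables` maps a second to that second's rows; reading an entry stands in for a
    disk read. Returns the matching rows and the number of tables read.
    """
    matches: List[Row] = []
    tables_read = 0
    for second in range(start, end + 1):
        rows = tables.get(second, [])  # every second in range is read, relevant or not
        tables_read += 1
        matches.extend(r for r in rows if ip in (r["ip1"], r["ip2"]))
    return matches, tables_read
```

If the address appears in only one second of an hour, this still reads 3600 tables to return one row, which is the low utilization described above.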
Disclosure of Invention
The invention aims to overcome these shortcomings of the prior art by providing a time projection indexing method and system based on time sequence data.
The aim of the invention is achieved by the following technical scheme:
In one aspect, the invention discloses a time projection indexing method based on time sequence data, comprising: capturing a group of data packets, aggregating them by four-tuple to generate statistical data, and writing the statistical data to disk once per period; constructing a time projection index from the statistical data and writing the index to disk once per period; and, when the statistical data is searched, first reading the time projection index for the search time range, then obtaining the effective time range from the index and the search time, then reading the statistical data within the effective time range, and finally filtering the read statistical data to obtain the target data.
In another aspect, the invention discloses a time projection indexing system based on time sequence data, comprising an analysis server that includes an acquisition port, an analysis module, a query module, a data packet acquisition module and a storage module. The data packet acquisition module is configured to capture a data source through the acquisition port and generate statistical data; the analysis module is configured to construct a time projection index from the statistical data; the storage module is configured to store the index data of each statistical period once that period's index has been generated; and the query module is configured to read the time projection index, filter out the effective time according to the index, and read the target data within that effective time.
According to a preferred embodiment, capturing the data source specifically includes identifying the egress switch of the network and mirroring its traffic to the analysis server, which captures the data source through the collection port.
According to a preferred embodiment, generating the statistical data comprises aggregating the captured data according to a preset rule to obtain statistical results.
According to a preferred embodiment, the preset rule includes, but is not limited to, a four-tuple rule (ip1+ip2+port1+port2) or a two-tuple rule (ip1+ip2).
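As an illustration of aggregation under a preset rule, the sketch below groups packets by either the four-tuple (ip1, ip2, port1, port2) or the two-tuple (ip1, ip2) and accumulates per-key counters. The `Packet` fields and the choice of packet and byte counts as the statistics are assumptions; the source does not say which statistics are kept.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class Packet:
    ip1: str
    ip2: str
    port1: int
    port2: int
    length: int  # packet size in bytes (assumed statistic)


def four_tuple(p: Packet) -> Tuple:
    """Four-tuple rule: ip1 + ip2 + port1 + port2."""
    return (p.ip1, p.ip2, p.port1, p.port2)


def two_tuple(p: Packet) -> Tuple:
    """Two-tuple rule: ip1 + ip2."""
    return (p.ip1, p.ip2)


def aggregate(packets: Iterable[Packet],
              key: Callable[[Packet], Tuple] = four_tuple) -> Dict[Tuple, Dict[str, int]]:
    """Group packets by the chosen rule and accumulate packet and byte counts."""
    stats: Dict[Tuple, Dict[str, int]] = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        entry = stats[key(p)]
        entry["packets"] += 1
        entry["bytes"] += p.length
    return dict(stats)
```

Passing the rule as a function keeps the aggregation loop independent of which tuple is used, mirroring the "not limited to" wording of the preset rule.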
According to a preferred embodiment, constructing the time projection index comprises: selecting a time period as the statistical period of the time projection, selecting the time scale of the time projection index, and, at that time scale, setting the bits corresponding to the statistical data across the whole statistical period to form the projection index for that period.
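The bit-setting step can be written out as a worked formula. The symbols are introduced here for illustration only: $P$ is the length of the statistical period, $t_0$ its start, $s$ the time scale, $t$ the time at which a key is observed, $N$ the number of bits per key, and $B$ the number of bytes; numbering bits from 0 within each byte is likewise an assumption.

$$
\begin{aligned}
N &= \left\lceil P / s \right\rceil, \qquad B = \left\lceil N / 8 \right\rceil,\\
i(t) &= \left\lfloor (t - t_0) / s \right\rfloor, \qquad t_0 \le t < t_0 + P,\\
\mathrm{byte}(t) &= \left\lfloor i(t) / 8 \right\rfloor, \qquad \mathrm{bit}(t) = i(t) \bmod 8.
\end{aligned}
$$

With $P = 3600\,\mathrm{s}$ and $s = 1\,\mathrm{s}$, $N = 3600$ bits and $B = 450$ bytes per key, which matches the 450-byte array per IP address in Example 1.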
According to a preferred embodiment, reading the target data comprises: intersecting the time range to be searched, as specified by the search condition, with the projection index for that time range to obtain an effective time range; reading data from disk within the effective time; and filtering the read data by the search condition to obtain the data to be retrieved.
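One way to carry out the intersection step, assuming the index for each key is a bitmap with one bit per time slot and the search range has already been converted to slot numbers within the period, is to AND the bitmap with a mask covering the searched slots. The function names and the least-significant-bit-first order are assumptions for illustration.

```python
from typing import List


def range_mask(first: int, last: int) -> int:
    """Integer whose bits first..last (inclusive) are set; 0 if the range is empty."""
    if last < first:
        return 0
    return ((1 << (last - first + 1)) - 1) << first


def effective_slots(bitmap: bytes, first: int, last: int) -> List[int]:
    """Intersect one key's projection index with the searched slot range.

    Bit i of the index lives in byte i // 8 at position i % 8 (LSB first), so
    int.from_bytes(..., "little") puts slot i at bit i of the resulting integer.
    Returns the slot numbers (relative to the period start) that are set.
    """
    first = max(first, 0)
    last = min(last, len(bitmap) * 8 - 1)
    hits = int.from_bytes(bitmap, "little") & range_mask(first, last)
    slots: List[int] = []
    while hits:
        lowest = hits & -hits          # isolate the lowest set bit
        slots.append(lowest.bit_length() - 1)
        hits ^= lowest                 # clear it and continue
    return slots
```

Slots whose bits are 0 never reach the disk-read step, which is where the savings over a full scan come from.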
According to a preferred embodiment, the storage module is not limited to magnetic disks.
The subject matter described above and its further alternatives may be freely combined into many alternative schemes, all of which may be used and are claimed herein; non-conflicting options may be combined with one another and with other options. Such combinations will be apparent to those skilled in the art from this disclosure and are not listed exhaustively here.
The beneficial effects of the invention are as follows. In the disclosed time projection indexing method and system, the statistical period and the time scale used to construct the time projection index are configurable, and retrieval is accelerated by consulting the time projection index. As a result, a larger share of the data read is useful than in the prior art: the prior art must read the entire search time range, whereas the invention reads only the time points at which data matching the search condition exists.
Drawings
FIG. 1 is a schematic diagram of an indexing system based on time projection of time series data according to the present invention;
FIG. 2 is a schematic diagram of the target data reading process performed by the time projection based indexing system of the present invention;
FIG. 3 is a flowchart of the indexing method based on time projection of time series data.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the invention with reference to specific examples. The invention may also be practiced or applied through other, different embodiments, and the details in this description may be modified or varied in various ways without departing from the spirit and scope of the invention. The following embodiments, and the features within them, may be combined with one another where they do not conflict.
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in these embodiments are described clearly and completely below. The described embodiments are only some, not all, of the embodiments of the present invention.
Thus, the following detailed description of the embodiments of the invention is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1:
Referring to FIG. 1 and FIG. 2, the invention discloses an indexing system based on time projection of time series data.
Preferably, the index system comprises an analysis server, wherein the analysis server comprises a collection port, an analysis module, a query module, a data packet collection module and a storage module.
Preferably, the data packet acquisition module is configured to capture the data source through the acquisition port and generate statistical data.
Further, capturing the data source specifically includes identifying the egress switch of the network and mirroring its traffic to the analysis server, which captures the traffic through its collection port. The statistical data is generated by aggregating the captured data by four-tuple to obtain the statistical results.
Preferably, the analysis module is configured to construct a time projection index based on the statistical data.
Further, constructing the time projection index includes: selecting a time period as the statistical period of the time projection, selecting the time scale of the time projection index, and, at that time scale, setting the bits corresponding to the statistical data across the whole statistical period to form the projection index for that period.
For example, the scheme for constructing the time projection index may be:
First, a time period is selected as the statistical period of the time projection index; 1 hour is used here. Next, the time scale of the time projection index is selected; 1 second is used here.
Take the IP session table produced during network statistics as an example. An IP session table is generated every second. The IP addresses present in each 1-second table are extracted, and each IP address maintains a 450-byte array (3600 bits, one per second of the hour). The current time is converted to the position of that second within the hour, and the bit at that position is set to 1, indicating that the IP address is present in that second of the hour. All IP addresses in the 1-second table are traversed, and the bit at the corresponding position is set for each.
The process is repeated for each subsequent second until all 3600 seconds of the hour have been processed, which yields the time projection index for that hour. The index is then written to disk, and the same procedure is repeated for every following hour.
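The per-second procedure above can be sketched as follows. This assumes seconds are integer timestamps, slots are numbered from 0 (the layout table counts seconds from 1), and bits are stored least-significant first within each byte; the function name is hypothetical. `period` and `scale` are parameters because the description treats both as configurable.

```python
from collections import defaultdict
from typing import Dict, Iterable


def build_projection_index(period_start: int,
                           ips_per_second: Dict[int, Iterable[str]],
                           period: int = 3600,
                           scale: int = 1) -> Dict[str, bytearray]:
    """Build the time projection index of one statistical period.

    ips_per_second maps an absolute second to the IP addresses found in that
    second's IP session table. Each IP gets ceil(period / scale) bits, which is
    450 bytes for a 1-hour period at a 1-second scale.
    """
    n_slots = -(-period // scale)            # ceil(period / scale)
    n_bytes = (n_slots + 7) // 8
    index: Dict[str, bytearray] = defaultdict(lambda: bytearray(n_bytes))
    for second, ips in ips_per_second.items():
        slot = (second - period_start) // scale  # position within the period, 0-based
        if not 0 <= slot < n_slots:
            continue                             # second belongs to another period
        for ip in ips:
            index[ip][slot // 8] |= 1 << (slot % 8)  # 1 = IP present in this slot
    return dict(index)
```

Setting a bit is idempotent, so an address that appears in several sessions within the same second needs no deduplication.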
After a statistical period is completed, the time projection index is laid out in memory as follows (1 indicates present, 0 indicates absent):
Layout in memory of the time projection index for a 1-hour statistical period
        Second 1  Second 2  Second 3  ...  Second 3599  Second 3600
IP1     0         1         1         ...  0            0
IP2     1         0         0         ...  1            1
...
IPN     1         1         0         ...  0            0
Preferably, the storage module is configured to store the index data of each statistical period once that period's index has been generated.
Further, the storage module is not limited to magnetic disks.
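The source states only that each period's index is written to disk, not the file format. As one hypothetical layout, assuming IPv4 keys and little-endian integers, the sketch below writes a fixed header (period start, period, scale, entry count) followed by each 4-byte address and its bitmap. The function names and layout are illustrative, not taken from the patent.

```python
import ipaddress
import struct
from typing import Dict, Tuple

HEADER = struct.Struct("<qIII")  # period_start, period, scale, number of entries


def encode_index(period_start: int, period: int, scale: int, index: Dict[str, bytes]) -> bytes:
    """Serialize one period's projection index into a hypothetical binary layout."""
    n_bytes = (-(-period // scale) + 7) // 8
    parts = [HEADER.pack(period_start, period, scale, len(index))]
    for ip, bitmap in sorted(index.items()):
        if len(bitmap) != n_bytes:
            raise ValueError(f"bitmap for {ip} has {len(bitmap)} bytes, expected {n_bytes}")
        parts.append(ipaddress.IPv4Address(ip).packed)  # 4-byte address
        parts.append(bytes(bitmap))
    return b"".join(parts)


def decode_index(data: bytes) -> Tuple[int, int, int, Dict[str, bytes]]:
    """Inverse of encode_index."""
    period_start, period, scale, count = HEADER.unpack_from(data, 0)
    n_bytes = (-(-period // scale) + 7) // 8
    offset = HEADER.size
    index: Dict[str, bytes] = {}
    for _ in range(count):
        ip = str(ipaddress.IPv4Address(data[offset:offset + 4]))
        offset += 4
        index[ip] = data[offset:offset + n_bytes]
        offset += n_bytes
    return period_start, period, scale, index


def write_index_file(path: str, period_start: int, period: int, scale: int,
                     index: Dict[str, bytes]) -> None:
    """Write one period's index to disk."""
    with open(path, "wb") as f:
        f.write(encode_index(period_start, period, scale, index))
```

Because every bitmap in a period has the same length, a reader could also locate an entry by offset without parsing the whole file; this is a design option, not a requirement stated in the source.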
Preferably, the query module is configured to read the time projection index, filter out the effective time according to the index, and read the target data within that effective time.
Further, reading the target data includes: intersecting the time range specified by the search condition with the projection index for that range to obtain the effective time range, as shown in FIG. 2; reading data from disk only for the effective time; and filtering the read data by the search condition to obtain the data being retrieved.
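Putting the query steps together for a single period, the sketch below assumes a 1-second time scale, a `read_second` callable standing in for the disk read of one second's statistics, and a `predicate` implementing the search condition; all three are hypothetical. The filter is still needed after the read because a second's table also holds rows that do not match.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def retrieve(bitmap: bytes, period_start: int, start: int, end: int,
             read_second: Callable[[int], List[Row]],
             predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Indexed retrieval for one period with a 1-second time scale.

    1. Intersect [start, end] with the projection index to get the effective seconds.
    2. Read only those seconds' tables (read_second stands in for a disk read).
    3. Filter the rows that were read by the search condition.
    Returns the matching rows and how many per-second tables were read.
    """
    n_slots = len(bitmap) * 8
    first = max(start - period_start, 0)
    last = min(end - period_start, n_slots - 1)
    effective = [i for i in range(first, last + 1) if (bitmap[i // 8] >> (i % 8)) & 1]
    rows: List[Row] = []
    for i in effective:
        rows.extend(r for r in read_second(period_start + i) if predicate(r))
    return rows, len(effective)
```

If the address is present in only one second of the hour, only that second's table is read instead of all 3600.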
As shown in FIG. 3, the invention also discloses an indexing method based on time projection of time series data.
Preferably, the indexing method comprises:
Capturing a group of data packets, aggregating them by four-tuple to generate statistical data, and writing the statistical data to disk once per period.
Constructing a time projection index from the statistical data and writing the index to disk once per period.
When the statistical data is searched, first reading the time projection index for the search time range, then deriving the effective time range from the index and the search time, then reading the statistical data within the effective time range, and finally filtering the read statistical data to obtain the target data.
The basic embodiments described above and the other embodiments of the invention can be freely combined into numerous embodiments, all of which are contemplated and claimed. In the scheme of the invention, each optional example can be combined with any other basic example or optional example. Numerous combinations will be apparent to those skilled in the art.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (7)

1. An indexing method of time projection based on time series data, characterized in that the indexing method comprises:
capturing a group of data packets, aggregating the data packets by four-tuple to generate statistical data, and storing the statistical data to disk once per period;
constructing a time projection index from the statistical data, and storing the time projection index to disk once per period;
the construction of the time projection index includes:
selecting a time period as the statistical period of the time projection, selecting the time scale of the time projection index, and setting, at that time scale, the bits corresponding to the statistical data across the whole statistical period to form the projection index for that period;
when the statistical data is searched, first reading the time projection index for the search time range, then obtaining the effective time range from the index and the search time, then reading the statistical data within the effective time range, and finally filtering the read statistical data to obtain the target data.
2. An indexing system based on time projection of time sequence data, characterized by comprising an analysis server, wherein the analysis server comprises an acquisition port, an analysis module, a query module, a data packet acquisition module and a storage module;
the data packet acquisition module is configured to capture a data source through the acquisition port and generate statistical data; the analysis module is configured to construct a time projection index based on the statistical data; the storage module is configured to store the corresponding index data after the index of each statistical period is generated; the query module is configured to read the time projection index, filter out the effective time according to the index, and read the target data within that effective time;
the construction of the time projection index includes:
selecting a time period as the statistical period of the time projection, selecting the time scale of the time projection index, and setting, at that time scale, the bits corresponding to the statistical data across the whole statistical period to form the projection index for that period.
3. The time projection based indexing system of claim 2, wherein capturing the data source specifically includes identifying an egress switch in the network and mirroring its traffic to the analysis server, which captures the data source through the collection port.
4. The time projection based indexing system of claim 3, wherein the generation of statistics comprises aggregating obtained data sources according to a predetermined rule and obtaining statistics.
5. The time projection based indexing system of claim 4, wherein the predetermined rule includes, but is not limited to, a four-tuple rule (ip1+ip2+port1+port2) or a two-tuple rule (ip1+ip2).
6. The time-projection-based indexing system of claim 4, wherein performing the reading of the target data comprises:
intersecting the time range to be searched, as specified by the search condition, with the projection index for that time range to obtain an effective time range; reading data from disk within the effective time; and filtering the read data by the search condition to obtain the data to be retrieved.
7. The time-projection based indexing system of claim 2, wherein the storage module is not limited to magnetic disks.
CN202011590816.6A 2020-12-29 2020-12-29 Time projection indexing method and system based on time sequence data Active CN112650756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011590816.6A CN112650756B (en) 2020-12-29 2020-12-29 Time projection indexing method and system based on time sequence data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011590816.6A CN112650756B (en) 2020-12-29 2020-12-29 Time projection indexing method and system based on time sequence data

Publications (2)

Publication Number Publication Date
CN112650756A 2021-04-13
CN112650756B 2023-05-02

Family

ID=75363641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011590816.6A Active CN112650756B (en) 2020-12-29 2020-12-29 Time projection indexing method and system based on time sequence data

Country Status (1)

Country Link
CN (1) CN112650756B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573703B (en) * 2024-01-16 2024-04-09 科来网络技术股份有限公司 Universal retrieval method, system, equipment and storage medium for time sequence data

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105117502A (en) * 2015-10-13 2015-12-02 四川中科腾信科技有限公司 Search method based on big data
CN106446028A (en) * 2016-08-31 2017-02-22 成都科来软件有限公司 Novel index system of network conversation package
CN108509592A (en) * 2018-03-30 2018-09-07 贵阳朗玛信息技术股份有限公司 Date storage method, read method based on Redis and device
CN108563711A (en) * 2018-03-28 2018-09-21 山东昭元信息科技有限公司 A kind of time series data storage method based on timing node
CN109902088A (en) * 2019-02-13 2019-06-18 北京航空航天大学 A kind of data index method towards streaming time series data
KR102102313B1 (en) * 2019-11-27 2020-04-20 주식회사 리얼타임테크 System for Managing TimeSeries data in In-Memory Database
CN111913950A (en) * 2019-05-10 2020-11-10 上海顶岩自动化工程有限公司 Event index analysis system for time sequence data storage
CN112115361A (en) * 2020-09-17 2020-12-22 浪潮卓数大数据产业发展有限公司 Data retrieval optimization method and system based on elastic search

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106033324B (en) * 2015-03-09 2020-03-06 杭州海康威视数字技术股份有限公司 Data storage method and device
CN106326464B (en) * 2016-08-31 2019-09-10 成都科来软件有限公司 A kind of network session packet indexing means based on retrieval information projection
US20190079943A1 (en) * 2017-09-11 2019-03-14 Blackfynn Inc. Real time and retrospective query integration
CN110209686A (en) * 2018-02-22 2019-09-06 北京嘀嘀无限科技发展有限公司 Storage, querying method and the device of data
CN109471905B (en) * 2018-11-16 2020-08-25 华东师范大学 Block chain indexing method supporting time range and attribute range compound query
CN110046183A (en) * 2019-04-16 2019-07-23 北京易沃特科技有限公司 A kind of time series data polymerization search method, equipment and medium
CN110134723A (en) * 2019-05-22 2019-08-16 网易(杭州)网络有限公司 A kind of method and database of storing data
CN111552687B (en) * 2020-03-10 2023-08-04 远景智能国际私人投资有限公司 Time sequence data storage method, query method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112650756A (en) 2021-04-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 610041 12th, 13th and 14th floors, unit 1, building 4, No. 966, north section of Tianfu Avenue, Chengdu hi tech Zone, China (Sichuan) pilot Free Trade Zone, Chengdu, Sichuan

Applicant after: Kelai Network Technology Co.,Ltd.

Address before: 41401-41406, 14th floor, unit 1, building 4, No. 966, north section of Tianfu Avenue, Chengdu hi tech Zone, Chengdu Free Trade Zone, Sichuan 610041

Applicant before: Chengdu Kelai Network Technology Co.,Ltd.

GR01 Patent grant