CN112650756B - Time projection indexing method and system based on time sequence data - Google Patents
- Publication number
- CN112650756B (application CN202011590816.6A)
- Authority
- CN
- China
- Prior art keywords
- time
- data
- index
- projection
- statistical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2477—Temporal data queries
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Fuzzy Systems (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an indexing method and an indexing system for time projection based on time sequence data. The indexing method comprises the following steps: grabbing a group of data packets, carrying out aggregation statistics on the data packets according to quadruples, generating statistical data, and storing the statistical data into a disk according to a period; constructing a time projection index according to the statistical data, and storing the time projection index into a disk according to the period; when the statistical data is searched, firstly, a time projection index of a search time range is read, then an effective time range is obtained according to the index and the search time, then the statistical data is read in the effective time range, and finally, the read statistical data is filtered to obtain target data. By the method and the system, invalid time data is filtered, reading of the invalid data is reduced, and the effective utilization rate of the read data is improved.
Description
Technical Field
The invention belongs to the technical field of data storage and data retrieval, and particularly relates to an indexing method and an indexing system for time projection based on time sequence data.
Background
In network statistics engineering, many statistical tables are generated. Queries against these tables fall into two kinds: full data query and data retrieval. The former returns all data of a statistical table at a given time point; the latter searches the statistical table at a given time point for the data that meets a certain condition.
For example, one may need to retrieve a particular IP address from the IP session table within a given time range.
The current general technical scheme is as follows:
first traverse the time range to find the times that need to be searched, then read the data for those times from the disk, and finally filter out the wanted data according to the search condition.
The prior art has a serious disadvantage: retrieving the data takes too long and the efficiency is too low. The main reason is that most of the data read is not the data we want to retrieve, so the effective utilization of the read data is particularly low.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an index method and a system for time projection based on time sequence data.
The aim of the invention is achieved by the following technical scheme:
in one aspect, the invention discloses an indexing method of time projection based on time sequence data, the indexing method comprises the following steps: grabbing a group of data packets, carrying out aggregation statistics on the data packets according to quadruples, generating statistical data, and storing the statistical data into a disk according to a period; constructing a time projection index according to the statistical data, and storing the time projection index into a disk according to the period; when the statistical data is searched, firstly, a time projection index of a search time range is read, then an effective time range is obtained according to the index and the search time, then the statistical data is read in the effective time range, and finally, the read statistical data is filtered to obtain target data.
On the other hand, the invention also discloses an index system of time projection based on time sequence data, which comprises an analysis server, wherein the analysis server comprises an acquisition port, an analysis module, a query module, a data packet acquisition module and a storage module; the data packet acquisition module is configured to capture a data source through the acquisition port and generate statistical data; the analysis module is configured to construct a time projection index based on the statistical data; the storage module is configured to store the corresponding index data after the index of each statistical period is generated; the query module is configured to read the time projection index, filter out the effective time from the index result, and read the target data for that effective time.
According to a preferred embodiment, the process of capturing the data source specifically includes finding an egress switch in the network and mirroring the traffic to an analysis server, which completes capturing the data source through a collection port.
According to a preferred embodiment, the generation of the statistical data comprises aggregation statistics of the obtained data sources according to preset rules and obtaining statistical results.
According to a preferred embodiment, the preset rule includes, but is not limited to, a four-tuple ip1+ip2+port1+port2 rule or a two-tuple ip1+ip2 rule.
According to a preferred embodiment, the construction of the time projection index comprises: selecting a time period as the statistical period of the time projection, selecting a time scale for the time projection index, and setting bits for the statistical data across the whole statistical period at that time scale, thereby forming the projection index for the statistical period.
According to a preferred embodiment, the process of reading the target data comprises: intersecting the time range required by the search condition with the projection index of that time range to obtain an effective time range, then reading data from the disk for the effective time, and then filtering the read data according to the condition to obtain the data to be retrieved.
According to a preferred embodiment, the storage module includes, but is not limited to, a magnetic disk.
The foregoing main scheme of the invention and its further alternatives may be freely combined to form a plurality of schemes, all of which are usable and claimed by the invention; (non-conflicting) alternatives may also be freely combined with one another and with other alternatives. Those skilled in the art will recognize the many possible combinations after reading this disclosure; they are not listed exhaustively here.
The invention has the following beneficial effects: in the time projection indexing method and system based on time sequence data, both the statistical period and the time scale used to construct the time projection index can be changed. Data retrieval is accelerated by the time projection index, so the effective utilization of the read data is higher than in the prior art. While the prior art needs to read all time ranges, the invention only needs to read the time points at which data matching the search condition exists.
Drawings
FIG. 1 is a schematic diagram of an indexing system based on time projection of time series data according to the present invention;
FIG. 2 is a schematic diagram of the target data reading process performed by the time projection based indexing system of the present invention;
FIG. 3 is a flow chart of the indexing method based on time projection of time series data.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may also be practiced or applied in other embodiments that differ in their specific details, and the details of the present description may be modified or varied without departing from the spirit and scope of the present invention. It should be noted that the following embodiments and the features in the embodiments may be combined with each other without conflict.
It should be noted that, for the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions in the embodiments of the present invention are clearly and completely described below, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments.
Thus, the following detailed description of the embodiments of the invention is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1:
Referring to FIG. 1 and FIG. 2, the invention discloses an index system for time projection based on time sequence data.
Preferably, the index system comprises an analysis server, wherein the analysis server comprises a collection port, an analysis module, a query module, a data packet collection module and a storage module.
Preferably, the data packet acquisition module is configured to perform data source grabbing based on the acquisition port and generate statistical data.
Further, the capturing of the data source specifically includes finding the egress switch in the network and mirroring its traffic to the analysis server, which completes the capture through the collection port. The generation of the statistical data comprises performing aggregation statistics on the captured data by four-tuple and obtaining a statistical result.
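The aggregation step can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Packet` record, the per-second bucketing, and the `packets`/`bytes` counters are assumptions made for illustration, since the text only states that packets are aggregated by four-tuple.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical captured-packet record (field names are assumptions)."""
    ts: float    # capture time, seconds since epoch
    ip1: str
    ip2: str
    port1: int
    port2: int
    size: int    # packet length in bytes


FourTuple = Tuple[str, str, int, int]


def aggregate_by_four_tuple(
    packets: Iterable[Packet],
) -> Dict[int, Dict[FourTuple, Dict[str, int]]]:
    """Bucket packets by whole second, then sum counters per four-tuple."""
    tables = defaultdict(lambda: defaultdict(lambda: {"packets": 0, "bytes": 0}))
    for p in packets:
        second = int(p.ts)  # one statistical table per second
        row = tables[second][(p.ip1, p.ip2, p.port1, p.port2)]
        row["packets"] += 1
        row["bytes"] += p.size
    # Convert to plain dicts so lookups of missing keys do not create rows.
    return {second: dict(rows) for second, rows in tables.items()}
```

Switching to the two-tuple rule would only change the key to `(p.ip1, p.ip2)`.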
Preferably, the analysis module is configured to construct a time projection index based on the statistical data.
Further, the construction of the time projection index includes: selecting a time period as the statistical period of the time projection, selecting a time scale for the time projection index, and setting bits for the statistical data across the whole statistical period at that time scale, thereby forming the projection index for the statistical period.
For example, the scheme for constructing the time projection index may be:
A time period is selected as the statistical period of the time projection index; here, 1 hour is chosen. A time scale is selected for the time projection index; here, 1 second is chosen.
In the process of network data statistics, the IP session table is taken as an example. For the IP session table generated every 1 second, the IP addresses present in that table are extracted. Each IP address maintains an array of 450 bytes (3600 bits, one bit per second of the hour). The current time is converted to the position of this second within the hour, and the bit at that position is set to 1, indicating that the IP address is present in that second of the hour. All IP addresses of this second are traversed, and the bit at the corresponding position is set for each.
The process is then repeated for the next second until every second of the hour has been processed, which forms the time projection index of this hour. The index is written to disk. The same procedure is repeated for every subsequent hour.
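The bit-setting procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the input is assumed to be a mapping from each second's timestamp to the IP addresses seen in that second, and the least-significant-bit-first order within a byte is a choice made here, not something the description specifies.

```python
from typing import Dict, Iterable

PERIOD_SECONDS = 3600                # statistical period: 1 hour
BITMAP_BYTES = PERIOD_SECONDS // 8   # 3600 bits fit in 450 bytes


def set_second(bitmap: bytearray, second_in_hour: int) -> None:
    """Mark one second of the hour as 'present' (bit order is an assumption)."""
    bitmap[second_in_hour // 8] |= 1 << (second_in_hour % 8)


def build_hour_index(
    ips_per_second: Dict[int, Iterable[str]], hour_start: int
) -> Dict[str, bytearray]:
    """Build one 450-byte bitmap per IP address for the hour starting at hour_start."""
    index: Dict[str, bytearray] = {}
    for ts, ips in ips_per_second.items():
        offset = ts - hour_start     # position of this second within the hour
        if not 0 <= offset < PERIOD_SECONDS:
            continue                 # belongs to a different statistical period
        for ip in ips:
            bitmap = index.setdefault(ip, bytearray(BITMAP_BYTES))
            set_second(bitmap, offset)
    return index
```

After the hour completes, each `bytearray` can be written out as-is; the on-disk layout is not described in the text and is left open here.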
After a statistical period is completed, the layout of the time projection index in memory (1 indicates present, 0 indicates absent):
Layout of the 1-hour statistical-period time projection index in memory
IP address | Second 1 | Second 2 | Second 3 | … | Second 3599 | Second 3600 |
---|---|---|---|---|---|---|
IP1 | 0 | 1 | 1 | … | 0 | 0 |
IP2 | 1 | 0 | 0 | … | 1 | 1 |
… | … | … | … | … | … | … |
IPN | 1 | 1 | 0 | … | 0 | 0 |
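As a rough check on the layout, the Python sketch below computes the bitmap size for a given statistical period and time scale, and decodes a row back into the slots where the address was present. The helper names are hypothetical, slots are numbered from 0 (so "Second 1" in the table is slot 0), and the bit order within a byte is an assumption.

```python
from typing import List


def bitmap_size_bytes(period_seconds: int, scale_seconds: int) -> int:
    """Bytes needed per key: one bit per time slot, rounded up to whole bytes."""
    slots = -(-period_seconds // scale_seconds)   # ceiling division
    return -(-slots // 8)


def present_slots(bitmap: bytes) -> List[int]:
    """Zero-based slot numbers whose bit is set (least significant bit first)."""
    return [i for i in range(len(bitmap) * 8) if (bitmap[i // 8] >> (i % 8)) & 1]
```

With a 1-hour period and a 1-second scale, `bitmap_size_bytes(3600, 1)` gives the 450 bytes mentioned above; a 1-minute scale over the same hour would need only 8 bytes per address, at the cost of coarser filtering.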
Preferably, the storage module is configured to store the corresponding index data after the index generation of each statistical period.
Further, the storage module is not limited to being constituted by a magnetic disk.
Preferably, the query module is configured to read the time projection index, filter out the effective time from the index result, and read the target data for that effective time.
Further, the reading process of the target data includes: intersecting the time range required by the search condition with the projection index of that time range to obtain an effective time range, as shown in FIG. 2. The data is then read from the disk for the effective time only, and the read data is filtered according to the condition to obtain the data to be retrieved.
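The reading process can be sketched in Python as below. It assumes a per-IP bitmap with one bit per second of the statistical period (least significant bit first), a half-open query range, and a hypothetical `read_table` callback that loads one second's statistical table from disk; none of these names come from the original text.

```python
from typing import Callable, Dict, Iterable, List


def effective_seconds(
    bitmap: bytes, period_start: int, query_start: int, query_end: int
) -> List[int]:
    """Timestamps in [query_start, query_end) whose bit is set in the bitmap."""
    lo = max(query_start - period_start, 0)
    hi = min(query_end - period_start, len(bitmap) * 8)
    return [
        period_start + s
        for s in range(lo, hi)
        if (bitmap[s // 8] >> (s % 8)) & 1
    ]


def search(
    ip: str,
    bitmap: bytes,
    period_start: int,
    query_start: int,
    query_end: int,
    read_table: Callable[[int], Iterable[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Read only the effective seconds, then filter rows by the search condition."""
    results = []
    for ts in effective_seconds(bitmap, period_start, query_start, query_end):
        for row in read_table(ts):   # disk read happens only for effective seconds
            if ip in (row["ip1"], row["ip2"]):
                results.append(row)
    return results
```

Seconds whose bit is 0 are never passed to `read_table`, which is where the reduction in invalid reads comes from.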
As shown in fig. 3, the invention also discloses an indexing method of time projection based on time sequence data.
Preferably, the indexing method comprises:
capturing a group of data packets, performing aggregation statistics on the data packets by four-tuple to generate statistical data, and storing the statistical data to disk period by period;
constructing a time projection index from the statistical data, and storing the time projection index to disk by the same period;
when the statistical data is searched, first reading the time projection index of the search time range, then obtaining an effective time range from the index and the search time, then reading the statistical data within the effective time range, and finally filtering the read statistical data to obtain the target data.
In the time projection indexing method and system based on time sequence data, both the statistical period and the time scale used to construct the time projection index can be changed. Data retrieval is accelerated by the time projection index, so the effective utilization of the read data is higher than in the prior art. While the prior art needs to read all time ranges, the invention only needs to read the time points at which data matching the search condition exists.
The foregoing basic embodiments of the invention, as well as other embodiments of the invention, can be freely combined to form numerous embodiments, all of which are contemplated and claimed. In the scheme of the invention, each selection example can be arbitrarily combined with any other basic example and selection example. Numerous combinations will be apparent to those skilled in the art.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.
Claims (7)
1. An indexing method of time projection based on time series data, characterized in that the indexing method comprises:
grabbing a group of data packets, carrying out aggregation statistics on the data packets according to quadruples, generating statistical data, and storing the statistical data into a disk according to a period;
constructing a time projection index according to the statistical data, and storing the time projection index into a disk according to the period;
the construction of the time projection index includes:
selecting a time period as a statistical period of time projection, selecting a time scale of a time projection index, and completing bit setting of statistical data on the whole statistical period based on the time scale to form the projection index on the statistical period;
when the statistical data is searched, firstly, a time projection index of a search time range is read, then an effective time range is obtained according to the index and the search time, then the statistical data is read in the effective time range, and finally, the read statistical data is filtered to obtain target data.
2. The index system based on time projection of time sequence data is characterized by comprising an analysis server, wherein the analysis server comprises an acquisition port, an analysis module, a query module, a data packet acquisition module and a storage module;
the data packet acquisition module is configured to grab a data source based on the acquisition port and generate statistical data; the analysis module is configured to construct a time projection index based on the statistical data; the storage module is configured to store corresponding index data after the index of each statistical period is generated; the query module is configured to read target data based on the read time projection index and the effective time filtered according to the result of the index;
the construction of the time projection index includes:
selecting a time period as a statistical period of time projection, selecting a time scale of a time projection index, and completing bit setting of statistical data on the whole statistical period based on the time scale to form the projection index on the statistical period.
3. The time projection based indexing system of claim 2, wherein the data source grabbing process specifically includes finding an egress switch in the network and mirroring traffic to an analysis server, which completes the data source grabbing through the collection port.
4. The time projection based indexing system of claim 3, wherein the generation of statistics comprises aggregating obtained data sources according to a predetermined rule and obtaining statistics.
5. The time projection based indexing system of claim 4, wherein the predetermined rule is not limited to a four-tuple ip1+ip2+port1+port2 rule or a two-tuple ip1+ip2 rule.
6. The time-projection-based indexing system of claim 4, wherein performing the reading of the target data comprises:
intersecting the time required by the search condition with the projection index of the time range to obtain an effective time range, then reading data from a disk by using the effective time, and then filtering the read data according to the condition to obtain the data required to be searched.
7. The time-projection based indexing system of claim 2, wherein the storage module is not limited to being comprised of magnetic disks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011590816.6A CN112650756B (en) | 2020-12-29 | 2020-12-29 | Time projection indexing method and system based on time sequence data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112650756A CN112650756A (en) | 2021-04-13 |
CN112650756B true CN112650756B (en) | 2023-05-02 |
Family
ID=75363641
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011590816.6A Active CN112650756B (en) | 2020-12-29 | 2020-12-29 | Time projection indexing method and system based on time sequence data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112650756B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117573703B (en) * | 2024-01-16 | 2024-04-09 | 科来网络技术股份有限公司 | Universal retrieval method, system, equipment and storage medium for time sequence data |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105117502A (en) * | 2015-10-13 | 2015-12-02 | 四川中科腾信科技有限公司 | Search method based on big data |
CN106446028A (en) * | 2016-08-31 | 2017-02-22 | 成都科来软件有限公司 | Novel index system of network conversation package |
CN108509592A (en) * | 2018-03-30 | 2018-09-07 | 贵阳朗玛信息技术股份有限公司 | Date storage method, read method based on Redis and device |
CN108563711A (en) * | 2018-03-28 | 2018-09-21 | 山东昭元信息科技有限公司 | A kind of time series data storage method based on timing node |
CN109902088A (en) * | 2019-02-13 | 2019-06-18 | 北京航空航天大学 | A kind of data index method towards streaming time series data |
KR102102313B1 (en) * | 2019-11-27 | 2020-04-20 | 주식회사 리얼타임테크 | System for Managing TimeSeries data in In-Memory Database |
CN111913950A (en) * | 2019-05-10 | 2020-11-10 | 上海顶岩自动化工程有限公司 | Event index analysis system for time sequence data storage |
CN112115361A (en) * | 2020-09-17 | 2020-12-22 | 浪潮卓数大数据产业发展有限公司 | Data retrieval optimization method and system based on elastic search |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106033324B (en) * | 2015-03-09 | 2020-03-06 | 杭州海康威视数字技术股份有限公司 | Data storage method and device |
CN106326464B (en) * | 2016-08-31 | 2019-09-10 | 成都科来软件有限公司 | A kind of network session packet indexing means based on retrieval information projection |
US20190079943A1 (en) * | 2017-09-11 | 2019-03-14 | Blackfynn Inc. | Real time and retrospective query integration |
CN110209686A (en) * | 2018-02-22 | 2019-09-06 | 北京嘀嘀无限科技发展有限公司 | Storage, querying method and the device of data |
CN109471905B (en) * | 2018-11-16 | 2020-08-25 | 华东师范大学 | Block chain indexing method supporting time range and attribute range compound query |
CN110046183A (en) * | 2019-04-16 | 2019-07-23 | 北京易沃特科技有限公司 | A kind of time series data polymerization search method, equipment and medium |
CN110134723A (en) * | 2019-05-22 | 2019-08-16 | 网易(杭州)网络有限公司 | A kind of method and database of storing data |
CN111552687B (en) * | 2020-03-10 | 2023-08-04 | 远景智能国际私人投资有限公司 | Time sequence data storage method, query method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108376143B (en) | Novel OLAP pre-calculation system and method for generating pre-calculation result | |
CN108446349B (en) | GIS abnormal data detection method | |
CN105677683B (en) | Batch data querying method and device | |
CN102622434B (en) | Data storage method, data searching method and device | |
Wu | FastBit: an efficient indexing technology for accelerating data-intensive science | |
CN106528787B (en) | query method and device based on multidimensional analysis of mass data | |
CN110825733B (en) | Multi-sampling-stream-oriented time series data management method and system | |
CN106709851B (en) | Big data retrieval method and device | |
CN107766445B (en) | Efficient and rapid data retrieval method supporting multi-dimensional retrieval | |
CN102222099A (en) | Methods and devices for storing and searching data | |
CN112650756B (en) | Time projection indexing method and system based on time sequence data | |
CN110597852A (en) | Data processing method, device, terminal and storage medium | |
CN104765754A (en) | Data storage method and device | |
CN111090705A (en) | Multidimensional data processing method, multidimensional data processing device, multidimensional data processing equipment and storage medium | |
Gou et al. | Graph stream sketch: Summarizing graph streams with high speed and accuracy | |
CN109800228B (en) | Method for efficiently and quickly solving hash conflict | |
CN113468080B (en) | Caching method, system and related device for full-flash metadata | |
CN110825744B (en) | Cluster environment-based air quality monitoring big data partition storage method | |
CN104301182B (en) | A kind of querying method and device of the exception information of website visiting at a slow speed | |
CN113268636A (en) | Rapid retrieval method and device based on time sequence data | |
CN113360551B (en) | Method and system for storing and rapidly counting time sequence data in shooting range | |
CN110765128A (en) | Optimized storage method based on large-scale GPS data | |
CN112988846B (en) | Flow real-time statistical method and engine based on absolute time sliding window | |
CN114077581A (en) | Database based on data aggregation storage mode | |
CN113343034A (en) | IP searching method, system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 610041 12th, 13th and 14th floors, unit 1, building 4, No. 966, north section of Tianfu Avenue, Chengdu hi tech Zone, China (Sichuan) pilot Free Trade Zone, Chengdu, Sichuan. Applicant after: Kelai Network Technology Co.,Ltd. Address before: 41401-41406, 14th floor, unit 1, building 4, No. 966, north section of Tianfu Avenue, Chengdu hi tech Zone, Chengdu Free Trade Zone, Sichuan 610041. Applicant before: Chengdu Kelai Network Technology Co.,Ltd. |
GR01 | Patent grant | |