CN105302909B - Network security log system big data search method based on subregion calculations of offset - Google Patents
- Publication number
- CN105302909B (application number CN201510747856.XA)
- Authority
- CN
- China
- Prior art keywords
- data scale
- retrieval
- data
- retrieval data
- scale
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a big data retrieval method for network security log systems based on partition offset calculation. A database table is partitioned according to a time rule to obtain several child partitions. At least one child partition covered by the input time parameter to be retrieved is determined. The retrieval data scale between the start time of the time parameter and the first child partition nearest the start time is determined. When that retrieval data scale is less than the current-page retrieval data scale, the retrieval data scale between the start time of the time parameter and the second child partition nearest the start time is re-determined, and the retrieval data scale up to each next child partition is determined in turn. When the re-determined retrieval data scale is greater than or equal to the current-page retrieval data scale, re-determination stops and the currently re-determined retrieval data scale is displayed. The invention can effectively improve data processing performance.
Description
Technical field
The invention belongs to the technical field of network information security, and in particular relates to a big data retrieval method for network security log systems based on partition offset calculation.
Background art
With the growing popularity of the Internet, network environments keep expanding, and the volume of security log data produced by network security devices grows rapidly with them. Traditional data processing schemes that rely solely on the processing capability of the database itself can no longer fully meet practical needs. The efficiency of log analysis and retrieval directly affects the timeliness of event analysis, risk analysis, and early warning for network security devices in the user environment, as well as user satisfaction with those devices. Effectively improving the retrieval capability for security device log data can therefore, to a certain extent, raise the capability of security analysis.
The performance of data retrieval depends mainly on the CPU performance and memory of the network security device, the database table design, and the retrieval software implementation. For security devices, hardware such as CPU and memory is usually constrained by cost and is therefore relatively limited. Database table design can improve retrieval performance to some extent, but when a single table holds 1 billion records or more, the improvement often falls short of requirements. It is therefore essential to improve performance by optimizing the data retrieval algorithm without affecting device functions.
Summary of the invention
In view of this, the main purpose of the present invention is to provide a big data retrieval method for network security log systems based on partition offset calculation.
To achieve the above purpose, the technical solution of the present invention is realized as follows:
An embodiment of the present invention provides a big data retrieval method for network security log systems based on partition offset calculation. The method is as follows: a database table is partitioned according to a time rule to obtain several child partitions; at least one child partition covered by the input time parameter to be retrieved is determined; the retrieval data scale between the start time of the time parameter and the first child partition nearest the start time is determined; when that retrieval data scale is less than the current-page retrieval data scale, the retrieval data scale between the start time of the time parameter and the second child partition nearest the start time is re-determined, and the retrieval data scale up to each next child partition is determined in turn; once the re-determined retrieval data scale is greater than or equal to the current-page retrieval data scale, re-determination stops and the currently re-determined retrieval data scale is displayed.
In the above solution, the method further includes: when the retrieval data scale is greater than or equal to the current-page retrieval data scale, displaying the retrieval data scale.
In the above solution, the method further includes: when the retrieval data scale between the start time of the time parameter and the last child partition in the time parameter has been re-determined, directly displaying the re-determined retrieval data scale.
In the above solution, the method further includes: determining the retrieval data scale between the end time of the time parameter and the first child partition nearest the end time; when that retrieval data scale is less than the current-page retrieval data scale, re-determining the retrieval data scale between the end time of the time parameter and the second child partition nearest the end time, and determining the retrieval data scale up to each next child partition in turn; once the re-determined retrieval data scale is greater than or equal to the current-page retrieval data scale, stopping re-determination and displaying the currently re-determined retrieval data scale.
In the above solution, the method further includes: caching the determined retrieval data scale between the start time or end time of the time parameter and the child partitions in the time parameter. When a time parameter is next input, the cache is searched directly for a retrieval data scale between the start time or end time corresponding to that input time parameter and a child partition in the input time parameter. If one exists, it is read directly from the cache and compared with the current-page retrieval data scale; depending on the comparison result, either it is displayed, or the retrieval data scale is re-determined and the cache is searched for the re-determined retrieval data scale. Otherwise, the retrieval data scale is re-determined and cached.
Compared with the prior art, the present invention has the following beneficial effects:
The present invention can effectively improve the log retrieval performance of network security devices on large-scale data, greatly reduce the time cost of routine data retrieval on the device, improve the real-time responsiveness of device analysis and the ease of use of the product, and raise customer satisfaction.
Brief description of the drawings
Fig. 1 is a schematic diagram of the offset algorithm of a big data retrieval method for network security log systems based on partition offset calculation, provided by an embodiment of the present invention.
Detailed description of the embodiments
The following describes the present invention in detail with reference to the accompanying drawings and specific embodiments.
An embodiment of the present invention provides a big data retrieval method for network security log systems based on partition offset calculation. The method is as follows: a database table is partitioned according to a time rule to obtain several child partitions; at least one child partition covered by the input time parameter to be retrieved is determined; the retrieval data scale between the start time of the time parameter and the first child partition nearest the start time is determined; when that retrieval data scale is less than the current-page retrieval data scale, the retrieval data scale between the start time of the time parameter and the second child partition nearest the start time is re-determined, and the retrieval data scale up to each next child partition is determined in turn; once the re-determined retrieval data scale is greater than or equal to the current-page retrieval data scale, re-determination stops and the currently re-determined retrieval data scale is displayed.
Partitioning the database table according to a time rule to obtain several child partitions may be done by a time unit such as the hour or the day.
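As a minimal sketch of a day-based time rule, the following Python function maps a log timestamp to the daily child partition that would hold it. The function name `daily_partition` and the `pYYYYMMDD` naming scheme are hypothetical choices for illustration; the patent does not specify how partitions are named.

```python
from datetime import datetime, timedelta


def daily_partition(ts):
    """Return (name, start, end) of the daily partition containing ts.

    The partition covers start <= gtime < end, where start is midnight of
    the day of ts. The "pYYYYMMDD" name is a hypothetical convention.
    """
    start = datetime(ts.year, ts.month, ts.day)   # midnight of that day
    end = start + timedelta(days=1)               # midnight of the next day
    return "p" + start.strftime("%Y%m%d"), start, end
```

An hourly rule would follow the same pattern with the start truncated to the hour and a one-hour step.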
When the retrieval data scale is greater than or equal to the current-page retrieval data scale, the retrieval data scale is displayed.
The method further includes: when the retrieval data scale between the start time of the time parameter and the last child partition in the time parameter has been re-determined, directly displaying the re-determined retrieval data scale.
The method further includes: determining the retrieval data scale between the end time of the time parameter and the first child partition nearest the end time; when that retrieval data scale is less than the current-page retrieval data scale, re-determining the retrieval data scale between the end time of the time parameter and the second child partition nearest the end time, and determining the retrieval data scale up to each next child partition in turn; once the re-determined retrieval data scale is greater than or equal to the current-page retrieval data scale, stopping re-determination and displaying the currently re-determined retrieval data scale.
The method further includes: caching the determined retrieval data scale between the start time or end time of the time parameter and the child partitions in the time parameter. When a time parameter is next input, the cache is searched directly for a retrieval data scale between the start time or end time corresponding to that input time parameter and a child partition in the input time parameter. If one exists, it is read directly from the cache and compared with the current-page retrieval data scale; depending on the comparison result, either it is displayed, or the retrieval data scale is re-determined and the cache is searched for the re-determined retrieval data scale. Otherwise, the retrieval data scale is re-determined and cached.
Embodiment 1:
The invention is illustrated here using forward (ascending-order) retrieval as an example.
Step 1: Calculate the time range of the input time parameters T1 and T2.
Determine how many child partitions lie between T1 and T2: using an SQL statement against the database system's own partition table, query which partitions have a partition time point (taking the example described under the time rule, this point is midnight at the start of the day) greater than T1 and less than or equal to T2 (SQL statement: select P.TABLE_NAME, P.PARTITION_NAME, P.PARTITION_DESCRIPTION, TABLE_ROWS from information_schema.PARTITIONS as P where P.TABLE_NAME='log_event_http';). As shown in Fig. 1, this yields P2, P3, P4, P5, and P6. Record the start time and end time of each partition; for example, partition P2 starts at D2 and ends at D3, and so on.
Step 2: Given the current-page data scale S of the list, forward retrieval (that is, sorting by time in ascending order; the SQL clause is order by gtime asc) proceeds as follows.
Assume the start time of the first partition after T1 is Dn (n=2). The offset calculation steps are:
Step 201: Calculate the data size Sn between T1 and Dn. First look up the data size Sn between T1 and Dn in the partition cache. If it exists, assign Sn to the retrieval data scale ST and go to step 202. If it does not exist, run a database query to calculate it (SQL query: select count(1) from log_event_http where gtime >= T1 and gtime <= Dn), assign Sn to the retrieval data scale ST, that is, ST = Sn, and continue with step 202.
Step 202: If ST >= S, go to step 203; otherwise go to step 204.
Step 203: Retrieve the current-page data (for example: select * from log_event_http where gtime >= T1 and gtime <= Dn limit PS*P, P), return it to the application for display, and stop the calculation.
Step 204: Calculate the data size Sn+1 between Dn and Dn+1 according to the current offset time, and recalculate ST, that is, ST = ST + Sn+1. Set n = n+1 and go to step 202.
Repeat the above steps, stopping when the last partition between T1 and T2 has been retrieved or when ST = S.
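The forward pass of Embodiment 1 can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the patented implementation: `partitions_in_range`, `forward_offset`, and the parameter `page_need` (standing in for S) are hypothetical names, times are plain comparable values, and `count(lo, hi)` is a caller-supplied function assumed to return the number of rows with `lo <= gtime < hi`. The half-open interval is a choice made here to avoid counting a boundary row twice; the SQL in the text uses inclusive bounds.

```python
def partitions_in_range(partitions, t1, t2):
    """Step 1: keep partitions whose start time D satisfies t1 < D <= t2.

    partitions is a list of (name, start, end) tuples sorted by start.
    """
    return [p for p in partitions if t1 < p[1] <= t2]


def forward_offset(t1, t2, partitions, count, page_need):
    """Steps 201-204: extend the window from t1 one partition boundary at a time.

    count(lo, hi) is assumed to return the number of rows with lo <= gtime < hi.
    Returns (hi, st): the upper bound reached and the accumulated count ST.
    """
    st = 0
    lo = t1
    bounds = [p[1] for p in partitions_in_range(partitions, t1, t2)] + [t2]
    for hi in bounds:
        st += count(lo, hi)      # Sn for the slice lo..hi (steps 201 and 204)
        if st >= page_need:      # step 202: enough rows for the current page
            return hi, st        # step 203: the page query only needs t1..hi
        lo = hi                  # step 204: move on to the next slice
    return t2, st                # last partition reached without ST >= S
```

The returned upper bound is what the page query would use in place of T2, so the database scans only the partitions needed to fill the current page.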
Embodiment 2:
Given the current-page data scale S of the list, in reverse retrieval (that is, sorting by time in descending order; the SQL clause is order by gtime desc), assume the start time of the first partition before T2 is Dn (n=7). The offset calculation steps are:
Step 1: Calculate the data size Sn between T1 and Dn. First look up the data size Sn between T1 and Dn in the partition cache. If it exists, assign Sn to the retrieval data scale ST and go to step 2. If it does not exist, run a database query to calculate it (SQL query: select count(1) from log_event_http where gtime >= T1 and gtime <= Dn), assign Sn to the retrieval data scale ST, that is, ST = Sn, and continue with step 2.
Step 2: If ST >= S, go to step 3; otherwise go to step 4.
Step 3: Retrieve the current-page data (for example: select * from log_event_http where gtime >= T1 and gtime <= Dn order by gtime desc limit PS*P, P), return it to the application for display, and stop the calculation.
Step 4: Calculate the data size Sn-1 between Dn and Dn-1 according to the current offset time, and recalculate ST, that is, ST = ST + Sn-1. Set n = n-1 and go to step 2.
Repeat the above steps, stopping when the last partition between T1 and T2 has been retrieved or when ST = S.
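A reverse pass can be sketched in the same way. The sketch below follows the end-time variant described earlier in the specification, in which counting starts at the end time and moves toward the start time one child partition at a time; treating the first slice as running from Dn up to T2 is an assumption of this sketch. The names `backward_offset` and `page_need` are hypothetical, and `count(lo, hi)` is assumed to return the number of rows with `lo <= gtime < hi`.

```python
def backward_offset(t1, t2, starts, count, page_need):
    """Walk from t2 toward t1, one partition boundary at a time.

    starts: ascending partition start times D with t1 < D <= t2.
    count(lo, hi) is assumed to return the number of rows with lo <= gtime < hi.
    Returns (lo, st): the lower bound reached and the accumulated count ST.
    """
    st = 0
    hi = t2
    for lo in list(reversed(starts)) + [t1]:
        st += count(lo, hi)      # size of the slice lo..hi
        if st >= page_need:      # enough rows for the current page
            return lo, st        # the page query only needs lo..t2
        hi = lo                  # step back to the previous partition
    return t1, st                # whole range scanned without ST >= S
```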
Embodiment 3:
The data scale of a single partition (that is, Sn in step 1 above) is cached: a cache file is created, identified by the start time and end time of the cached partition, and the scale Sn of that time partition is recorded in the file. When the same search condition is run repeatedly against a partition, the cache file is checked first. If no cache entry exists, one is created. If one exists, the cached record count Sn is returned directly, so that the data judgment can be made by reading the cache record and the data of the specified partition returned. The offset calculation steps with caching are as follows:
Step 1: Using the time offset T1 ~ Dn as the identifier, obtain the data scale Sn of the current offset slice from the cache and assign Sn to the retrieval data scale ST, that is, ST = Sn.
Step 2: If ST >= S, go to step 3; otherwise go to step 4.
Step 3: Retrieve the data and return it to the application for display.
Step 4: Using the current offset Dn ~ Dn+1 as the identifier, obtain the data scale Sn+1 of the current offset slice from the cache and recalculate ST, that is, ST = ST + Sn+1. Set n = n+1 and go to step 2.
Repeat the above steps until the retrieval calculation has covered the current-page data scale S, then stop.
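The cache lookup described above can be illustrated with a small Python helper. This sketch swaps the cache file of the text for an in-memory dictionary keyed by the slice's start and end time; the name `cached_count` and the `count` callback are hypothetical, and a real deployment would also need some way to invalidate entries for partitions that are still receiving log rows, which the text does not address.

```python
def cached_count(cache, lo, hi, count):
    """Return the row count for the slice lo..hi, consulting the cache first.

    cache is a dict keyed by (lo, hi); count(lo, hi) is the fallback that
    would run the database count query when no cached value exists.
    """
    key = (lo, hi)               # start and end time identify the entry
    if key not in cache:
        cache[key] = count(lo, hi)   # cache miss: query once and store Sn
    return cache[key]            # cache hit: return the stored Sn directly
```

Passing `lambda lo, hi: cached_count(cache, lo, hi, db_count)` as the count function of an offset loop would make repeated searches over the same slices skip the database count queries.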
The foregoing is only a preferred embodiment of the present invention, is not intended to limit the scope of the present invention.
Claims (4)
1. A big data retrieval method for network security log systems based on partition offset calculation, characterized in that the method is as follows: a database table is partitioned according to a time rule to obtain several child partitions; at least one child partition covered by the input time parameter to be retrieved is determined; the retrieval data scale between the start time of the time parameter and the first child partition nearest the start time is determined; when the retrieval data scale is less than the current-page retrieval data scale, the retrieval data scale between the start time of the time parameter and the second child partition nearest the start time is re-determined, and the retrieval data scale up to each next child partition is determined in turn; when the re-determined retrieval data scale is greater than or equal to the current-page retrieval data scale, re-determination of the retrieval data scale stops and the currently re-determined retrieval data scale is displayed;
the retrieval is implemented specifically through the following steps:
Step 1: calculate the time range of the input time parameters T1 and T2;
Step 2: given the current-page data scale S of the list, in forward retrieval, assume the start time of the first partition after T1 is Dn, n=2; the offset calculation steps are as follows:
Step 201: calculate the data size Sn between T1 and Dn; first look up the data size Sn between T1 and Dn in the partition cache; if it exists, assign Sn to the retrieval data scale ST and go to step 202; if it does not exist, run a database query to calculate it, assign Sn to the retrieval data scale ST, that is, ST = Sn, and continue with step 202;
Step 202: if ST >= S, go to step 203; otherwise go to step 204;
Step 203: retrieve the current-page data, return it to the application for display, and stop the calculation;
Step 204: calculate the data size Sn+1 between Dn and Dn+1 according to the current offset time, and recalculate ST, that is, ST = ST + Sn+1, n = n+1, and go to step 202;
repeat the above steps, stopping when the last partition between T1 and T2 has been retrieved or when ST = S;
or
given the current-page data scale S of the list, in reverse retrieval, assume the start time of the first partition before T2 is Dn; the offset calculation steps are as follows:
Step A: calculate the data size Sn between T1 and Dn; first look up the data size Sn between T1 and Dn in the partition cache; if it exists, assign Sn to the retrieval data scale ST and go to step B; if it does not exist, run a database query to calculate it, assign Sn to the retrieval data scale ST, that is, ST = Sn, and continue with step B;
Step B: if ST >= S, go to step C; otherwise go to step D;
Step C: retrieve the current-page data, return it to the application for display, and stop the calculation;
Step D: calculate the data size Sn-1 between Dn and Dn-1 according to the current offset time, and recalculate ST, that is, ST = ST + Sn-1, n = n-1, and go to step B;
repeat the above steps, stopping when the last partition between T1 and T2 has been retrieved or when ST = S;
or
the data scale of a single partition is cached; when the same search condition is run repeatedly against a partition, the cache file is checked first; if no cache entry exists, one is created; if one exists, the cached record count Sn is returned directly, so that the data judgment can be made by reading the cache record and the data of the specified partition returned; the offset calculation steps with caching are as follows:
Step 301: using the time offset T1 ~ Dn as the identifier, obtain the data scale Sn of the current offset slice from the cache and assign Sn to the retrieval data scale ST, that is, ST = Sn;
Step 302: if ST >= S, go to step 303; otherwise go to step 304;
Step 303: retrieve the data and return it to the application for display;
Step 304: using the current offset Dn ~ Dn+1 as the identifier, obtain the data scale Sn+1 of the current offset slice from the cache and recalculate ST, that is, ST = ST + Sn+1, n = n+1, and go to step 302;
repeat the above steps, stopping once the retrieval calculation has covered the current-page data scale S.
2. The big data retrieval method for network security log systems based on partition offset calculation according to claim 1, characterized in that the method further includes: when the retrieval data scale between the start time of the time parameter and the last child partition in the time parameter has been re-determined, directly displaying the re-determined retrieval data scale.
3. The big data retrieval method for network security log systems based on partition offset calculation according to claim 1, characterized in that the method further includes: determining the retrieval data scale between the end time of the time parameter and the first child partition nearest the end time; when the retrieval data scale is less than the current-page retrieval data scale, re-determining the retrieval data scale between the end time of the time parameter and the second child partition nearest the end time, and determining the retrieval data scale up to each next child partition in turn; when the re-determined retrieval data scale is greater than or equal to the current-page retrieval data scale, stopping re-determination of the retrieval data scale and displaying the currently re-determined retrieval data scale.
4. The big data retrieval method for network security log systems based on partition offset calculation according to claim 3, characterized in that the method further includes: caching the determined retrieval data scale between the start time or end time of the time parameter and the child partitions in the time parameter; when a time parameter is next input, searching the cache directly for a retrieval data scale between the start time or end time corresponding to that input time parameter and a child partition in the input time parameter; if one exists, reading it directly from the cache and comparing it with the current-page retrieval data scale, and, depending on the comparison result, either displaying it or continuing to re-determine the retrieval data scale and searching the cache for the re-determined retrieval data scale; otherwise, re-determining the retrieval data scale and caching it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510747856.XA CN105302909B (en) | 2015-11-06 | 2015-11-06 | Network security log system big data search method based on subregion calculations of offset |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510747856.XA CN105302909B (en) | 2015-11-06 | 2015-11-06 | Network security log system big data search method based on subregion calculations of offset |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105302909A CN105302909A (en) | 2016-02-03 |
CN105302909B true CN105302909B (en) | 2019-03-26 |
Family
ID=55200178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510747856.XA Active CN105302909B (en) | 2015-11-06 | 2015-11-06 | Network security log system big data search method based on subregion calculations of offset |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105302909B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766408A (en) * | 2017-08-31 | 2018-03-06 | 西安交大捷普网络科技有限公司 | The storage method of audit log |
CN109543079B (en) * | 2018-11-27 | 2021-02-02 | 北京锐安科技有限公司 | Data query method and device, computing equipment and storage medium |
CN110321388B (en) * | 2019-02-26 | 2021-07-02 | 南威软件股份有限公司 | Quick sequencing query method and system based on Greenplus |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1858735A (en) * | 2005-12-30 | 2006-11-08 | 华为技术有限公司 | Method for processing mass data |
CN101425064A (en) * | 2007-10-29 | 2009-05-06 | 英业达股份有限公司 | Processing method and system for testing log |
CN104281684A (en) * | 2014-09-30 | 2015-01-14 | 东软集团股份有限公司 | Method and system for storing and querying mass logs |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014053313A1 (en) * | 2012-10-04 | 2014-04-10 | Alcatel Lucent | Data logs management in a multi-client architecture |
Also Published As
Publication number | Publication date |
---|---|
CN105302909A (en) | 2016-02-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |