CN105302909B

CN105302909B - Big data retrieval method for a network security log system based on partition offset calculation

Info

Publication number: CN105302909B
Application number: CN201510747856.XA
Authority: CN
Inventors: 王平; 何建锋; 郭增晖; 刘亚轩
Original assignee: Jiepu Network Science & Technology Co Ltd Xi'an Jiaoda
Current assignee: Jiepu Network Science & Technology Co Ltd Xi'an Jiaoda
Priority date: 2015-11-06
Filing date: 2015-11-06
Publication date: 2019-03-26
Anticipated expiration: 2035-11-06
Also published as: CN105302909A

Abstract

The invention discloses a big data retrieval method for a network security log system based on partition offset calculation. A database table is partitioned according to a time rule to obtain several sub-partitions. From the input time parameter to be retrieved, the method determines at least one sub-partition covered by that time parameter, then determines the retrieval data size between the start time of the time parameter and the first sub-partition nearest the start time. When this retrieval data size is smaller than the current-page retrieval data size, the method redetermines the retrieval data size between the start time and the second sub-partition nearest the start time, and continues in turn with each next sub-partition. Once the redetermined retrieval data size is greater than or equal to the current-page retrieval data size, redetermination stops and the currently redetermined retrieval data size is displayed. The invention can effectively improve data processing performance.

Description

Big data retrieval method for a network security log system based on partition offset calculation

Technical field

The invention belongs to the technical field of network information security, and in particular relates to a big data retrieval method for a network security log system based on partition offset calculation.

Background art

With the growing popularity of the Internet and the continued expansion of network environments, the volume of security log data produced by network security devices has grown rapidly. Traditional data processing schemes that rely solely on the processing capacity of the database itself can no longer fully meet practical needs. The efficiency of log analysis and retrieval directly affects the timeliness of event analysis, risk analysis, and early warning for network security devices in the user environment, as well as user satisfaction with those devices. Effectively improving the retrieval capability for security device log data can therefore, to a certain extent, improve the capability metrics of security analysis.

Data retrieval performance is determined mainly by the CPU performance and memory of the network security device, the database table design, and the retrieval software implementation. For security devices, hardware such as CPU and memory is often cost-constrained and relatively limited. Database table design can improve retrieval performance to some extent, but when a single table holds more than one billion records the improvement often falls short of requirements. It is therefore essential to improve performance by optimizing the data retrieval algorithm without affecting device functionality.

Summary of the invention

In view of this, the main purpose of the invention is to provide a big data retrieval method for a network security log system based on partition offset calculation.

To achieve the above purpose, the technical solution of the invention is realized as follows:

An embodiment of the invention provides a big data retrieval method for a network security log system based on partition offset calculation. The method is as follows: a database table is partitioned according to a time rule to obtain several sub-partitions; from the input time parameter to be retrieved, at least one sub-partition covered by the time parameter is determined; the retrieval data size between the start time of the time parameter and the first sub-partition nearest the start time is determined; when this retrieval data size is smaller than the current-page retrieval data size, the retrieval data size between the start time and the second sub-partition nearest the start time is redetermined, and the retrieval data size up to each next sub-partition is determined in turn; once the redetermined retrieval data size is greater than or equal to the current-page retrieval data size, redetermination stops and the currently redetermined retrieval data size is displayed.

In the above solution, the method further comprises: when the retrieval data size is greater than or equal to the current-page retrieval data size, displaying the retrieval data size.

In the above solution, the method further comprises: when the retrieval data size between the start time of the time parameter and the last sub-partition in the time parameter has been redetermined, directly displaying the redetermined retrieval data size.

In the above solution, the method further comprises: determining the retrieval data size between the end time of the time parameter and the first sub-partition nearest the end time; when this retrieval data size is smaller than the current-page retrieval data size, redetermining the retrieval data size between the end time and the second sub-partition nearest the end time, and determining in turn the retrieval data size up to each next sub-partition; once the redetermined retrieval data size is greater than or equal to the current-page retrieval data size, redetermination stops and the currently redetermined retrieval data size is displayed.

In the above solution, the method further comprises: caching the determined retrieval data size between the start time or end time of the time parameter and each sub-partition in the time parameter. When a time parameter is next input, the cache is searched first for a retrieval data size between the start time or end time corresponding to the input time parameter and a sub-partition in that time parameter. If one exists, it is obtained directly from the cache and compared with the current-page retrieval data size; depending on the comparison result, it is either displayed, or the retrieval data size continues to be redetermined, each time checking whether the redetermined retrieval data size already exists in the cache. Otherwise, the retrieval data size is redetermined and cached.

Compared with the prior art, the invention has the following beneficial effects:

The invention can effectively improve log retrieval performance on network security devices under large-scale data, greatly reduce the time cost of routine data retrieval on the device, improve the real-time responsiveness of device analysis and the usability of the product, and increase customer satisfaction.

Brief description of the drawings

Fig. 1 is a schematic diagram of the offset algorithm of a big data retrieval method for a network security log system based on partition offset calculation, provided by an embodiment of the invention.

Detailed description of the embodiments

The following describes the present invention in detail with reference to the accompanying drawings and specific embodiments.

Partitioning the database table according to a time rule to obtain several sub-partitions may be done by partitioning the table in time units such as hours or days.

When the retrieval data size is greater than or equal to the current-page retrieval data size, the retrieval data size is displayed.

The method further comprises: when the retrieval data size between the start time of the time parameter and the last sub-partition in the time parameter has been redetermined, directly displaying the redetermined retrieval data size.

The method further comprises: determining the retrieval data size between the end time of the time parameter and the first sub-partition nearest the end time; when this retrieval data size is smaller than the current-page retrieval data size, redetermining the retrieval data size between the end time and the second sub-partition nearest the end time, and determining in turn the retrieval data size up to each next sub-partition; once the redetermined retrieval data size is greater than or equal to the current-page retrieval data size, redetermination stops and the currently redetermined retrieval data size is displayed.

The method further comprises: caching the determined retrieval data size between the start time or end time of the time parameter and each sub-partition in the time parameter. When a time parameter is next input, the cache is searched first for a retrieval data size between the start time or end time corresponding to the input time parameter and a sub-partition in that time parameter. If one exists, it is obtained directly from the cache and compared with the current-page retrieval data size; depending on the comparison result, it is either displayed, or the retrieval data size continues to be redetermined, each time checking whether the redetermined retrieval data size already exists in the cache. Otherwise, the retrieval data size is redetermined and cached.

Embodiment 1:

The invention is illustrated taking forward-order (ascending) retrieval as an example.

Step 1: Calculate the time range of the input time parameters T1 and T2.

Determine how many sub-partitions lie between T1 and T2. Using an SQL statement, query the database system's own partition table for the partitions whose partition time point (in the example of the time rule, this time point is midnight of the given day) is greater than T1 and less than or equal to T2 (SQL statement: select P.TABLE_NAME, P.PARTITION_NAME, P.PARTITION_DESCRIPTION, TABLE_ROWS from information_schema.PARTITIONS as P where P.TABLE_NAME='log_event_http';). As shown in Fig. 1, this yields P2, P3, P4, P5, and P6. Record the start time and end time of each partition; for example, partition P2 starts at D2 and ends at D3, and so on.

Step 2: Based on the data size S of the current list page, in the case of forward-order retrieval (that is, sorted by time in ascending order; the SQL clause is order by gtime asc):

Assuming the start time of the first partition after T1 is Dn (n = 2), the offset calculation proceeds as follows:
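As a rough illustration of Step 1, the sketch below lists the partition time points that are greater than T1 and less than or equal to T2. It assumes day-based partitions whose time point is midnight, as in the example above. The function name `partition_boundaries` is hypothetical and is not part of the patent, which reads the boundaries from information_schema.PARTITIONS instead of computing them.

```python
from datetime import datetime, timedelta

def partition_boundaries(t1, t2):
    """Return every midnight D with t1 < D <= t2 (assumed daily partitions)."""
    # First midnight strictly after t1.
    d = datetime(t1.year, t1.month, t1.day) + timedelta(days=1)
    bounds = []
    while d <= t2:
        bounds.append(d)
        d += timedelta(days=1)
    return bounds
```

For a query window from 2015-11-01 08:00 to 2015-11-04 12:00, this returns the three midnights of 2 to 4 November, which play the role of the D values in Fig. 1.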

Step 201: Calculate the data size Sn between T1 and Dn. First look up the data size Sn between T1 and Dn in the partition cache. If it exists, assign Sn to the retrieval data size ST and proceed to step 202. If it does not exist, a database query is required (SQL query: select count(1) from log_event_http where gtime >= T1 and gtime <= Dn); assign Sn to the retrieval data size ST, i.e. ST = Sn, and proceed to step 202.

Step 202: If ST ≥ S, go to step 203; otherwise go to step 204.

Step 203: Retrieve the current page of data (for example: select * from log_event_http where gtime >= T1 and gtime <= Dn limit PS*P, P), return it to the application for display, and stop the calculation.

Step 204: Based on the current offset time, calculate the data size Sn+1 between Dn and Dn+1, and recalculate ST, i.e. ST = ST + Sn+1. Set n = n + 1 and go to step 202.

Repeat the above steps, stopping when the last partition between T1 and T2 has been retrieved or when ST = S.
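Steps 201 to 204 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. A sorted in-memory list `times` stands in for the log_event_http table, and the names `forward_offset` and `count` are hypothetical. The first slice is counted as the closed interval [T1, Dn], as in the count query of step 201; each later slice is counted as (Dn, Dn+1] so that rows on a boundary are not counted twice, which the source does not specify. The sketch also assumes the final slice runs up to T2.

```python
from bisect import bisect_left, bisect_right

def forward_offset(times, t1, t2, bounds, s):
    """Accumulate slice counts from T1 upward until ST >= S.

    times  : sorted record timestamps (stand-in for the log table)
    bounds : ascending partition boundaries D_n with t1 < D_n <= t2
    s      : S, the data size needed for the current page
    Returns (ST, upper), where upper is the time bound that was reached.
    """
    def count(lo, hi, include_lo):
        # Stand-in for a "select count(1) ... where gtime ..." query.
        left = bisect_left(times, lo) if include_lo else bisect_right(times, lo)
        return bisect_right(times, hi) - left

    edges = list(bounds)
    if not edges or edges[-1] < t2:
        edges.append(t2)                      # assumption: last slice ends at T2

    st = count(t1, edges[0], True)            # step 201: rows in [T1, Dn]
    i = 0
    while st < s and i + 1 < len(edges):      # step 202: compare ST with S
        st += count(edges[i], edges[i + 1], False)   # step 204: rows in (Dn, Dn+1]
        i += 1
    return st, edges[i]                       # step 203 would query up to edges[i]
```

With `times = [1, 2, 3, 11, 12, 21, 22, 23, 24]`, `bounds = [10, 20, 30]`, T1 = 0, T2 = 30 and S = 4, the loop stops after the second slice, so the page query only needs to scan rows up to the boundary 20 instead of the whole range.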

Embodiment 2:

Based on the data size S of the current list page, in the case of reverse-order retrieval (that is, sorted by time in descending order; the SQL clause is order by gtime desc), assume the start time of the first partition before T2 is Dn (n = 7). The offset calculation proceeds as follows:

Step 1: Calculate the data size Sn between T1 and Dn. First look up the data size Sn between T1 and Dn in the partition cache. If it exists, assign Sn to the retrieval data size ST and proceed to step 2. If it does not exist, a database query is required (SQL query: select count(1) from log_event_http where gtime >= T1 and gtime <= Dn); assign Sn to the retrieval data size ST, i.e. ST = Sn, and proceed to step 2.

Step 2: If ST ≥ S, go to step 3; otherwise go to step 4.

Step 3: Retrieve the current page of data (for example: select * from log_event_http where gtime >= T1 and gtime <= Dn order by gtime desc limit PS*P, P), return it to the application for display, and stop the calculation.

Step 4: Based on the current offset time, calculate the data size Sn-1 between Dn and Dn-1, and recalculate ST, i.e. ST = ST + Sn-1. Set n = n - 1 and go to step 2.
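A reverse-order counterpart can be sketched in the same way. Note the assumption: the sketch follows the summary's end-time variant and accumulates counts from T2 downward, whereas step 1 of this embodiment is worded in terms of T1 ~ Dn. The names `backward_offset` and `count` are hypothetical, a sorted list `times` again stands in for the table, and the half-open slices [Dn-1, Dn) are an assumption made to avoid counting boundary rows twice.

```python
from bisect import bisect_left, bisect_right

def backward_offset(times, t1, t2, bounds, s):
    """Accumulate slice counts from T2 downward until ST >= S.

    times  : sorted record timestamps (stand-in for the log table)
    bounds : partition boundaries D_n with t1 <= D_n <= t2
    s      : S, the data size needed for the current page
    Returns (ST, lower), where lower is the time bound that was reached.
    """
    def count(lo, hi, include_hi):
        # Stand-in for a "select count(1) ... where gtime ..." query.
        right = bisect_right(times, hi) if include_hi else bisect_left(times, hi)
        return right - bisect_left(times, lo)

    edges = sorted(bounds, reverse=True)
    if not edges or edges[-1] > t1:
        edges.append(t1)                      # assumption: last slice starts at T1

    st = count(edges[0], t2, True)            # step 1 (assumed): rows in [Dn, T2]
    i = 0
    while st < s and i + 1 < len(edges):      # step 2: compare ST with S
        st += count(edges[i + 1], edges[i], False)   # step 4: rows in [Dn-1, Dn)
        i += 1
    return st, edges[i]                       # step 3 would query down to edges[i]
```

With `times = [1, 2, 3, 11, 12, 21, 22, 23, 24]`, `bounds = [10, 20]`, T1 = 0, T2 = 30 and S = 4, the newest slice alone already holds four rows, so the page query only needs rows from the boundary 20 upward.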

Embodiment 3:

Cache the data size of each single partition (Sn in step 1 above): using the start time and end time of the cached partition as the identifier, create a cache file and record the size Sn of that time partition in it. When the same search condition is applied repeatedly to a partition, the cache file is checked first. If no cached entry exists, the size is computed and cached; if one exists, the cached record count Sn is returned directly. The cached record can then be read to make the size comparison directly and return the data of the specified partition. The offset calculation using the cache proceeds as follows:

Step 1: Using the time offset T1 ~ Dn as the identifier, obtain the data size Sn of the current offset slice from the cache, and assign Sn to the retrieval data size ST, i.e. ST = Sn.

Step 2: If ST ≥ S, go to step 3; otherwise go to step 4.

Step 3: Retrieve the data and return it to the application for display.

Step 4: Using the current offset Dn ~ Dn+1 as the identifier, obtain the data size Sn+1 of the current offset slice from the cache, and recalculate ST, i.e. ST = ST + Sn+1. Set n = n + 1 and go to step 2.

Repeat the above steps until the retrieval has accumulated data amounting to the data size S of the current page, then stop.
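The cache lookup described in this embodiment can be sketched as a small wrapper around whatever function counts the rows in a time slice. This is only an illustration: the patent records sizes in a cache file identified by the slice's start and end time, whereas the sketch below assumes an in-memory dict with the same key, and the names `make_cached_counter` and `count_fn` are hypothetical.

```python
def make_cached_counter(count_fn):
    """Wrap count_fn(lo, hi) so each (lo, hi) slice is counted at most once."""
    cache = {}   # key: (slice start, slice end); value: record count Sn

    def cached_count(lo, hi):
        key = (lo, hi)
        if key not in cache:
            # Cache miss: run the real count (e.g. a database query) and store it.
            cache[key] = count_fn(lo, hi)
        # Cache hit, or freshly stored value: return the recorded count.
        return cache[key]

    return cached_count, cache
```

Repeating a query over the same time window then costs one dictionary lookup per slice instead of one count query per slice. The sketch does not address invalidation; a slice that is still receiving new log rows would need its entry refreshed, which the source does not discuss.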

The above is only a preferred embodiment of the invention and is not intended to limit its scope of protection.

Claims

1. A big data retrieval method for a network security log system based on partition offset calculation, characterized in that the method comprises: partitioning a database table according to a time rule to obtain several sub-partitions; determining, from an input time parameter to be retrieved, at least one sub-partition covered by the time parameter; determining the retrieval data size between the start time of the time parameter and the first sub-partition nearest the start time within the time parameter; when the retrieval data size is smaller than the current-page retrieval data size, redetermining the retrieval data size between the start time of the time parameter and the second sub-partition nearest the start time within the time parameter, and determining in turn the retrieval data size up to each next sub-partition, until the redetermined retrieval data size is greater than or equal to the current-page retrieval data size, whereupon redetermination of the retrieval data size is terminated and the currently redetermined retrieval data size is displayed;

The data retrieval is implemented by the following steps:

Step 1: Calculate the time range of the input time parameters T1 and T2;

Step 2: Based on the data size S of the current list page, in the case of forward-order retrieval, assuming the start time of the first partition after T1 is Dn, n = 2, the offset calculation proceeds as follows:

Step 201: Calculate the data size Sn between T1 and Dn; first look up the data size Sn between T1 and Dn in the partition cache; if it exists, assign Sn to the retrieval data size ST and proceed to step 202; if it does not exist, perform a database query calculation, assign Sn to the retrieval data size ST, i.e. ST = Sn, and proceed to step 202;

Step 202: If ST ≥ S, go to step 203; otherwise go to step 204;

Step 203: Retrieve the current page of data, return it to the application for display, and stop the calculation;

Step 204: Based on the current offset time, calculate the data size Sn+1 between Dn and Dn+1, and recalculate ST, i.e. ST = ST + Sn+1, n = n + 1, and go to step 202;

Repeat the above steps, stopping when the last partition between T1 and T2 has been retrieved or when ST = S;

Or

Based on the data size S of the current list page, in the case of reverse-order retrieval, assuming the start time of the first partition before T2 is Dn, the offset calculation proceeds as follows:

Step A: Calculate the data size Sn between T1 and Dn; first look up the data size Sn between T1 and Dn in the partition cache; if it exists, assign Sn to the retrieval data size ST and proceed to step B; if it does not exist, perform a database query calculation, assign Sn to the retrieval data size ST, i.e. ST = Sn, and proceed to step B;

Step B: If ST ≥ S, go to step C; otherwise go to step D;

Step C: Retrieve the current page of data, return it to the application for display, and stop the calculation;

Step D: Based on the current offset time, calculate the data size Sn-1 between Dn and Dn-1, and recalculate ST, i.e. ST = ST + Sn-1, n = n - 1, and go to step B;

Or

Cache the data size of each single partition; when the same search condition is applied repeatedly to a partition, check the cache file first; if no cached entry exists, cache it; if one exists, return the cached record count Sn directly, so that the cached record can be read to make the size comparison directly and return the data of the specified partition; the offset calculation using the cache proceeds as follows:

Step 301: Using the time offset T1 ~ Dn as the identifier, obtain the data size Sn of the current offset slice from the cache, and assign Sn to the retrieval data size ST, i.e. ST = Sn;

Step 302: If ST ≥ S, go to step 303; otherwise go to step 304;

Step 303: Retrieve the data and return it to the application for display;

Step 304: Using the current offset Dn ~ Dn+1 as the identifier, obtain the data size Sn+1 of the current offset slice from the cache, and recalculate ST, i.e. ST = ST + Sn+1, n = n + 1, and go to step 302;

Repeat the above steps, and stop once the retrieval has accumulated data amounting to the data size S of the current page.

2. The big data retrieval method for a network security log system based on partition offset calculation according to claim 1, characterized in that the method further comprises: when the retrieval data size between the start time of the time parameter and the last sub-partition in the time parameter has been redetermined, directly displaying the redetermined retrieval data size.

3. The big data retrieval method for a network security log system based on partition offset calculation according to claim 1, characterized in that the method further comprises: determining the retrieval data size between the end time of the time parameter and the first sub-partition nearest the end time within the time parameter; when the retrieval data size is smaller than the current-page retrieval data size, redetermining the retrieval data size between the end time of the time parameter and the second sub-partition nearest the end time within the time parameter, and determining in turn the retrieval data size up to each next sub-partition, until the redetermined retrieval data size is greater than or equal to the current-page retrieval data size, whereupon redetermination of the retrieval data size is terminated and the currently redetermined retrieval data size is displayed.

4. The big data retrieval method for a network security log system based on partition offset calculation according to claim 3, characterized in that the method further comprises: caching the determined retrieval data size between the start time or end time of the time parameter and each sub-partition in the time parameter; when a time parameter is next input, searching the cache directly for a retrieval data size between the start time or end time corresponding to the input time parameter and a sub-partition in the input time parameter; if one exists, obtaining it directly from the cache and comparing it with the current-page retrieval data size, and, depending on the comparison result, either displaying it or continuing to redetermine the retrieval data size while checking whether the redetermined retrieval data size exists in the cache; otherwise, redetermining the retrieval data size and caching it.