CN105302909B - Network security log system big data search method based on subregion calculations of offset - Google Patents

Info

Publication number
CN105302909B
Authority
CN
China
Legal status
Active
Application number
CN201510747856.XA
Other languages
Chinese (zh)
Other versions
CN105302909A (en)
Inventor
王平 (Wang Ping)
何建锋 (He Jianfeng)
郭增晖 (Guo Zenghui)
刘亚轩 (Liu Yaxuan)
Current Assignee
Jiepu Network Science & Technology Co Ltd Xi'an Jiaoda
Original Assignee
Jiepu Network Science & Technology Co Ltd Xi'an Jiaoda
Priority date
2015-11-06
Filing date
2015-11-06
Publication date
2019-03-26
Application filed by Jiepu Network Science & Technology Co Ltd Xi'an Jiaoda
Priority to CN201510747856.XA
Publication of CN105302909A
Application granted
Publication of CN105302909B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a big-data retrieval method for network security log systems based on partition offset calculation. A database table is partitioned by time into several sub-partitions, and the sub-partitions contained in an input time range to be retrieved are determined. The retrieval data scale (the number of matching records) between the start time of the range and the first sub-partition nearest the start time is then determined. When this retrieval data scale is less than the current-page retrieval data scale, the retrieval data scale between the start time and the second sub-partition nearest the start time is redetermined, and so on, extending the range one sub-partition at a time, until the redetermined retrieval data scale is greater than or equal to the current-page retrieval data scale. Redetermination then stops, and the data at the currently redetermined retrieval data scale are displayed. The invention can effectively improve data processing performance.

Description

Network security log system big data search method based on subregion calculations of offset
Technical field
The invention belongs to the technical field of network information security, and in particular relates to a big-data retrieval method for network security log systems based on partition offset calculation.
Background
With the growing popularity of the Internet and the continuing expansion of network environments, the volume of security log data produced by network security devices is growing rapidly, and traditional data processing schemes that rely solely on the processing capability of the database itself can no longer meet practical needs. The efficiency of log analysis and retrieval directly affects event and risk analysis on network security devices in the user environment, the timeliness of early warnings, and user satisfaction with the devices. Effectively improving the retrieval capability for security-device log data therefore improves, to a certain extent, the capacity of security analysis.
Data retrieval performance depends mainly on the CPU performance and memory of the network security device, the design of the database tables, and the implementation of the retrieval software. For security devices, hardware such as CPU and memory is constrained by cost and is therefore relatively limited. Database table design can improve retrieval performance to a certain extent, but when a single table holds a billion or more rows, the improvement is often insufficient. It is therefore essential to improve performance by optimizing the data retrieval algorithm without affecting device functionality.
Summary of the invention
In view of this, the main purpose of the present invention is to provide a big-data retrieval method for network security log systems based on partition offset calculation.
To achieve the above purpose, the technical solution of the present invention is implemented as follows:
An embodiment of the present invention provides a big-data retrieval method for network security log systems based on partition offset calculation. The method is as follows: the database table is partitioned by time into several sub-partitions, and the sub-partitions contained in the input time range to be retrieved are determined. The retrieval data scale between the start time of the time range and the first sub-partition nearest the start time is determined. When this retrieval data scale is less than the current-page retrieval data scale, the retrieval data scale between the start time and the second sub-partition nearest the start time is redetermined, and so on, one sub-partition at a time, until the redetermined retrieval data scale is greater than or equal to the current-page retrieval data scale. Redetermination then stops, and the data at the currently redetermined retrieval data scale are displayed.
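The forward expansion described above can be sketched in Python. This is a minimal illustration, not the patent's implementation, and it rests on assumptions the patent does not state: times are plain integers (for example, day numbers), `count_between` is a hypothetical callback standing in for the per-range count query, each slice is half-open (`start <= t < end`) so a row lying exactly on a boundary is counted once, and `needed` plays the role of the current-page retrieval data scale.

```python
from typing import Callable, List, Sequence, Tuple


def expand_forward(
    t1: int,
    t2: int,
    boundaries: Sequence[int],
    needed: int,
    count_between: Callable[[int, int], int],
) -> Tuple[int, int]:
    """Grow the window [t1, end) one sub-partition at a time.

    Stops as soon as the window holds at least `needed` rows, or when it
    reaches t2 (the last sub-partition). Returns (window_end, row_count).
    """
    # Candidate window ends: every sub-partition boundary inside (t1, t2), then t2.
    ends: List[int] = sorted(b for b in boundaries if t1 < b < t2) + [t2]
    total = 0
    start = t1
    for end in ends:
        # Only the newly added slice is counted; earlier counts are kept in `total`.
        total += count_between(start, end)
        start = end
        if total >= needed:
            return end, total  # enough rows to render the current page
    return t2, total  # reached the last sub-partition: show whatever exists
```

For example, with rows at times 1, 2, 2, 5, 7, 8, 9, boundaries 3 and 6, and `needed = 4`, the loop counts 3 rows in [0, 3), adds 1 row from [3, 6), and returns `(6, 4)` without ever counting [6, 10).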
In the above solution, the method further includes: when the retrieval data scale is greater than or equal to the current-page retrieval data scale, displaying the data at that retrieval data scale.
In the above solution, the method further includes: when the retrieval data scale between the start time of the time range and the last sub-partition in the time range has been redetermined, directly displaying the data at the redetermined retrieval data scale.
In the above solution, the method further includes: determining the retrieval data scale between the end time of the time range and the first sub-partition nearest the end time; when this retrieval data scale is less than the current-page retrieval data scale, redetermining the retrieval data scale between the end time and the second sub-partition nearest the end time, and so on, one sub-partition at a time, until the redetermined retrieval data scale is greater than or equal to the current-page retrieval data scale; then stopping the redetermination and displaying the data at the currently redetermined retrieval data scale.
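The backward variant walks from the end time toward the start time. A minimal Python sketch under the same assumptions as a forward loop would use (integer times, half-open slices, and a hypothetical `count_between(start, end)` callback in place of the real count query):

```python
from typing import Callable, List, Sequence, Tuple


def expand_backward(
    t1: int,
    t2: int,
    boundaries: Sequence[int],
    needed: int,
    count_between: Callable[[int, int], int],
) -> Tuple[int, int]:
    """Grow the window [start, t2) backwards one sub-partition at a time.

    Returns (window_start, row_count) once the window holds at least
    `needed` rows, or (t1, row_count) when the earliest sub-partition is reached.
    """
    # Candidate window starts: the boundary nearest t2 first, then earlier ones, then t1.
    starts: List[int] = sorted((b for b in boundaries if t1 < b < t2), reverse=True) + [t1]
    total = 0
    end = t2
    for start in starts:
        total += count_between(start, end)  # count only the newly added slice
        end = start
        if total >= needed:
            return start, total
    return t1, total
```

With rows at 1, 2, 2, 5, 7, 8, 9 and boundaries 3 and 6, asking for 3 rows stops after the single slice [6, 10) and returns `(6, 3)`.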
In the above solution, the method further includes: caching the retrieval data scale determined between the start time (or end time) of the time range and each sub-partition in the time range. When a time range is next input, the cache is searched directly for a retrieval data scale between the corresponding start time (or end time) and the sub-partitions in the new time range. If it exists, it is taken directly from the cache and compared with the current-page retrieval data scale; depending on the result, the data are displayed, or the retrieval data scale is redetermined and the cache is searched again for the redetermined value. If it does not exist, the retrieval data scale is redetermined and then cached.
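The cache lookup described above amounts to memoizing per-slice counts. A minimal in-memory sketch in Python, assuming integer times and a hypothetical `count_between(start, end)` callback that runs the real count query and is invoked only on a cache miss:

```python
from typing import Callable, Dict, Tuple


class SliceCountCache:
    """Memoizes row counts per (start, end) slice."""

    def __init__(self, count_between: Callable[[int, int], int]) -> None:
        self._count_between = count_between
        self._counts: Dict[Tuple[int, int], int] = {}
        self.misses = 0  # how many times the underlying count query actually ran

    def count(self, start: int, end: int) -> int:
        key = (start, end)
        if key not in self._counts:
            # Not cached yet: run the real count once and remember it.
            self.misses += 1
            self._counts[key] = self._count_between(start, end)
        # Cached: reuse the stored count without touching the database.
        return self._counts[key]
```

Because `count` takes the same `(start, end)` arguments as the callback it wraps, it can be used wherever a per-slice count function is expected. As a design consideration the patent does not address: the newest partition may still be receiving log rows, so its cached count can go stale, and a real cache would likely skip or refresh that partition.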
Compared with the prior art, the present invention has the following beneficial effects:
The present invention can effectively improve log retrieval performance on network security devices at large data scales, greatly reduce the time cost of device log data retrieval, improve the real-time responsiveness of device analysis and the usability of the product, and increase customer satisfaction.
Brief description of the drawings
Fig. 1 is a schematic diagram of the offset algorithm of the big-data retrieval method for network security log systems based on partition offset calculation according to an embodiment of the present invention.
Detailed description of embodiments
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
An embodiment of the present invention provides a big-data retrieval method for network security log systems based on partition offset calculation. The method is as follows: the database table is partitioned by time into several sub-partitions, and the sub-partitions contained in the input time range to be retrieved are determined. The retrieval data scale between the start time of the time range and the first sub-partition nearest the start time is determined. When this retrieval data scale is less than the current-page retrieval data scale, the retrieval data scale between the start time and the second sub-partition nearest the start time is redetermined, and so on, one sub-partition at a time, until the redetermined retrieval data scale is greater than or equal to the current-page retrieval data scale. Redetermination then stops, and the data at the currently redetermined retrieval data scale are displayed.
Partitioning the database table by time into several sub-partitions may be done using a time unit such as the hour or the day.
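For hourly or daily partitioning, the partition boundaries inside a query range can be computed directly from the range. A minimal Python sketch, assuming naive datetimes in a single time zone and partitions that start exactly at midnight or on the hour; it keeps the boundaries greater than T1 and less than or equal to T2, the same selection rule used in Embodiment 1. The function name `partition_boundaries` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def partition_boundaries(t1: datetime, t2: datetime, unit: str = "day") -> List[datetime]:
    """Return partition start points p with t1 < p <= t2 for daily or hourly partitions."""
    if unit == "day":
        base = t1.replace(hour=0, minute=0, second=0, microsecond=0)  # midnight of t1's day
        step = timedelta(days=1)
    elif unit == "hour":
        base = t1.replace(minute=0, second=0, microsecond=0)  # top of t1's hour
        step = timedelta(hours=1)
    else:
        raise ValueError(f"unsupported partition unit: {unit}")
    points: List[datetime] = []
    p = base + step  # first boundary strictly after t1, even if t1 is itself a boundary
    while p <= t2:
        points.append(p)
        p += step
    return points
```

For a range from 2015-11-01 08:00 to 2015-11-04 00:00 with daily partitions, this yields the midnights of 2015-11-02, 2015-11-03, and 2015-11-04.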
When the retrieval data scale is greater than or equal to the current-page retrieval data scale, the data at that retrieval data scale are displayed.
The method further includes: when the retrieval data scale between the start time of the time range and the last sub-partition in the time range has been redetermined, directly displaying the data at the redetermined retrieval data scale.
The method further includes: determining the retrieval data scale between the end time of the time range and the first sub-partition nearest the end time; when this retrieval data scale is less than the current-page retrieval data scale, redetermining the retrieval data scale between the end time and the second sub-partition nearest the end time, and so on, one sub-partition at a time, until the redetermined retrieval data scale is greater than or equal to the current-page retrieval data scale; then stopping the redetermination and displaying the data at the currently redetermined retrieval data scale.
The method further includes: caching the retrieval data scale determined between the start time (or end time) of the time range and each sub-partition in the time range. When a time range is next input, the cache is searched directly for a retrieval data scale between the corresponding start time (or end time) and the sub-partitions in the new time range. If it exists, it is taken directly from the cache and compared with the current-page retrieval data scale; depending on the result, the data are displayed, or the retrieval data scale is redetermined and the cache is searched again for the redetermined value. If it does not exist, the retrieval data scale is redetermined and then cached.
Embodiment 1:
This embodiment illustrates the invention using forward (ascending-time) retrieval as an example.
Step 1: determine the partitions within the time range given by the input time parameters T1 and T2.
Count the partitions between T1 and T2 using the database's own partition metadata: query the table's partitions with an SQL statement (select P.TABLE_NAME, P.PARTITION_NAME, P.PARTITION_DESCRIPTION, TABLE_ROWS from information_schema.PARTITIONS as P where P.TABLE_NAME='log_event_http';) and keep those whose partition time point is greater than T1 and less than or equal to T2 (with the daily partitioning described above, the partition time point is midnight of the given day). As shown in Fig. 1, this yields P2, P3, P4, P5, and P6. Record the start and end time of each partition; for example, partition P2 starts at D2 and ends at D3, and so on.
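Once the partition list has been read, the selection and start/end bookkeeping of Step 1 can be sketched in Python. This assumes the partition names and start times have already been extracted (for example, by parsing PARTITION_DESCRIPTION) into hypothetical `(name, start_time)` pairs; that parsing depends on how the table was partitioned and is not shown.

```python
from datetime import datetime
from typing import List, Sequence, Tuple


def partition_spans(
    partitions: Sequence[Tuple[str, datetime]], t1: datetime, t2: datetime
) -> List[Tuple[str, datetime, datetime]]:
    """Keep partitions whose start time lies in (t1, t2] and attach an end time.

    Each selected partition ends where the next selected one starts; the last
    one is clipped to t2 because the query range ends there.
    """
    selected = sorted((p for p in partitions if t1 < p[1] <= t2), key=lambda p: p[1])
    spans: List[Tuple[str, datetime, datetime]] = []
    for i, (name, start) in enumerate(selected):
        end = selected[i + 1][1] if i + 1 < len(selected) else t2
        spans.append((name, start, end))
    return spans
```

With daily partitions P1 to P4 starting on 2015-11-01 through 2015-11-04 and a range from 2015-11-01 08:00 to 2015-11-03 12:00, only P2 and P3 are kept, and P3 is clipped to end at 12:00.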
Step 2: let S be the data scale of the list pages up to the current page. For forward retrieval (records sorted by time in ascending order; SQL: order by gtime asc):
Assume the start time of the first partition after T1 is Dn (n = 2). The offset calculation proceeds as follows:
Step 201: compute the data size Sn between T1 and Dn. First look up Sn for T1 ~ Dn in the partition cache. If it exists, assign Sn to the retrieval data scale ST and go to step 202. If it does not exist, query the database (SQL: select count(1) from log_event_http where gtime >= T1 and gtime <= Dn), assign Sn to ST, i.e. ST = Sn, and go to step 202.
Step 202: if ST >= S, go to step 203; otherwise go to step 204.
Step 203: retrieve the current page (for example: select * from log_event_http where gtime >= T1 and gtime <= Dn order by gtime asc limit PS*P, P), return it to the application for display, and stop.
Step 204: based on the current offset time, compute the data size Sn+1 between Dn and Dn+1, recompute ST as ST = ST + Sn+1, set n = n + 1, and go to step 202.
Repeat the above steps until the last partition between T1 and T2 has been reached or ST >= S, then stop.
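Steps 201 to 204 can be tried end to end against an in-memory SQLite table. This is a sketch under several assumptions: SQLite has no table partitioning here, so partitions exist only as a list of integer boundaries; times are integers; slices are half-open (`gtime >= ? AND gtime < ?`) so a boundary row is not counted twice; S is taken to mean `(page_index + 1) * page_size`; and `LIMIT ? OFFSET ?` replaces MySQL's `LIMIT offset, count` form. The names `make_demo_db`, `fetch_page_forward`, `page_index`, and `page_size` are hypothetical.

```python
import sqlite3
from typing import List, Sequence


def make_demo_db(times: Sequence[int]) -> sqlite3.Connection:
    """In-memory stand-in for log_event_http holding only the gtime column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log_event_http (gtime INTEGER)")
    conn.executemany("INSERT INTO log_event_http (gtime) VALUES (?)", [(t,) for t in times])
    return conn


def fetch_page_forward(
    conn: sqlite3.Connection,
    t1: int,
    t2: int,
    boundaries: Sequence[int],
    page_index: int,
    page_size: int,
) -> List[int]:
    """Ascending (gtime ASC) paging that counts only as many partitions as the page needs."""
    needed = (page_index + 1) * page_size  # assumed meaning of S
    ends = sorted(b for b in boundaries if t1 < b < t2) + [t2]
    st, start, window_end = 0, t1, t2
    for end in ends:
        # Steps 201/204: count only the slice added in this iteration.
        (n,) = conn.execute(
            "SELECT COUNT(1) FROM log_event_http WHERE gtime >= ? AND gtime < ?",
            (start, end),
        ).fetchone()
        st += n
        start = end
        if st >= needed:  # Step 202: the window now covers the current page
            window_end = end
            break
    # Step 203: page query restricted to the window [t1, window_end).
    rows = conn.execute(
        "SELECT gtime FROM log_event_http WHERE gtime >= ? AND gtime < ? "
        "ORDER BY gtime ASC LIMIT ? OFFSET ?",
        (t1, window_end, page_size, page_index * page_size),
    ).fetchall()
    return [row[0] for row in rows]
```

For example, with rows at 1, 2, 2, 5, 7, 8, 9, boundaries 3 and 6, and pages of 2 rows, the second page (`page_index = 1`) needs 4 rows, so only [0, 3) and [3, 6) are counted and the page query returns `[2, 5]`.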
Embodiment 2:
Let S be the data scale of the list pages up to the current page. For reverse retrieval (records sorted by time in descending order; SQL: order by gtime desc), assume the start time of the first partition before T2 is Dn (n = 7). The offset calculation proceeds as follows:
Step 1: compute the data size Sn between Dn and T2. First look up Sn for Dn ~ T2 in the partition cache. If it exists, assign Sn to the retrieval data scale ST and go to step 2. If it does not exist, query the database (SQL: select count(1) from log_event_http where gtime >= Dn and gtime <= T2), assign Sn to ST, i.e. ST = Sn, and go to step 2.
Step 2: if ST >= S, go to step 3; otherwise go to step 4.
Step 3: retrieve the current page (for example: select * from log_event_http where gtime >= Dn and gtime <= T2 order by gtime desc limit PS*P, P), return it to the application for display, and stop.
Step 4: based on the current offset time, compute the data size Sn-1 between Dn-1 and Dn, recompute ST as ST = ST + Sn-1, set n = n - 1, and go to step 2.
Repeat the above steps until the last partition between T1 and T2 has been reached or ST >= S, then stop.
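Reverse retrieval mirrors the forward case: descending-order paging needs the newest rows first, so the sketch counts the slice nearest T2 first and extends the window backwards toward T1. It makes the same assumptions as a forward SQLite sketch would: an in-memory table with no real partitioning, integer times, half-open slices, S taken as `(page_index + 1) * page_size`, and `LIMIT ? OFFSET ?` in place of MySQL's `LIMIT offset, count`. All function and parameter names are hypothetical.

```python
import sqlite3
from typing import List, Sequence


def make_demo_db(times: Sequence[int]) -> sqlite3.Connection:
    """In-memory stand-in for log_event_http holding only the gtime column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log_event_http (gtime INTEGER)")
    conn.executemany("INSERT INTO log_event_http (gtime) VALUES (?)", [(t,) for t in times])
    return conn


def fetch_page_backward(
    conn: sqlite3.Connection,
    t1: int,
    t2: int,
    boundaries: Sequence[int],
    page_index: int,
    page_size: int,
) -> List[int]:
    """Descending (gtime DESC) paging that counts partitions from t2 back toward t1."""
    needed = (page_index + 1) * page_size  # assumed meaning of S
    starts = sorted((b for b in boundaries if t1 < b < t2), reverse=True) + [t1]
    st, end, window_start = 0, t2, t1
    for start in starts:
        # Steps 1/4: count only the slice added in this iteration.
        (n,) = conn.execute(
            "SELECT COUNT(1) FROM log_event_http WHERE gtime >= ? AND gtime < ?",
            (start, end),
        ).fetchone()
        st += n
        end = start
        if st >= needed:  # Step 2: the window now covers the current page
            window_start = start
            break
    # Step 3: page query restricted to the window [window_start, t2).
    rows = conn.execute(
        "SELECT gtime FROM log_event_http WHERE gtime >= ? AND gtime < ? "
        "ORDER BY gtime DESC LIMIT ? OFFSET ?",
        (window_start, t2, page_size, page_index * page_size),
    ).fetchall()
    return [row[0] for row in rows]
```

With rows at 1, 2, 2, 5, 7, 8, 9, boundaries 3 and 6, and pages of 2 rows, the first page is served from the slice [6, 10) alone and comes back as `[9, 8]`.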
Embodiment 3:
Cache the data scale of each single partition (Sn in step 1 above): create a cache file identified by the start time and end time of the cached partition, and record in it the data scale Sn of each time partition. When the same search condition is applied repeatedly to a partition, the cache file is checked first. If no cached entry exists, the value is computed and cached; if it exists, the cached record count Sn is returned directly. Reading the cached record thus allows the data judgment to be made directly and the data of the specified partition to be returned. The offset calculation using the cache proceeds as follows:
Step 1: using the time offset T1 ~ Dn as the key, obtain the data scale Sn of the current offset slice from the cache, and assign Sn to the retrieval data scale ST, i.e. ST = Sn.
Step 2: if ST >= S, go to step 3; otherwise go to step 4.
Step 3: retrieve the data and return it to the application for display.
Step 4: using the current offset Dn ~ Dn+1 as the key, obtain the data scale Sn+1 of the current offset slice from the cache, recompute ST as ST = ST + Sn+1, set n = n + 1, and go to step 2.
Repeat the above steps until the computed count reaches the data scale S of the current page, then stop.
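A cache file keyed by each slice's start and end time can be sketched in Python with a JSON file. The file format, the `"start|end"` key layout, the file name, and the class name `FileCountCache` are all assumptions for illustration; the patent only says that the cache file is identified by the partition's start and end time and records Sn.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Tuple


class FileCountCache:
    """Cache file mapping a slice's "start|end" key to its record count Sn."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.counts: Dict[str, int] = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.counts = json.load(f)

    def get_or_compute(self, start: int, end: int, compute: Callable[[], int]) -> int:
        key = f"{start}|{end}"
        if key in self.counts:  # hit: return the cached record count directly
            return self.counts[key]
        value = compute()  # miss: run the real count once and persist it
        self.counts[key] = value
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.counts, f)
        return value


def demo() -> Tuple[int, int, int, Dict[str, int]]:
    """Look up the same slice twice, then reload the cache from disk."""
    calls: List[int] = []

    def compute() -> int:
        calls.append(1)  # record that the expensive count actually ran
        return 42

    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "partition_counts.json")
        cache = FileCountCache(path)
        first = cache.get_or_compute(20151102, 20151103, compute)
        second = cache.get_or_compute(20151102, 20151103, compute)
        reloaded = FileCountCache(path).counts  # a fresh instance reads the file
    return first, second, len(calls), reloaded
```

Calling `demo()` shows that the second lookup is served from the cache (the count runs once) and that a new instance sees the persisted entry.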
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention.

Claims (4)

1. A big-data retrieval method for network security log systems based on partition offset calculation, characterized in that the method comprises: partitioning a database table by time into several sub-partitions; determining, from an input time range to be retrieved, at least one sub-partition contained in the time range; determining the retrieval data scale between the start time of the time range and the first sub-partition nearest the start time; when the retrieval data scale is less than the current-page retrieval data scale, redetermining the retrieval data scale between the start time and the second sub-partition nearest the start time, and successively determining the retrieval data scale up to the next sub-partition, until the redetermined retrieval data scale is greater than or equal to the current-page retrieval data scale, whereupon the redetermination stops and the data at the currently redetermined retrieval data scale are displayed;
the retrieval is implemented specifically by the following steps:
Step 1: calculating the time range of the input time parameters T1 and T2;
Step 2: with S the data scale of the list pages up to the current page, for forward retrieval, assuming the start time of the first partition after T1 is Dn, with n = 2, the offset calculation steps are as follows:
Step 201: compute the data size Sn between T1 and Dn: first look up Sn for T1 ~ Dn in the partition cache; if it exists, assign Sn to the retrieval data scale ST and execute step 202; if it does not exist, perform a database query to compute it, assign Sn to ST, i.e. ST = Sn, and continue with step 202;
Step 202: if ST >= S, perform step 203; otherwise perform step 204;
Step 203: retrieve the current page data, return it to the application for display, and stop the calculation;
Step 204: compute the data size Sn+1 between Dn and Dn+1 according to the current offset time, and recompute ST, i.e. ST = ST + Sn+1, n = n + 1, and execute step 202;
repeat the above steps until the last partition between T1 and T2 is reached or ST = S, then stop;
Or
with S the data scale of the list pages up to the current page, for reverse retrieval, assuming the start time of the first partition before T2 is Dn, the offset calculation steps are as follows:
Step A: compute the data size Sn between T1 and Dn: first look up Sn for T1 ~ Dn in the partition cache; if it exists, assign Sn to the retrieval data scale ST and execute step B; if it does not exist, perform a database query to compute it, assign Sn to ST, i.e. ST = Sn, and continue with step B;
Step B: if ST >= S, perform step C; otherwise perform step D;
Step C: retrieve the current page data, return it to the application for display, and stop the calculation;
Step D: compute the data size Sn-1 between Dn and Dn-1 according to the current offset time, and recompute ST, i.e. ST = ST + Sn-1, n = n - 1, and execute step B;
repeat the above steps until the last partition between T1 and T2 is reached or ST = S, then stop;
Or
caching the data scale of each single partition: when the same search condition is applied repeatedly to a partition, the cache file is searched first; if no cached entry exists, the value is cached; if it exists, the cached record count Sn is returned directly, so that the data judgment is made directly by reading the cached record and the data of the specified partition are returned; the offset calculation steps using the cache are as follows:
Step 301: using the time offset T1 ~ Dn as the key, obtain the data scale Sn of the current offset slice from the cache, and assign Sn to the retrieval data scale ST, i.e. ST = Sn;
Step 302: if ST >= S, perform step 303; otherwise perform step 304;
Step 303: retrieve the data and return it to the application for display;
Step 304: using the current offset Dn ~ Dn+1 as the key, obtain the data scale Sn+1 of the current offset slice from the cache, and recompute ST, i.e. ST = ST + Sn+1, n = n + 1, and execute step 302;
repeat the above steps until the computed count reaches the data scale S of the current page, then stop.
2. The big-data retrieval method for network security log systems based on partition offset calculation according to claim 1, characterized in that the method further comprises: when the retrieval data scale between the start time of the time range and the last sub-partition in the time range has been redetermined, directly displaying the data at the redetermined retrieval data scale.
3. The big-data retrieval method for network security log systems based on partition offset calculation according to claim 1, characterized in that the method further comprises: determining the retrieval data scale between the end time of the time range and the first sub-partition nearest the end time; when the retrieval data scale is less than the current-page retrieval data scale, redetermining the retrieval data scale between the end time and the second sub-partition nearest the end time, and successively determining the retrieval data scale up to the next sub-partition, until the redetermined retrieval data scale is greater than or equal to the current-page retrieval data scale, whereupon the redetermination stops and the data at the currently redetermined retrieval data scale are displayed.
4. The big-data retrieval method for network security log systems based on partition offset calculation according to claim 3, characterized in that the method further comprises: caching the retrieval data scale determined between the start time or end time of the time range and the sub-partitions in the time range; when a time range is next input, directly searching the cache for a retrieval data scale between the corresponding start time or end time and the sub-partitions in the new time range; if it exists, obtaining it directly from the cache and comparing it with the current-page retrieval data scale, and, according to the comparison result, displaying the data or continuing to redetermine the retrieval data scale and searching the cache for the redetermined retrieval data scale; otherwise, redetermining the retrieval data scale and caching it.
CN201510747856.XA 2015-11-06 2015-11-06 Network security log system big data search method based on subregion calculations of offset Active CN105302909B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510747856.XA CN105302909B (en) 2015-11-06 2015-11-06 Network security log system big data search method based on subregion calculations of offset

Publications (2)

Publication Number Publication Date
CN105302909A (en) 2016-02-03
CN105302909B (en) 2019-03-26

Family

ID=55200178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510747856.XA Active CN105302909B (en) 2015-11-06 2015-11-06 Network security log system big data search method based on subregion calculations of offset

Country Status (1)

Country Link
CN (1) CN105302909B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766408A (en) * 2017-08-31 2018-03-06 Jiepu Network Science & Technology Co Ltd Xi'an Jiaoda The storage method of audit log
CN109543079B (en) * 2018-11-27 2021-02-02 Beijing Ruian Technology Co., Ltd. Data query method and device, computing equipment and storage medium
CN110321388B (en) * 2019-02-26 2021-07-02 Linewell Software Co., Ltd. Quick sequencing query method and system based on Greenplus

Citations (3)

Publication number Priority date Publication date Assignee Title
CN1858735A (en) * 2005-12-30 2006-11-08 Huawei Technologies Co., Ltd. Method for processing mass data
CN101425064A (en) * 2007-10-29 2009-05-06 Inventec Corporation Processing method and system for testing log
CN104281684A (en) * 2014-09-30 2015-01-14 Neusoft Corporation Method and system for storing and querying mass logs

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
WO2014053313A1 (en) * 2012-10-04 2014-04-10 Alcatel Lucent Data logs management in a multi-client architecture

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant