CN1866821A - Network monitoring data compression storing and combination detecting method based on similar data set - Google Patents

Network monitoring data compression storing and combination detecting method based on similar data set

Info

Publication number
CN1866821A
Authority
CN
China
Prior art keywords
data
stamp
data set
compression
record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200610031764
Other languages
Chinese (zh)
Other versions
CN100555935C (en)
Inventor
朱培栋
宁洪
邓文平
蔡开裕
赵建强
周丽涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CNB2006100317642A priority Critical patent/CN100555935C/en
Publication of CN1866821A publication Critical patent/CN1866821A/en
Application granted granted Critical
Publication of CN100555935C publication Critical patent/CN100555935C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method for compressed storage and joint detection of network monitoring data based on similar data sets, comprising the following steps: modeling the data with a relational model; storing and managing the data in a relational database; compressing and storing multiple similar data sets under the same relation schema using a compression method based on a status mark sequence; recovering the original data sets with query statements; and querying the compressed data table to realize several types of joint detection on the original data sets. The invention incrementally compresses multiple data sets into the same table of the database.

Description

Network monitoring data compression storage and joint detection method based on similar data sets
Technical field
The present invention relates to methods for the compressed storage and joint detection of data, and in particular to the compressed storage and joint detection of massive network monitoring data made up of similar data sets.
Background art
In recent years, with the development of information technology, and particularly of Internet technology, the volume of information worldwide has been growing explosively, and mass databases larger than $10^{12}$ bytes have become commonplace. For example, artificial satellites launched by NASA in the United States return $10^{15}$ bytes of observation data to Earth every year. The high-energy physics experiment data of the Lawrence National Laboratory in the United States amount to as much as $3 \times 10^{14}$ bytes per year.
Network security monitoring likewise has to store and process large amounts of data. The data it retains may include routing tables, routing messages, IP packets and similar information. BGP (Border Gateway Protocol) is the most important routing protocol of the backbone network; monitoring BGP routing tables can reveal routing anomalies and routing attacks, and provides early warning for the security and healthy operation of the routing system and the underlying network. The number of BGP forwarding-table entries in the core network rose from about 130,000 in 2002 to about 245,000 in May 2006. Because there is often more than one route to a destination network, the number of BGP routing-table entries is higher still, reaching 674,000 (according to data from the autonomous system numbered 1221, operated by Telstra), and the size of the BGP routing table continues to grow exponentially. If a routing system is monitored over a long period and its historical information is analyzed statistically, the continuously collected routing tables make the stored data volume expand linearly until it becomes very large.
Routing messages, IP packets and the like are generally captured and inspected in real time. To mine potential network security threats in greater depth, however, data collected at different places or at different times often has to be retained and then analyzed and inspected jointly, so the volume of message data to be stored can be very large. For BGP routing messages, a single router produces hundreds of megabytes of routing update information every day, and discovering some hidden routing attacks often requires analyzing the routing updates of several routers over several periods. For IP packets, because backbone bandwidth reaches 10 Gbps or more, the data that must be retained for security inspection remains very large even after preprocessing and filtering.
To ease organization and processing, network monitoring data can be stored in a database. Commonly used database data models include the relational model, the hierarchical model, the object-oriented model and logic-based models. A relational database organizes data according to the relational model. Because of its rigorous mathematical foundation, its simplicity and flexibility in use, and its strong data independence, the relational database is widely regarded as the most promising kind of database management system. It has developed rapidly and is now the dominant kind of database management system.
This explosive growth of information challenges current database management technology. From the hardware point of view, the main methods for storing and managing such mass data are currently three-level (tertiary) storage and parallel storage. Three-level storage has a high hardware cost: it obtains more storage space mainly by adding hardware devices, and while enlarging capacity it also greatly increases query processing time and lowers the efficiency of the database. Parallel database technology likewise obtains high processing speed at the price of more hardware, but the growth of hardware processing capacity lags far behind the explosion of information. A more cost-effective way to store a mass database is therefore to compress the data that the database must hold, which is why data compression methods have been proposed. Data compression can improve the storage efficiency of mass data and can also improve database performance.
Data compression rests on the redundancy present in the data itself; the approach is to apply a particular form of compression to data of a particular pattern, according to how the data is stored.
In terms of compression effect, compression methods are divided into lossy and lossless compression; in terms of the object compressed, they are mainly divided into general-purpose data compression and multimedia data compression. Multimedia compression methods are mainly used for the compressed transmission of video and audio signals. General-purpose compression methods include methods based on statistical models and methods based on dictionary models. Delta (incremental) compression encodes the content differences between two files. When data compression is applied to database storage, decompressing the data on access usually incurs a large time overhead.
None of these existing compression methods, however, fully exploits certain inherent characteristics of network monitoring data. Network monitoring data is a special class of data set: different data sets share a large number of identical records, and these identical records create great data redundancy between the data sets. Data sets with this property are called similar data sets. Take the routing table of the inter-domain routing protocol BGP as an example: the routing state of a router is monitored, and the router's routing state at different times is periodically collected and saved. Because routes are relatively stable, the changes in a router's routing state obey the principle of temporal locality; routing tables collected at two adjacent times contain a large number of identical routing-table entries, so there is a great deal of data redundancy between the data tables. Related compression and storage methods mostly aggregate a single data file at a coarser time granularity or compress within a file; no effective storage scheme has been proposed that exploits the similarity among multiple data sets or data files.
Network monitoring data can be collected at different places, or at the same place at different times. Besides inspecting a single data set, it is often necessary to inspect several data sets jointly, for example to find the differences between the data items of different collection points, to learn how stable a data item is across several collection points, to uncover a more deeply hidden network security problem by cross-referencing the data of different collection points, or to confirm through other collection points an anomaly found at one collection point. Existing methods mostly analyze a single data set, find its characteristic sequences and then match them against the other data sets; or they analyze the original data sets one by one and then aggregate the results. When there are many data sets, joint detection carried out this way is inefficient.
Summary of the invention
The technical problem to be solved by the present invention is as follows: for multiple data sets that share many identical data items, such as network monitoring data, to propose a table-based compression and storage method for relational databases that compresses multiple similar data sets on the same relation schema into a single database table, and to realize joint detection across the original data sets on the basis of the compressed data table.
The technical solution is: model the data with a relational model; store and manage the data in a relational database; compress and store multiple similar data sets under the same relation schema using a compression method based on a status mark sequence; recover the original data sets quickly and losslessly with simple query statements; and realize several types of joint detection on the original data sets by querying the compressed data table directly.
Let $R\langle X_1, X_2, \ldots, X_n\rangle$ be an $n$-ary relation schema, where $X_i$ denotes the $i$-th attribute of $R$. Let $r$ and $r'$ be two records on the schema $R$, where $r = \langle x_1, x_2, \ldots, x_n\rangle$ and $r' = \langle y_1, y_2, \ldots, y_n\rangle$. Then $r' = r$ if and only if $x_i = y_i$ for all $i \in \{1, 2, \ldots, n\}$.
Definition 1. If set $A$ and set $B$ are two relations on the same relation schema $R$, then $A$ and $B$ are similar data sets of each other on $R$.
Definition 2. Let $A$ and $B$ be two non-empty similar data sets, let the number of elements in $B$ be $\#B = k$, and let the number of elements in $A \cap B$ be $\#(A \cap B) = k'$. Let $\beta_{B \to A} = k'/k$; $\beta_{B \to A}$ is called the redundancy of set $B$ with respect to set $A$.
By Definition 2, $\beta_{B \to A} \in [0, 1]$. When $A \cap B = \varnothing$, $\beta_{B \to A} = 0$; when $A \cap B = B$, $\beta_{B \to A} = 1$, in which case the data of $B$ is fully redundant with respect to $A$. The larger $\beta_{B \to A}$ is, the greater the redundancy of $B$ with respect to $A$.
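As an illustration of Definition 2, the redundancy can be computed directly with set operations. The following is a minimal sketch; the function name `redundancy` and the two small record sets are hypothetical and are not taken from the patent.

```python
def redundancy(b, a):
    """Redundancy of set b with respect to set a: #(a ∩ b) / #b."""
    if not b:
        # The denominator #b must be non-zero (Definition 2 assumes non-empty sets).
        raise ValueError("b must be non-empty")
    return len(a & b) / len(b)


# Hypothetical snapshots; each record is (network, next_hop).
A = {("1.0.0.0", "10.0.0.1"), ("3.0.0.0", "10.0.0.2"), ("6.1.0.0", "10.0.0.3")}
B = {("1.0.0.0", "10.0.0.1"), ("3.0.0.0", "10.0.0.2"),
     ("6.1.0.0", "10.0.0.4"), ("6.2.0.0", "10.0.0.5")}

print(redundancy(B, A))  # 2 of the 4 records of B also occur in A
print(redundancy(A, B))  # 2 of the 3 records of A also occur in B
```

Note that the measure is not symmetric: the two calls divide the same intersection size by different set sizes.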
For $K$ similar data sets, a relational data model is constructed from the attributes of the data sets, defining the relation schema $R\langle X_1, X_2, \ldots, X_n\rangle$, where $X_i$ denotes the $i$-th attribute of $R$. The $K$ original similar data sets thus become $K$ relations $S_1, S_2, \ldots, S_K$ on the schema $R$. Each data set is assigned a status identifier: STAMP[1]="1", STAMP[2]="2", ..., STAMP[K]="K". A status mark sequence field, stamp, is added to every record to mark the data sets in which the record exists, thereby recording the activity state of the record, and $S_1, S_2, \ldots, S_K$ are compressed one by one into the same table of the database. The information contained in stamp allows the original data sets to be recovered losslessly.
Because the data sets are mapped onto a single compressed data set merely by adding the field stamp, a record that is repeated across several data sets appears only once in the compressed data set, which achieves compression across the data sets. Because the content of an individual record is unchanged, no decompression is needed to access a single record, and accessing an individual data set only requires selecting records by their stamp.
At the same time, joint detection on the original data sets can be carried out on the compressed data set, and more complex joint detection functions can be built on that basis: whether a data item is consistent in data sets $S_i$ and $S_j$ is judged by whether the stamp of its record contains both STAMP[i] and STAMP[j]; the difference between $S_i$ and $S_j$ is obtained by finding the records whose stamp contains only one of STAMP[i] and STAMP[j]; the stability of a data item is judged by whether its stamp contains the status identifiers of all data sets; all data items that changed during the whole monitoring period are obtained by finding the records whose stamp does not contain all status identifiers; and so on.
1. Method for compressing and storing multiple similar data sets
Let $R_1 = \langle X_1, X_2, \ldots, X_n\rangle$ and $R_2 = \langle \mathrm{stamp}\rangle$, where stamp is a character string, and let $R = R_1 \times R_2$. $S_1, S_2, \ldots, S_K$ are mutually similar data sets on the relation schema $R_1$. $S_0$ is a relation on the schema $R$ that serves as the compressed set of $S_1, S_2, \ldots, S_K$ and holds the data after compression; its initial value is the empty set. Using the status mark sequence, $S_1, S_2, \ldots, S_K$ are compressed and stored into the database one by one. The compression and storage process is:
Input: $K$ similar data sets $S_1, S_2, \ldots, S_K$ on the relation schema $R_1$, and the status identifiers of the data sets STAMP[1]="1", STAMP[2]="2", ..., STAMP[K]="K".
Output: the compressed data set $S_0$ carrying status mark sequences, where $S_0$ is a relation on the schema $R$.
1) Initially, $S_0$ is the empty set;
2) For $i = 1, 2, \ldots, K$, repeat the following steps:
3) For $j = 1, 2, \ldots, \#S_i$ (the number of elements of data set $S_i$), repeat the following steps:
4) Take any element $r$ from data set $S_i$ and remove $r$ from $S_i$;
5) If there exists $r' \in S_0$ whose attribute values are all identical to those of $r$, that is, $r'.x_1 = r.x_1$, $r'.x_2 = r.x_2$, ..., $r'.x_n = r.x_n$, then append to the status mark sequence of $r'$: r'.stamp = r'.stamp + STAMP[i];
6) Otherwise, construct a new stamp, stamp = STAMP[i], and add $\langle r, \mathrm{stamp}\rangle$ to $S_0$ as a new record.
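The steps above can be sketched in a few lines of Python. This is an illustrative in-memory version, not the database implementation: the compressed set is a dictionary, and the stamp is kept as a list of data-set numbers rather than a concatenated string, which is an assumption made here so that identifiers such as "1" and "10" cannot be confused. Records r4 and r5 use the values quoted in the embodiment; the attribute values of the other records are hypothetical placeholders.

```python
def compress(data_sets):
    """Merge K similar data sets into one compressed table S0.

    data_sets[i-1] is S_i, a set of hashable records (tuples).
    Returns a dict mapping each distinct record to its stamp (list of i).
    """
    s0 = {}                                  # step 1: S0 starts empty
    for i, s_i in enumerate(data_sets, 1):   # step 2: i = 1..K
        for r in s_i:                        # steps 3-4: visit every record of S_i
            if r in s0:                      # step 5: identical record already stored
                s0[r].append(i)
            else:                            # step 6: new record with a new stamp
                s0[r] = [i]
    return s0


# Records are (network, next_hop, as_path); r1, r2, r3, r6 are placeholders.
r1 = ("1.0.0.0", "hop-a", "path-a")
r2 = ("3.0.0.0", "hop-b", "path-b")
r3 = ("4.0.0.0", "hop-c", "path-c")
r4 = ("6.1.0.0", "202.12.6.2", "4538 9407 668")
r5 = ("6.1.0.0", "202.12.6.4", "4538 9407 668")
r6 = ("6.2.0.0", "hop-d", "path-d")

s0 = compress([{r1, r2, r3, r4}, {r1, r2, r3, r5}, {r1, r3, r4, r6}])
for record, stamp in s0.items():
    print(record[0], stamp)
```

Iterating over each data set replaces the "take an element and remove it" wording of step 4; the effect on $S_0$ is the same, and the input sets are left untouched.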
2. Method for recovering the compressed data
By applying a simple query statement to the compressed database that selects all records whose stamp attribute value contains STAMP[i], any original data set $S_i$ ($1 \le i \le K$) can be recovered without extra decompression overhead, which realizes lossless decompression of the compressed data.
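A small round-trip check makes the lossless property concrete. The sketch below is an in-memory illustration under stated assumptions: records are represented only by the labels r1 to r6, the stamp is a list of data-set numbers, and the helper names `compress` and `recover` are hypothetical.

```python
def compress(data_sets):
    """Build the compressed table: record -> list of data-set numbers."""
    s0 = {}
    for i, s_i in enumerate(data_sets, 1):
        for r in s_i:
            s0.setdefault(r, []).append(i)
    return s0


def recover(s0, i):
    """Recover S_i: every record whose stamp contains identifier i."""
    return {r for r, stamp in s0.items() if i in stamp}


# Three snapshots with four records each, identified by label only.
originals = [
    {"r1", "r2", "r3", "r4"},
    {"r1", "r2", "r3", "r5"},
    {"r1", "r3", "r4", "r6"},
]
s0 = compress(originals)

# Each recovered set equals the original it came from.
round_trip_ok = all(recover(s0, i) == s for i, s in enumerate(originals, 1))
print(round_trip_ok)
```

Recovery is a plain selection on the stamp field; no record content has to be decoded.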
3. Method for joint detection on multiple similar data sets
Joint detection on the original data sets can be realized on the basis of the compressed data set. Using the historical state information recorded in the status mark sequence stamp, the following four basic kinds of joint detection on multiple similar data sets can be realized efficiently (let the compressed data set be $S_0$):
1) To judge whether a data item is consistent in two data sets $S_i$ and $S_j$: use a keyword query on the compressed database to find the record that satisfies the given attribute values. If its stamp contains both STAMP[i] and STAMP[j], the data item is consistent in the two data sets; otherwise it is inconsistent in the two data sets.
2) To obtain the data-item difference between two data sets $S_i$ and $S_j$: take from $S_0$ all records whose stamp contains only one of STAMP[i] and STAMP[j], and denote them $S'$. Every record in $S'$ occurs in only one of $S_i$ and $S_j$, so $S'$ is exactly the data-item difference between $S_i$ and $S_j$.
3) To judge the stability of a data item $r'$ over the whole monitoring period: take r'.stamp. If r'.stamp contains the status identifiers of all data sets, $r'$ occurs in every data set and is therefore stable over the whole period; if r'.stamp contains the status identifiers of only some data sets, $r'$ occurs in only some data sets and is therefore unstable over the whole period.
4) To obtain the data items that changed during the whole monitoring period: if a record is stable over the whole period, its status mark sequence must contain the status identifiers of all data sets. Take from $S_0$ all records whose stamp does not contain the status identifiers of all data sets, and denote them $S'$; $S'$ is exactly the set of data items that changed during the whole period.
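The four basic detections reduce to membership tests on the stamp. The following sketch assumes the compressed table is held in memory as a dictionary from a record label to the set of data-set numbers in its stamp; the function names are hypothetical, and the stamps are those implied by the three-snapshot example of the embodiment.

```python
K = 3  # number of data sets
s0 = {
    "r1": {1, 2, 3}, "r2": {1, 2}, "r3": {1, 2, 3},
    "r4": {1, 3}, "r5": {2}, "r6": {3},
}


def consistent(stamp, i, j):
    """Detection 1: the item is consistent if it occurs in both S_i and S_j."""
    return i in stamp and j in stamp


def difference(table, i, j):
    """Detection 2: records that occur in exactly one of S_i and S_j."""
    return {r for r, stamp in table.items() if (i in stamp) != (j in stamp)}


def stable(stamp, k):
    """Detection 3: the item is stable if its stamp names all k data sets."""
    return all(i in stamp for i in range(1, k + 1))


def changed(table, k):
    """Detection 4: every record that is not stable over the whole period."""
    return {r for r, stamp in table.items() if not stable(stamp, k)}


print(consistent(s0["r1"], 1, 2))
print(sorted(difference(s0, 1, 2)))
print(stable(s0["r2"], K))
print(sorted(changed(s0, K)))
```

Each detection is a single pass over the compressed table (or a single lookup for one record), with no need to open the original data sets.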
The four basic detection methods above can analyze the similarity and stability of data sets by comparing the data sets with one another, and can also analyze the stability of an individual record from its status mark sequence. For example, applied to a compressed database of routing tables, they support joint analysis and comparison of changes in the routing table, the stability of the routing table, the stability of an individual route, and so on.
More complex joint detection functions can be built on these four basic detection methods. For example, a more deeply hidden network security problem can be found by cross-referencing different data sets, or an anomaly found in one data set can be further confirmed through other data sets, which improves the accuracy of network security detection and reduces the false negative rate and the false positive rate.
The present invention achieves the following beneficial effects:
The present invention makes full use of the similarity among mass data sets and compresses multiple data sets incrementally into the same table of the database. It can be applied to the compressed storage of many types of network monitoring data, including routing tables, routing messages and IP packets; it can also be used to store other types of network history data, or any similar data sets that share many identical data items. Specifically:
1) Compressed storage: by adding a new status mark field to every record to mark the data sets in which the record exists, multiple data sets are compressed into a single data set. Because the data sets of network monitoring data are highly redundant with respect to one another, very high compression efficiency can be achieved;
2) Joint detection: after multiple data sets have been compressed with the present invention, the status mark field maps them onto one data set and strengthens the coupling between them. If the data sets were collected at different points in time, time-domain analysis is facilitated; if they were collected at different places, spatial-domain analysis is facilitated; if they were collected along some other dimension, analysis in that dimension is facilitated. The joint detection method of the present invention, based on the compressed data set, not only removes the need to store each original data set directly but also realizes several types of joint detection with higher efficiency.
3) Efficient query: the present invention supports efficient queries at several granularities. Because the records themselves are not compressed, no extra decompression step is needed during a query, and the query efficiency of the database is essentially not reduced.
Description of drawings
Fig. 1 shows the multi-data-set compression, loading and processing flow of the present invention;
Fig. 2 shows $K$ original similar data sets $S_1, S_2, \ldots, S_K$;
Fig. 3 shows the database table obtained after the data sets of Fig. 2 are loaded incrementally using the compression method of the present invention;
Fig. 4 shows the first four routing-table records of a router at three different times;
Fig. 5 shows the database table obtained after the data of Fig. 4 is compressed using the compression method of the present invention;
Fig. 6 shows the number of routing-table entries of the autonomous system numbered 1221 at 10 time points and the cumulative record counts before and after compression;
Fig. 7 compares the total number of routing-table entry records of the autonomous system numbered 1221 before and after compression.
Embodiments
Fig. 1 is a schematic diagram of the multi-data-set compression, loading and processing flow of the present invention. Raw data is collected at different places, or at the same place at different times, and saved in separate data files. The original data files are modeled with the relational model, yielding multiple similar data sets $S_1, S_2, \ldots, S_K$ that share some identical data items. The data sets are loaded incrementally using the compression method of the present invention, yielding the compressed data set $S_0$. On the basis of $S_0$, any similar data set $S_i$ ($1 \le i \le K$) can be recovered by a simple query operation, queries at several granularities can be carried out efficiently, and joint detection on the original data sets can be realized efficiently.
Fig. 2 is a general example of $K$ similar data sets, in which the column corresponding to $S_i$ ($1 \le i \le K$) lists the records contained in that data set. Fig. 3 shows the result of compressing and storing the similar data sets of Fig. 2 using the present invention. Records that are identical across the original data sets are stored only once, which saves space.
The present invention has been applied in ISP-HEALTH, the inter-domain routing security monitoring system developed by the National University of Defense Technology. The compressed storage of the large volume of route data in a route monitoring system is used below as an example to further illustrate the invention.
1. Raw data acquisition
Taking actually collected route monitoring data as an example, the experimental data is obtained from http://www.routeviews.org/. RouteViews is a project of the Advanced Network Technology Center at the University of Oregon in the United States; its main goal is to obtain views of the global Internet routing system from the perspective of several different autonomous systems, and it periodically collects and publishes BGP route data of the global Internet. The data used below are the BGP routing tables of the autonomous system numbered 1221 at 10 consecutive time points in January 2006.
2. Data modeling
The data is modeled with the relational data model. For the $K$ original similar data sets collected, a relational data model is constructed from the attributes of the data sets, defining the relation schema $R\langle X_1, X_2, \ldots, X_n\rangle$. The $K$ original similar data sets thus become $K$ relations $S_1, S_2, \ldots, S_K$ on the schema $R$. The BGP routing table of an autonomous system may contain hundreds of thousands of entries, and each BGP routing-table entry has several attributes such as the destination network address (network), the next hop (next_hop) and the AS path (as_path). Several key attributes are extracted to construct the table structure of the relational database: relation R = <Network, Next_hop, AS_path>. For ease of explanation, part of the routing-table entries of one router at three different times is taken as an example, as shown in Fig. 4. There are then three original data sets with four records each: time t1 contains data items r1, r2, r3 and r4; time t2 contains data items r1, r2, r3 and r5; and time t3 contains data items r1, r3, r4 and r6.
3. Compression and storage process
Using the compression and storage method of the present invention, the routing tables of the autonomous system numbered 1221 at the different times are compressed and stored one after another. The status mark sequence of each route record records whether that route exists at each time. For example, Fig. 5 is the database table obtained after the data sets of Fig. 4 are compressed: the database holds only 6 records, and the stamp attribute of each record shows the original data sets in which the data item occurs.
4. Data recovery
With the present invention, any data set can be recovered in full from the status mark sequences. For example, in Fig. 5 the stamp of records 1, 2, 3 and 5 contains "2"; this yields the complete set of routes in the router's routing table at time t2, namely records 1, 2, 3 and 5 of Fig. 5.
5. Query
After the data has been compressed and stored, the relational database can be queried at several granularities: by attribute value, by single record, by data set, and for all records. Specifically:
1) Query on $X_i$ by attribute value "x": select from the database all records whose $X_i$ attribute value is "x";
2) Query for a single record $r'$: select from the database the record whose attributes match $r'$ exactly;
3) Query for a data set $S_i$ ($1 \le i \le K$): select from the database all records whose stamp attribute value contains STAMP[i];
4) Query for all records: with no query condition given, all records in the database are returned.
In the routing-table example, these queries correspond to finding the route entries that satisfy an attribute condition, looking up the complete information of a particular route, recovering the routing table of a particular time, and listing all route entries that occurred during the monitoring period.
Because compression across the data sets is achieved merely by adding a field and individual records are not compressed, the efficiency of the four kinds of database query listed above is not significantly reduced after the database is compressed, and the results are exactly the same as those of querying the original data sets directly.
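The four query granularities can be tried out against a small relational table. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in relational database; the table name `s0`, the column names and the comma-separated stamp format are assumptions made for illustration (a separator is used so that identifier "1" is not confused with "10" or "11"). Attribute values other than those of the two 6.1.0.0 routes are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE s0 (network TEXT, next_hop TEXT, as_path TEXT, stamp TEXT)"
)
conn.executemany(
    "INSERT INTO s0 VALUES (?, ?, ?, ?)",
    [
        ("1.0.0.0", "hop-a", "path-a", "1,2,3"),
        ("3.0.0.0", "hop-b", "path-b", "1,2"),
        ("4.0.0.0", "hop-c", "path-c", "1,2,3"),
        ("6.1.0.0", "202.12.6.2", "4538 9407 668", "1,3"),
        ("6.1.0.0", "202.12.6.4", "4538 9407 668", "2"),
        ("6.2.0.0", "hop-d", "path-d", "3"),
    ],
)

# 1) Query by attribute value: all routes to network 6.1.0.0.
by_attribute = conn.execute(
    "SELECT * FROM s0 WHERE network = ?", ("6.1.0.0",)
).fetchall()

# 2) Query for a single record: all attributes must match.
single = conn.execute(
    "SELECT * FROM s0 WHERE network = ? AND next_hop = ? AND as_path = ?",
    ("6.1.0.0", "202.12.6.2", "4538 9407 668"),
).fetchall()

# 3) Query for data set S_3: stamp contains identifier 3.
data_set_3 = conn.execute(
    "SELECT network, next_hop, as_path FROM s0 "
    "WHERE ',' || stamp || ',' LIKE ?",
    ("%,3,%",),
).fetchall()

# 4) Query for all records: no condition.
everything = conn.execute("SELECT * FROM s0").fetchall()

print(len(by_attribute), len(single), len(data_set_3), len(everything))
```

Wrapping the stamp in separators before the `LIKE` match keeps the data-set query exact for any number of data sets; a production schema might instead store the stamp as a bitmap or in a separate membership table.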
6. Joint detection
6.1 Joint detection method
Mapping multiple data sets onto one data set through the status mark field strengthens the coupling between the data sets and facilitates joint detection, cross-analysis and processing among them; using the compressed data set is more conducive to analysis in the time domain, the spatial domain or other dimensions than using the original data sets directly. The following explanation refers to Fig. 5.
(1) Judging whether a data item is consistent in two data sets $S_i$ and $S_j$: for example, to determine whether the route to network 1.0.0.0 changed between times t1 and t2, there is no need to analyze the original data sets $S_1$ and $S_2$ directly; the answer can be obtained quickly from the compressed database. The data item in the compressed data set corresponding to the route to network 1.0.0.0 is r1, and r1.stamp contains the status identifiers of both $S_1$ and $S_2$, so the route to network 1.0.0.0 did not change between t1 and t2.
(2) Obtaining the data-item difference between two data sets $S_i$ and $S_j$: for example, the difference between the routes at t1 and t2 can be obtained by querying the compressed data set on stamp. The route whose stamp contains "1" but not "2" is <6.1.0.0, 202.12.6.2, 4538 9407 668>, and the route whose stamp contains "2" but not "1" is <6.1.0.0, 202.12.6.4, 4538 9407 668>. It can therefore be concluded that, from t1 to t2, only the route to network 6.1.0.0 changed, and that only its next-hop attribute changed.
(3) Judging the stability of a data item $r'$ over the whole monitoring period: for example, for the data item r1 corresponding to the route to network 1.0.0.0, r1.stamp contains the status identifiers of all data sets, so r1 is stable over the whole period. For the data item r2 corresponding to the route to network 3.0.0.0, r2.stamp contains only the status identifiers of data sets $S_1$ and $S_2$ and not that of $S_3$, so r2 is unstable over the whole period.
(4) Obtaining the data items that changed during the whole monitoring period: the status mark sequences of r2, r4, r5 and r6 do not contain the status identifiers of all data sets, so the routes that changed during the whole period are those to 3.0.0.0, 6.1.0.0 and 6.2.0.0.
6.2 Joint detection performance
Take the stability analysis of a data item as an example. With a conventional method, the item has to be looked up in each of the data sets and the lookup results then aggregated. With the joint detection method of the present invention, a single query on the compressed data set suffices, which both reduces the number of queries and eliminates the aggregation step.
7. Compression ratio calculation
After all data sets $S_1, S_2, \ldots, S_K$ have been compressed, the number of records in the database is $n' = \#\left(\bigcup_{i=1}^{K} S_i\right)$, whereas the total number of records before compression is $n = \sum_{i=1}^{K} \#S_i$. The compression ratio is $\mu = n'/n$. For the example of Fig. 4 and Fig. 5, the number of entries before compression is 12 and the number after compression is 6, so the compression ratio is 6/12 = 50%.
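The ratio for the three-snapshot example can be checked with a short calculation. This is a minimal sketch; the function name `compression_ratio` is hypothetical and the records are represented only by their labels.

```python
def compression_ratio(data_sets):
    """Records after compression (size of the union) over records before."""
    n_before = sum(len(s) for s in data_sets)
    n_after = len(set().union(*data_sets))
    return n_after / n_before


snapshots = [
    {"r1", "r2", "r3", "r4"},  # t1
    {"r1", "r2", "r3", "r5"},  # t2
    {"r1", "r3", "r4", "r6"},  # t3
]
print(compression_ratio(snapshots))  # 6 distinct records out of 12 stored
```

A lower value means better compression; identical snapshots would give 1/K.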
If $\#S_1 = \#S_2 = \cdots = \#S_K = N$, and for all $S_i, S_j \in \{S_1, S_2, \ldots, S_K\}$
$$\beta_{S_i \to \bigcup_{m=1}^{K} S_m} = \beta_{S_j \to \bigcup_{m=1}^{K} S_m} = \beta,$$
then the number of records before compression is $n = KN$, and the number of records after compression is $n' = N + N(1-\beta)(K-1)$. When $K$ is sufficiently large, the compression ratio is $\mu = n'/n \approx 1 - \beta$.
Fig. 6 shows the route data collected from the autonomous system numbered 1221 at 10 time points. The routing table at each time point contains more than 170,000 route records. Because routes are relatively stable, the changes in a router's routing state obey the principle of temporal locality, and the routing tables of two adjacent time points share a large number of identical entries; the redundancy between routing tables is $\beta \ge 0.99$. The compression ratio is calculated as:
Figure A20061003176400135
The number of entries after compression is only about 1/10 of the number before compression; as $K$ increases, the compression ratio becomes smaller and smaller and eventually approaches $1 - \beta$. Fig. 7 compares the record counts before and after compression.
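The stated limit and the "about 1/10" figure can be reproduced from the record counts given above. The following worked equation is a sketch under the idealized assumption that all $K$ data sets have the same size $N$ and that each data set after the first contributes $N(1-\beta)$ new records; the numerical value uses $K = 10$ and $\beta = 0.99$, the lower bound quoted for the redundancy.

```latex
\mu = \frac{n'}{n}
    = \frac{N + N(1-\beta)(K-1)}{KN}
    = \frac{1}{K} + (1-\beta)\,\frac{K-1}{K}
    \;\xrightarrow{\;K \to \infty\;}\; 1-\beta ,
\qquad
\mu\big|_{K=10,\ \beta=0.99} = \frac{1}{10} + 0.01 \times \frac{9}{10} = 0.109 .
```

For small $K$ the term $1/K$ dominates, which is why ten snapshots give a ratio close to 1/10; only for much larger $K$ does the ratio approach $1-\beta$.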
The present invention can be used not only for BGP route monitoring but also for security monitoring and intrusion detection based on routing messages and IP packets. It can compress, store and jointly inspect the data collected by one collection point at several points in time, and it can equally handle the similar data sets of different collection points. It can be used not only in the field of network monitoring but also for the storage or analysis of other types of massive historical information.

Claims (3)

1. A network monitoring data compression storage and joint detection method based on similar data sets, characterized in that: data is modeled with the relational model and stored and managed in a relational database; a compression method based on state mark sequences is used to compress and store multiple similar data sets under the same relation schema; query statements are used to recover the original data sets; and multiple types of joint detection over the original data sets are realized by querying the compressed data table. The compression method based on state mark sequences proceeds as follows: let $R_1 = \langle X_1, X_2, \ldots, X_n \rangle$ and $R_2 = \langle stamp \rangle$, where stamp is a character string, and let $R = R_1 \times R_2$; $S_1, S_2, \ldots, S_K$ are mutually similar data sets on the relation schema $R_1$; $S_0$ is a concrete relation on the relation schema $R$ that serves as the compressed set of $S_1, S_2, \ldots, S_K$ and records the data sets after compression, and its initial value is the empty set; using the state mark sequences, $S_1, S_2, \ldots, S_K$ are compressed and stored in the database one by one, and the compression storage process is:
Input: K similar data sets $S_1, S_2, \ldots, S_K$ on the relation schema $R_1$, and the state marks of the data sets STAMP[1]="1", STAMP[2]="2", ..., STAMP[K]="K";
Output: the compressed data set $S_0$ carrying state mark sequences, where $S_0$ is a concrete relation on the relation schema $R$;
1) initially, $S_0$ is the empty set;
2) for i = 1, 2, ..., K, repeat the following steps:
3) for j = 1, 2, ..., $\#S_i$ (the number of elements in data set $S_i$), repeat the following steps:
4) take an arbitrary element r from data set $S_i$ and remove r from $S_i$;
5) if there exists $r' \in S_0$ whose every attribute value is identical to that of r, i.e. $r'.x_1 = r.x_1,\ r'.x_2 = r.x_2,\ \ldots,\ r'.x_n = r.x_n$, then modify the state mark sequence of r': r'.stamp = r'.stamp + STAMP[i];
6) otherwise, construct a new stamp, stamp = STAMP[i], and add <r, stamp> to $S_0$ as a new record.
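Steps 1) to 6) can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the database implementation described in the claim: $S_0$ is held in a dictionary instead of a table, the sample route records are hypothetical, and the separator `SEP` between state marks is an assumption added here so that, for example, mark "1" cannot be confused with mark "10" (the claim itself simply concatenates the strings).

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, ...]   # attribute values (x_1, ..., x_n) on schema R_1

SEP = ","                  # assumed delimiter between state marks (not in the source)


def compress(datasets: List[Iterable[Record]]) -> Dict[Record, str]:
    """Build S_0: map each distinct record to its state mark sequence."""
    s0: Dict[Record, str] = {}                    # step 1: S_0 starts empty
    for i, s_i in enumerate(datasets, start=1):   # step 2: i = 1..K
        mark = str(i)                             # STAMP[i] = "i"
        for r in set(s_i):                        # steps 3-4: visit each element once
            if r in s0:                           # step 5: identical record already stored
                s0[r] = s0[r] + SEP + mark
            else:                                 # step 6: add <r, stamp> as a new record
                s0[r] = mark
    return s0


# Hypothetical route records (prefix, next hop) at two time points.
s1 = [("10.0.0.0/8", "AS1"), ("10.1.0.0/16", "AS2")]
s2 = [("10.0.0.0/8", "AS1"), ("10.2.0.0/16", "AS3")]
s0 = compress([s1, s2])
print(len(s0), "records stored instead of", len(s1) + len(s2))
```

Unlike step 4), the sketch iterates over a copy of each data set rather than removing elements from it, which leaves the inputs intact without changing the resulting $S_0$.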
2. The network monitoring data compression storage and joint detection method based on similar data sets as claimed in claim 1, characterized in that the original data sets are recovered with query statements as follows: a query statement selects from the database all records whose stamp attribute value contains STAMP[i], which recovers the entire original data set $S_i$ ($1 \le i \le K$).
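The recovery query can be illustrated with Python's built-in `sqlite3` module. The table name `s0`, the columns `prefix` and `next_hop`, the sample rows, and the comma-separated stamp format are all assumptions made for this sketch; the claim does not prescribe a particular database or stamp encoding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s0 (prefix TEXT, next_hop TEXT, stamp TEXT)")
conn.executemany(
    "INSERT INTO s0 VALUES (?, ?, ?)",
    [
        ("10.0.0.0/8", "AS1", "1,2"),   # present in S_1 and S_2
        ("10.1.0.0/16", "AS2", "1"),    # present only in S_1
        ("10.2.0.0/16", "AS3", "2"),    # present only in S_2
    ],
)


def recover(conn: sqlite3.Connection, i: int) -> list:
    """Return the records of S_i: all rows whose stamp contains mark i."""
    # Wrapping the stamp in commas lets the pattern match whole marks only,
    # so mark 1 does not also match mark 10.
    rows = conn.execute(
        "SELECT prefix, next_hop FROM s0 "
        "WHERE ',' || stamp || ',' LIKE ? ORDER BY prefix",
        (f"%,{i},%",),
    )
    return rows.fetchall()


print(recover(conn, 1))
print(recover(conn, 2))
```

A single pattern-matching query per data set is enough here because every stored row already carries the full list of data sets it belongs to.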
3. The network monitoring data compression storage and joint detection method based on similar data sets as claimed in claim 1, characterized in that joint detection over multiple similar data sets is performed as follows:
3.1 to judge whether a data item is consistent in two data sets $S_i$ and $S_j$: use a keyword query in the compressed database to find the record satisfying the given attribute features; if its stamp contains both STAMP[i] and STAMP[j], the data item is consistent in the two data sets; otherwise the data item is inconsistent in the two data sets;
3.2 to obtain the data item differences between two data sets $S_i$ and $S_j$: take out all records in $S_0$ whose stamp contains STAMP[i] or STAMP[j] but not both, and denote them S'; every record in S' occurs in only one of $S_i$ and $S_j$, so S' is exactly the data item difference between $S_i$ and $S_j$;
3.3 to judge the stability of a data item r' over the whole process: take out r'.stamp; if r'.stamp contains the state marks of all data sets, r' occurs in every data set and is stable over the whole process; if r'.stamp contains the state marks of only some of the data sets, r' occurs only in those data sets and is unstable over the whole process;
3.4 to obtain the data items that change over the whole process: if a record is stable over the whole process, its state mark sequence must contain the state marks of all data sets; take out all records in $S_0$ whose stamp does not contain the state marks of all data sets, and denote them S'; S' is exactly the set of all data items that change over the whole process.
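The four detection methods 3.1 to 3.4 reduce to simple tests on each record's stamp. The sketch below illustrates them on a hypothetical compressed set for K = 3; the dictionary `S0`, the sample records, the comma-separated stamp format, and all function names are assumptions introduced here, and 3.2 is read as "contains exactly one of the two marks".

```python
from typing import Dict, Set, Tuple

Record = Tuple[str, ...]

K = 3  # number of data sets in this hypothetical example
S0: Dict[Record, str] = {
    ("10.0.0.0/8", "AS1"): "1,2,3",   # present in every data set
    ("10.1.0.0/16", "AS2"): "1,2",    # disappears in S_3
    ("10.2.0.0/16", "AS3"): "3",      # appears only in S_3
}


def marks(stamp: str) -> Set[int]:
    """Split a stamp such as "1,2" into the set of data set indices {1, 2}."""
    return {int(m) for m in stamp.split(",") if m}


def consistent(r: Record, i: int, j: int) -> bool:
    """3.1: r (assumed present in S0) is consistent if its stamp has both marks."""
    m = marks(S0[r])
    return i in m and j in m


def difference(i: int, j: int) -> Set[Record]:
    """3.2: records whose stamp contains exactly one of mark i and mark j."""
    return {r for r, s in S0.items() if (i in marks(s)) != (j in marks(s))}


def stable(r: Record) -> bool:
    """3.3: r is stable if its stamp contains the marks of all K data sets."""
    return marks(S0[r]) == set(range(1, K + 1))


def changed() -> Set[Record]:
    """3.4: all records that are not stable over the whole process."""
    return {r for r in S0 if not stable(r)}


print(consistent(("10.1.0.0/16", "AS2"), 1, 2))
print(sorted(difference(2, 3)))
print(sorted(changed()))
```

All four checks read only the compressed set, so none of the original data sets has to be reconstructed before they are compared.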
CNB2006100317642A 2006-06-05 2006-06-05 Network monitoring data compression storage and associated detecting method based on similar data set Expired - Fee Related CN100555935C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100317642A CN100555935C (en) 2006-06-05 2006-06-05 Network monitoring data compression storage and associated detecting method based on similar data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006100317642A CN100555935C (en) 2006-06-05 2006-06-05 Network monitoring data compression storage and associated detecting method based on similar data set

Publications (2)

Publication Number Publication Date
CN1866821A true CN1866821A (en) 2006-11-22
CN100555935C CN100555935C (en) 2009-10-28

Family

ID=37425713

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100317642A Expired - Fee Related CN100555935C (en) 2006-06-05 2006-06-05 Network monitoring data compression storage and associated detecting method based on similar data set

Country Status (1)

Country Link
CN (1) CN100555935C (en)

Also Published As

Publication number Publication date
CN100555935C (en) 2009-10-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20091028

Termination date: 20120605