CN106649344A - Network log compression method and apparatus - Google Patents


Publication number
CN106649344A
Authority
CN
China
Prior art keywords
data set
network log
characteristic
data
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510728041.7A
Other languages
Chinese (zh)
Other versions
CN106649344B (en
Inventor
才宇东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Digital Technologies Suzhou Co Ltd
Original Assignee
Huawei Digital Technologies Suzhou Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Digital Technologies Suzhou Co Ltd
Priority to CN201510728041.7A
Publication of CN106649344A
Application granted
Publication of CN106649344B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/17: Details of further file system functions
    • G06F16/174: Redundancy elimination performed by the file system
    • G06F16/1744: Redundancy elimination performed by the file system using compression, e.g. sparse files

Abstract

The invention discloses a network log compression method and apparatus that address the low compression ratio of existing network log compression methods. The method comprises: parsing a collected network log and determining at least one feature contained in the network log; if the service type union of an existing first data set does not contain the first feature of the network log, determining the similarity between the feature set of the network log and the feature set of the first data set; if the similarity is greater than a set threshold, merging the network log into the first data set; if the similarity is not greater than the set threshold, creating a second data set and merging the network log into the second data set; and compressing and storing each data set. The number of compressed packages is thereby effectively reduced, which in turn reduces the storage space required.

Description

Network log compression method and apparatus
Technical field
The present invention relates to the field of network technologies, and in particular to a network log compression method and apparatus.
Background
In today's highly developed Internet era, network log collection and query systems are widely used. IT systems, network devices and security devices all produce large volumes of network logs, and the formats of these logs often differ considerably, so a network log collection and query system must handle large amounts of unstructured data in order to perform business analysis. Faced with massive unstructured data, the collected network logs are typically compressed before storage, which saves storage resources and reduces what users spend on storage devices.
One conventional network log compression method first stores all collected network logs together, and then compresses the stored logs and stores the result. Because the logs are stored first, then compressed, and the resulting compressed package is finally written to disk, the process involves one write, one read and another write, which wastes input and output (Input and Output, IO) capacity. In addition, different network logs generally have different features; these differing features are referred to as hybrid features. When the network logs are compressed, the large number of hybrid features makes the similarity between logs low, which results in a low compression ratio.
Another conventional network log compression method first compresses all collected network logs together and then writes the resulting compressed package to disk, so the process involves one read and one write. Although one write is saved, the data being compressed still contains fields for a large number of hybrid features, so the compression ratio remains low.
A further conventional network log compression method first classifies the collected network logs by service type, and then compresses and stores the logs of each service type separately. Although this improves the compression ratio compared with the first two methods, network logs have many service types, so storing a separately compressed package for each service type still requires considerable storage space, and the compression ratio is still relatively low.
In summary, as the volume of network logs keeps growing, existing network log compression methods have low compression ratios, and the compressed logs occupy a large amount of storage space.
Summary of the invention
Embodiments of the present invention provide a network log compression method and apparatus to address the low compression ratio of existing network log compression methods.
In a first aspect, a network log compression method is provided, the method comprising:
parsing a collected network log, and determining at least one feature contained in the network log;
if the service type union of an existing first data set does not contain the first feature of the network log, determining the similarity between the feature set of the network log and the feature set of the first data set, where the first feature is the feature, among the at least one feature, that indicates the service type of the network log; the service type union of the first data set is the union of the service types of the network logs in the first data set; the feature set of the network log is the set formed by the features of the network log; and the feature set of the first data set is the union of the features of all network logs in the first data set;
if the similarity between the feature set of the network log and the feature set of the first data set is greater than a set threshold, merging the network log into the first data set; if the similarity is not greater than the set threshold, creating a second data set and merging the network log into the second data set; and
compressing and storing each data set, where if the data sets include the first data set, the first data set is compressed and stored; and if the data sets include the first data set and the second data set, the first data set and the second data set are compressed and stored separately.
In the method of the embodiments of the present invention, when the service type union of the existing first data set does not contain the first feature of the network log, the network log is classified according to the similarity between its feature set and the feature set of the first data set. Because the merging scheme provided by the present invention can place network logs that have different service types but high similarity in the same class, the number of compressed packages is effectively reduced, which in turn reduces the storage space required.
In a possible implementation, determining the similarity between the feature set of the network log and the feature set of the first data set comprises:
determining a first value and a second value, where the first value is the number of features in the intersection of the feature set of the network log and the feature set of the first data set, and the second value is the number of features in the union of the feature set of the network log and the feature set of the first data set; and
determining the similarity between the feature set of the network log and the feature set of the first data set according to the first value and the second value, where the similarity is the ratio of the first value to the second value.
In a possible implementation, after the network log is merged into the first data set, the method further comprises:
determining the union of the feature set of the network log and the feature set of the first data set as the feature set of the first data set.
In a possible implementation, compressing and storing each data set comprises:
compressing and storing each data set after the number of stored network logs reaches a set first threshold; or
compressing and storing each data set after the total data volume of the stored network logs reaches a set second threshold; or
compressing and storing each data set when a set compression period arrives.
In a possible implementation, compressing and storing each data set comprises:
compressing and storing each data set using columnar storage. Because the data is compressed and stored in columnar form, a higher compression ratio can be obtained.
In a possible implementation, after determining the at least one feature contained in the network log, the method further comprises:
according to the first feature of the network log, when it is determined that the service type union of the first data set contains the first feature, merging the network log into the first data set.
In a possible implementation, after each data set is compressed and stored, the method further comprises:
forming a third data set according to the at least one feature contained in the network logs collected within a set time period;
if the service type union of the third data set is a subset of the service type union of the first data set, replacing the first data set with the third data set, where the service type union of the third data set is the union of the service types of the network logs in the third data set; and
if the data sets include the first data set and the second data set, and the service type union of the third data set is a subset of the service type union of the second data set, replacing the second data set with the third data set.
In a second aspect, a network log compression apparatus is provided, the apparatus comprising:
a feature analysis module, configured to parse a collected network log and determine at least one feature contained in the network log;
a first processing module, configured to: if the service type union of an existing first data set does not contain the first feature of the network log, determine the similarity between the feature set of the network log and the feature set of the first data set, where the first feature is the feature, among the at least one feature, that indicates the service type of the network log; the service type union of the first data set is the union of the service types of the network logs in the first data set; the feature set of the network log is the set formed by the features of the network log; and the feature set of the first data set is the union of the features of all network logs in the first data set;
a second processing module, configured to: if the similarity between the feature set of the network log and the feature set of the first data set is greater than a set threshold, merge the network log into the first data set; and if the similarity is not greater than the set threshold, create a second data set and merge the network log into the second data set; and
a compression module, configured to compress and store each data set, where if the data sets include the first data set, the first data set is compressed and stored; and if the data sets include the first data set and the second data set, the first data set and the second data set are compressed and stored separately.
In the apparatus of the embodiments of the present invention, when the service type union of the existing first data set does not contain the first feature of the network log, the network log is classified according to the similarity between its feature set and the feature set of the first data set. Because the merging scheme provided by the present invention can place network logs that have different service types but high similarity in the same class, the number of compressed packages is effectively reduced, which in turn reduces the storage space required.
In a possible implementation, when determining the similarity between the feature set of the network log and the feature set of the first data set, the first processing module is specifically configured to:
determine a first value and a second value, where the first value is the number of features in the intersection of the feature set of the network log and the feature set of the first data set, and the second value is the number of features in the union of the feature set of the network log and the feature set of the first data set; and
determine the similarity between the feature set of the network log and the feature set of the first data set according to the first value and the second value, where the similarity is the ratio of the first value to the second value.
In a possible implementation, after merging the network log into the first data set, the second processing module is further configured to:
determine the union of the feature set of the network log and the feature set of the first data set as the feature set of the first data set.
In a possible implementation, when compressing and storing each data set, the compression module is specifically configured to:
compress and store each data set after the number of stored network logs reaches a set first threshold; or
compress and store each data set after the total data volume of the stored network logs reaches a set second threshold; or
compress and store each data set when a set compression period arrives.
In a possible implementation, the first processing module is further configured to:
according to the first feature of the network log, when it is determined that the service type union of the first data set contains the first feature, merge the network log into the first data set.
In a possible implementation, the apparatus further comprises:
an optimization module, configured to: form a third data set according to the at least one feature contained in the network logs collected within a set time period; if the service type union of the third data set is a subset of the service type union of the first data set, replace the first data set with the third data set, where the service type union of the third data set is the union of the service types of the network logs in the third data set; and if the data sets include the first data set and the second data set, and the service type union of the third data set is a subset of the service type union of the second data set, replace the second data set with the third data set.
In a third aspect, a server is provided, comprising a processor, an input interface, an output interface, a memory and a system bus, where:
when the server runs, the processor reads the program in the memory and performs the above method embodiments;
the memory is configured to store the data that the processor uses when performing operations;
the input interface is configured to read in data under the control of the processor; and
the output interface is configured to output data under the control of the processor.
In the server of the embodiments of the present invention, when the service type union of the existing first data set does not contain the first feature of the network log, the network log is classified according to the similarity between its feature set and the feature set of the first data set. Because the merging scheme provided by the present invention can place network logs that have different service types but high similarity in the same class, the number of compressed packages is effectively reduced, which in turn reduces the storage space required.
Brief description of the drawings
Fig. 1 is a schematic diagram of a network log compression method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of another network log compression method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the classification tree formed in an embodiment of the present invention;
Fig. 4 is a schematic diagram of a network log compression apparatus provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of another network log compression apparatus provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of a server provided by an embodiment of the present invention.
Detailed description
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings. It should be understood that the embodiments described herein are intended only to illustrate and explain the present invention, not to limit it.
An embodiment of the present invention provides a network log compression method. As shown in Fig. 1, the method includes:
S11: Parse a collected network log and determine the features contained in the network log.
The features of a network log are the fields in the log that store different content, such as srcip (source IP), dstip (destination IP), srcport (source port) and dspport (destination port).
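As a rough illustration of S11, the sketch below extracts the feature set from a single log line. The original text does not specify a log syntax, so the space-separated `key=value` format, the sample line and the function name `extract_features` are all assumptions made for this example.

```python
def extract_features(log_line: str) -> dict:
    """Split an assumed 'key=value key=value ...' log line into a field -> value mapping."""
    fields = {}
    for token in log_line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


# Hypothetical log line; the field names follow the examples in the text.
line = "eventType=IPS srcip=10.0.0.1 dstip=10.0.0.2 srcport=51000 dspport=443"
record = extract_features(line)
feature_set = set(record)  # the features are the field names, not the values
```

Only the field names matter for classification; the field values are needed later, when the data set is compressed.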
S12: If the service type union of an existing first data set does not contain the first feature of the network log, determine the similarity between the feature set of the network log and the feature set of the first data set.
In the embodiments of the present invention, the first feature is the feature, among the at least one feature, that indicates the service type of the network log.
For example, the first feature of a network log is the eventType field of the log, which stores the service type of the log, such as the intrusion prevention system (Intrusion Prevention System, IPS) service type, the LOGIN service type or the distributed denial of service (Distributed Denial of Service, DDoS) service type.
In the embodiments of the present invention, the service type union of the first data set is the union of the service types of the network logs in the first data set.
For example, suppose that in a data set network log 1 belongs to the IPS service type, network log 2 belongs to the IPS service type, network log 3 belongs to the LOGIN service type and network log 4 belongs to the DDoS service type; the service type union corresponding to the data set is then {IPS service type, LOGIN service type, DDoS service type}.
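The four-log example above can be reproduced with a plain set. The log identifiers used as dictionary keys are hypothetical and serve only to label the entries.

```python
# Service type of each log in the example data set, keyed by a hypothetical log id.
log_service_types = {
    "log1": "IPS",
    "log2": "IPS",
    "log3": "LOGIN",
    "log4": "DDoS",
}

# The service type union keeps each service type once, however many logs carry it.
service_type_union = set(log_service_types.values())
```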
In the embodiments of the present invention, the feature set of the network log is the set formed by the features of the network log.
In the embodiments of the present invention, the feature set of the first data set is the union of the features of all network logs in the first data set.
For example, suppose the first data set contains two network logs. The features of the first network log are srcip, dstip, srcport, dspport, natsrcip, natdspip, username and describe; the features of the second network log are srcip, dstip, srcport, dspport, username, appname and domain. The feature set of the first data set is then:
{srcip, dstip, srcport, dspport, natsrcip, natdspip, username, describe, appname, domain}.
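The same union can be checked mechanically. This is a minimal sketch using the two feature lists from the example; the variable names are chosen here for illustration.

```python
log1_features = {"srcip", "dstip", "srcport", "dspport",
                 "natsrcip", "natdspip", "username", "describe"}
log2_features = {"srcip", "dstip", "srcport", "dspport",
                 "username", "appname", "domain"}

# Feature set of the data set: union of the features of every log it contains.
dataset_features = set().union(log1_features, log2_features)
```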
S13A: If the similarity between the feature set of the network log and the feature set of the first data set is greater than a set threshold, merge the network log into the first data set.
S13B: If the similarity between the feature set of the network log and the feature set of the first data set is not greater than the set threshold, create a second data set and merge the network log into the second data set.
S14: Compress and store each data set. If the data sets include the first data set, the first data set is compressed and stored; if the data sets include the first data set and the second data set, the first data set and the second data set are compressed and stored separately.
In the embodiments of the present invention, compression and storage are performed per data set.
For example, if the data sets include first data sets, each first data set is compressed and stored separately; if the data sets include a first data set and a second data set, the first data set and the second data set are compressed and stored separately.
In the embodiments of the present invention, when the service type union of the existing first data set does not contain the first feature of the network log, the network log is classified according to the similarity between its feature set and the feature set of the first data set. Specifically, if the similarity is greater than the set threshold, the network log is merged into the first data set; if the similarity is not greater than the set threshold, a second data set is created and the network log is merged into the second data set. Because the merging scheme provided by the present invention can place network logs that have different service types but high similarity in the same class, the number of compressed packages is effectively reduced, which in turn reduces the storage space required.
In the embodiments of the present invention, as another optional implementation, as shown in Fig. 2, after S11 the method further includes:
S15: According to the first feature of the network log, when it is determined that the service type union of the existing first data set contains the first feature, merge the network log into the first data set.
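Steps S12, S13A, S13B and S15 can be sketched together as a single classification routine. This is an illustrative sketch rather than the patented implementation: the `DataSet` class, the `classify` function, the default threshold of 0.5 and the rule of taking the first data set that exceeds the threshold are all assumptions, and the code treats the eventType field itself as one of the log's features.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    service_types: set = field(default_factory=set)  # service type union
    features: set = field(default_factory=set)       # feature set of the data set
    logs: list = field(default_factory=list)

    def add(self, log: dict) -> None:
        """Merge a log and update the service type union and the feature set."""
        self.logs.append(log)
        self.service_types.add(log["eventType"])
        self.features |= set(log)


def similarity(a: set, b: set) -> float:
    """Ratio of intersection size to union size."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(log: dict, data_sets: list, threshold: float = 0.5) -> DataSet:
    features = set(log)
    # S15: the service type is already known to an existing data set.
    for ds in data_sets:
        if log["eventType"] in ds.service_types:
            ds.add(log)
            return ds
    # S12 + S13A: unknown service type, but the features are similar enough.
    for ds in data_sets:
        if similarity(features, ds.features) > threshold:
            ds.add(log)
            return ds
    # S13B: no sufficiently similar data set, so create a new one.
    new_ds = DataSet()
    new_ds.add(log)
    data_sets.append(new_ds)
    return new_ds
```

With this routine, a LOGIN log whose fields match those of an existing IPS data set lands in that data set, while a log with mostly unseen fields opens a new one.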
In the embodiments of the present invention, determining the similarity between the feature set of the network log and the feature set of the first data set in S12 includes:
determining a first value and a second value, where the first value is the number of features in the intersection of the feature set of the network log and the feature set of the first data set, and the second value is the number of features in the union of the feature set of the network log and the feature set of the first data set; and
determining the similarity between the feature set of the network log and the feature set of the first data set according to the first value and the second value, where the similarity is the ratio of the first value to the second value.
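Written as a formula, with $F_{\log}$ denoting the feature set of the network log and $F_{D}$ the feature set of the first data set (both symbols are introduced here for illustration and do not appear in the original text), the ratio described above is the Jaccard index of the two sets:

```latex
\mathrm{sim}(F_{\log}, F_{D}) = \frac{\lvert F_{\log} \cap F_{D} \rvert}{\lvert F_{\log} \cup F_{D} \rvert}
```

The numerator is the first value and the denominator is the second value, so the similarity always lies between 0 and 1.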
In a specific implementation, a knowledge base may be set up in advance. The knowledge base is the feature sequence formed by arranging the features in the feature sets of all network logs according to a set ordering rule. To determine the first value and the second value, the features in the feature set of the network log are first arranged into a first feature sequence according to the same ordering rule, and the features in the feature set of the first data set are arranged into a second feature sequence according to that rule. The first feature sequence and the second feature sequence are then each compared with the knowledge base to form a first flag sequence and a second flag sequence. Both flag sequences have the same length as the knowledge base, and both are bit sequences containing only 0 and 1. In the first flag sequence, a bit with value 1 corresponds to a feature that the network log contains, and a bit with value 0 corresponds to a feature that the network log does not contain. In the second flag sequence, a bit with value 1 corresponds to a feature contained in the feature set of the first data set, and a bit with value 0 corresponds to a feature not contained in the feature set of the first data set.
For example, suppose the first feature sequence formed from the feature set of the network log according to the set ordering rule is: srcip, dstip, srcport, dspport, natsrcip, natdspip, username, describe;
the second feature sequence formed from the feature set of the first data set according to the set ordering rule is: srcip, dstip, srcport, dspport, username, appname, domain;
and the knowledge base is: srcip, dstip, srcport, dspport, natsrcip, natdspip, username, describe, appname, domain, netid, localinfo.
Then the first flag sequence, formed by comparing the first feature sequence with the knowledge base, is 1,1,1,1,1,1,1,1,0,0,0,0, and the second flag sequence, formed by comparing the second feature sequence with the knowledge base, is 1,1,1,1,0,0,1,0,1,1,0,0. The number of positions at which both flag sequences have a 1 is 5 (the first value), and the number of positions at which at least one of the two flag sequences has a 1 is 10 (the second value). The similarity between the feature set of the network log and the feature set of the first data set is therefore 5/10 = 0.5.
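The flag-sequence calculation in this example can be reproduced in a few lines. The sketch below uses the knowledge base and the two feature sets from the example; the function names `flag_sequence` and `flag_similarity` are chosen here for illustration.

```python
KNOWLEDGE_BASE = ["srcip", "dstip", "srcport", "dspport", "natsrcip", "natdspip",
                  "username", "describe", "appname", "domain", "netid", "localinfo"]


def flag_sequence(features):
    """Return one 0/1 flag per knowledge-base entry, in knowledge-base order."""
    return [1 if name in features else 0 for name in KNOWLEDGE_BASE]


def flag_similarity(flags_a, flags_b):
    """First value: positions where both are 1. Second value: positions where either is 1."""
    both = sum(x & y for x, y in zip(flags_a, flags_b))
    either = sum(x | y for x, y in zip(flags_a, flags_b))
    return both / either if either else 0.0


log_features = {"srcip", "dstip", "srcport", "dspport",
                "natsrcip", "natdspip", "username", "describe"}
dataset_features = {"srcip", "dstip", "srcport", "dspport",
                    "username", "appname", "domain"}

first_flags = flag_sequence(log_features)
second_flags = flag_sequence(dataset_features)
print(first_flags)   # [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(second_flags)  # [1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
print(flag_similarity(first_flags, second_flags))  # 0.5
```

Because both sequences follow the knowledge-base order, the comparison reduces to bitwise AND and OR over fixed positions, which is one plausible reason for fixing the order in advance.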
Optionally, after the network log is merged into the first data set in S13A, the method further includes:
determining the union of the feature set of the network log and the feature set of the first data set as the feature set of the first data set.
Specifically, after the network log is merged into the first data set, the feature set of the first data set also needs to be updated; that is, the union of the feature set of the network log and the feature set of the first data set becomes the feature set of the first data set.
In the embodiments of the present invention, the classification tree formed after classification in the above manner is shown in Fig. 3. Classification 1, classification 2 and so on are parent nodes, each representing a data set that has been formed; service class 1, service class 2 and so on are child nodes, representing the network logs contained in the data set.
In the embodiments of the present invention, compressing and storing each data set in S14 can be triggered in the following three ways:
Mode 1, event A trigger: compression and storage are triggered once the number of stored network logs, that is, the number of log entries, reaches a set first threshold. Specifically:
after the number of stored network logs reaches the set first threshold (for example, the first threshold may be 1000), each data set is compressed and stored.
Mode 2, event B trigger: compression and storage are triggered once the total data volume of the stored network logs reaches a set second threshold. Specifically:
after the total data volume of the stored network logs reaches the set second threshold (for example, the second threshold may be 100 MB), each data set is compressed and stored.
Mode 3, periodic trigger: compression and storage are triggered each time a set compression period arrives. Specifically:
when the set compression period arrives, each data set is compressed and stored.
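The three trigger modes can be sketched as one small helper. The text presents the modes as alternatives; checking all three in a single class is a choice made for this example, as are the class name `CompressionTrigger`, the one-hour default period and the reading of 100 MB as 100 × 1024 × 1024 bytes.

```python
import time


class CompressionTrigger:
    def __init__(self, max_logs=1000, max_bytes=100 * 1024 * 1024, period_s=3600.0):
        self.max_logs = max_logs      # first threshold (mode 1)
        self.max_bytes = max_bytes    # second threshold (mode 2)
        self.period_s = period_s      # compression period (mode 3), assumed value
        self.log_count = 0
        self.byte_count = 0
        self.last_run = time.monotonic()

    def record(self, log_size: int) -> None:
        """Account for one newly stored log of log_size bytes."""
        self.log_count += 1
        self.byte_count += log_size

    def should_compress(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        return (self.log_count >= self.max_logs              # mode 1
                or self.byte_count >= self.max_bytes         # mode 2
                or now - self.last_run >= self.period_s)     # mode 3

    def reset(self, now=None) -> None:
        """Call after each compression run to start a new accounting window."""
        self.log_count = 0
        self.byte_count = 0
        self.last_run = time.monotonic() if now is None else now
```

A caller would invoke `record` for every stored log, check `should_compress`, compress and store each data set when it returns true, and then call `reset`.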
Based on any of the above embodiments, optionally, compressing and storing each data set in S14 includes:
compressing and storing each data set using columnar storage. Because the data is compressed and stored in columnar form, a higher compression ratio can be obtained.
Of course, the embodiments of the present invention are not limited to columnar storage; each data set may also be compressed and stored in other ways known in the art, such as row-oriented storage.
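To make the columnar layout concrete, the sketch below regroups the logs of one data set by feature before compressing, so that values of the same field sit next to each other. The original text does not name a serialization format or a compressor; JSON and zlib are stand-ins chosen for this example, and the function names are hypothetical.

```python
import json
import zlib


def compress_columnar(logs, features):
    """Store one list of values per feature (column), then compress the whole layout."""
    columns = {name: [log.get(name) for log in logs] for name in sorted(features)}
    return zlib.compress(json.dumps(columns).encode("utf-8"))


def decompress_columnar(blob):
    """Rebuild the per-log records; None marks a feature the log did not contain."""
    columns = json.loads(zlib.decompress(blob).decode("utf-8"))
    count = max((len(values) for values in columns.values()), default=0)
    return [{name: values[i] for name, values in columns.items() if values[i] is not None}
            for i in range(count)]
```

Logs in one data set share most of their features, so most columns are densely filled. That is consistent with the similarity-based grouping described above, although the gain actually achieved depends on the data.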
Based on any of the above embodiments, optionally, after each data set is compressed and stored in S14, a compressed package corresponding to each data set is obtained, and each compressed package is stored in TLV format, where T represents the feature identifier (such as srcip, dstip or srcport), L represents the length of the compressed package, and V represents the compressed package itself.
By way of explanation, TLV is a triple whose full name is Type, Length and Value. The T and L fields usually have fixed lengths (typically 1 to 4 bytes), while the length of the V field is variable. The meaning of T, L and V can be customized. In the embodiments of the present invention, T represents the feature identifier (that is, one of the features of the network log, indicating which feature is stored), L represents the length of the stored compressed package, and V represents the stored compressed package.
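A minimal TLV encoder and decoder might look as follows. The text fixes only the roles of T, L and V; the 2-byte T, the 4-byte L, the big-endian byte order and the use of a numeric feature identifier (for example, a position in the knowledge base) are assumptions made for this sketch.

```python
import struct

# Assumed header layout: 2-byte unsigned T, 4-byte unsigned L, big-endian, no padding.
HEADER = struct.Struct(">HI")


def encode_tlv(feature_id: int, package: bytes) -> bytes:
    """Prefix a compressed package with its feature identifier and length."""
    return HEADER.pack(feature_id, len(package)) + package


def decode_tlv(buf: bytes):
    """Split a concatenation of TLV records into (feature_id, package) pairs."""
    records = []
    offset = 0
    while offset < len(buf):
        feature_id, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        records.append((feature_id, buf[offset:offset + length]))
        offset += length
    return records
```

Because L is stored ahead of V, a reader can skip over packages it does not need without decompressing them.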
Based on any of the above embodiments, after each data set is compressed and stored in S14, the method further includes optimizing the service types of each data set. Specifically:
A third data set is formed according to at least one feature contained in the network logs collected within a set time period;
If the service type union of the third data set is a subset of the service type union of the first data set, the first data set is replaced with the third data set, where the service type union of the third data set is the union of the service types of the network logs in the third data set;
If the data sets include the first data set and the second data set, and the service type union of the third data set is a subset of the service type union of the second data set, the second data set is replaced with the third data set.
For example, after the compression and storage of the network logs are completed, the currently established classification tree can be optimized. Specifically, a new data set (that is, the third data set) is formed according to the features contained in the network logs collected within a set time period, for example, within the 1 day before the current time, so as to form an optimized classification tree. For the third data set, if its service type union is a subset of the service type union of the first data set, the first data set is replaced with the third data set; if the data sets include the first data set and the second data set, and the service type union of the third data set is a subset of the service type union of the second data set, the second data set is replaced with the third data set. The original classification tree is thereby replaced with the optimized classification tree.
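The replacement rule can be illustrated with a short sketch. It represents each data set as a dictionary with a `service_types` set, which is an assumption made for the example, as is the function name; the source does not specify a data structure for the classification tree.

```python
def replace_with_optimized(data_sets, third):
    """Return a new list in which the first existing data set whose service type
    union contains that of the third data set is replaced by the third data set.

    data_sets: list of dicts, each with a "service_types" set (hypothetical layout)
    third:     dict with a "service_types" set, built from recently collected logs
    """
    result = list(data_sets)
    for i, existing in enumerate(result):
        # "<=" on Python sets is the subset test.
        if third["service_types"] <= existing["service_types"]:
            result[i] = third
            break
    return result
```

If the service type union of the third data set is not a subset of that of any existing data set, the list is returned unchanged.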
The above method flow can be implemented by a software program. The software program can be stored in a storage medium, and when the stored software program is invoked, the above method steps are performed.
Based on the same inventive concept, an embodiment of the present invention further provides a network log compression apparatus. The principle by which the apparatus solves the problem is similar to that of the network log compression method described above. For the parts of the apparatus that are the same as the method, refer to the related descriptions of the embodiments shown in Fig. 1 and Fig. 2; details are not repeated here.
A network log compression apparatus provided by an embodiment of the present invention, as shown in Fig. 4, includes:
A feature parsing module 41, configured to parse a collected network log and determine at least one feature contained in the network log;
A first processing module 42, configured to: if the service type union of an existing first data set does not contain a first feature of the network log, determine the similarity between the feature set of the network log and the feature set of the first data set, where the first feature is the feature, among the at least one feature, that represents the service type of the network log; the service type union of the first data set is the union of the service types of the network logs in the first data set; the feature set of the network log is the set formed by the features of the network log; and the feature set of the first data set is the union of the features of all network logs in the first data set;
A second processing module 43, configured to: if it is determined that the similarity between the feature set of the network log and the feature set of the first data set is greater than a set threshold, merge the network log into the first data set; if it is determined that the similarity is not greater than the set threshold, create a second data set and merge the network log into the second data set;
A compression module 44, configured to compress and store each data set, where if the data sets include the first data set, the first data set is compressed and stored; if the data sets include the first data set and the second data set, the first data set and the second data set are compressed and stored separately.
In the embodiments of the present invention, when the service type union of the existing first data set does not contain the first feature of the network log, the network log is classified according to the similarity between the feature set of the network log and the feature set of the first data set. Because the merging scheme provided by the present invention can classify network logs that have different service types but high similarity into the same class, the number of compressed packages is effectively reduced, which in turn reduces storage space.
Optionally, when determining the similarity between the feature set of the network log and the feature set of the first data set, the first processing module 42 is specifically configured to:
Determine a first value and a second value, where the first value is the number of features in the intersection of the feature set of the network log and the feature set of the first data set, and the second value is the number of features in the union of the feature set of the network log and the feature set of the first data set;
Determine, according to the first value and the second value, the similarity between the feature set of the network log and the feature set of the first data set, where the similarity is the ratio of the first value to the second value.
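Written as a formula, the ratio of the first value to the second value is the Jaccard index of the two feature sets. In the worked example below, the symbols A (feature set of the network log) and B (feature set of the first data set) and the feature names are illustrative assumptions, not values from the source.

```latex
\[
\mathrm{sim}(A, B) = \frac{|A \cap B|}{|A \cup B|}
\]
% Illustrative feature sets (assumed):
% A = \{\text{srcip}, \text{dstip}, \text{srcport}, \text{app}\}
% B = \{\text{srcip}, \text{dstip}, \text{dstport}, \text{app}, \text{url}\}
\[
|A \cap B| = 3, \qquad |A \cup B| = 6, \qquad \mathrm{sim}(A, B) = \frac{3}{6} = 0.5
\]
```

With a set threshold of, say, 0.4 (an assumed value), this network log would be merged into the first data set; with a threshold of 0.5 or higher it would not, because the similarity must be strictly greater than the threshold.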
Based on any of the above embodiments, optionally, after merging the network log into the first data set, the second processing module 43 is further configured to:
Determine the union of the feature set of the network log and the feature set of the first data set as the feature set of the first data set.
Optionally, the compression module 44 is specifically configured to:
Compress and store each data set after the number of stored network logs reaches a set first threshold; or
Compress and store each data set after the total data volume of the stored network logs reaches a set second threshold; or
Compress and store each data set when a set compression period arrives.
As another optional implementation, the first processing module 42 is further configured to:
When it is determined, according to the first feature of the network log, that the service type union of the first data set contains the first feature, merge the network log into the first data set.
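The classification flow handled by the first and second processing modules can be sketched end to end. The sketch assumes each network log is a dictionary of feature name to value and that a key named `service` holds the service type; the `DataSet` class, the default threshold of 0.5, and the choice to compare against the most similar of several existing data sets are all illustrative assumptions (the text describes the comparison against a single first data set).

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    service_types: set = field(default_factory=set)  # service type union
    features: set = field(default_factory=set)       # feature set (union over logs)
    logs: list = field(default_factory=list)

    def add(self, log: dict, service_key: str) -> None:
        self.logs.append(log)
        self.service_types.add(log[service_key])
        # After merging, the feature set becomes the union with the log's features.
        self.features |= set(log)


def similarity(a: set, b: set) -> float:
    # Number of features in the intersection divided by the number in the union.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(log: dict, data_sets: list, threshold: float = 0.5,
             service_key: str = "service") -> DataSet:
    # Direct path: the service type is already in some data set's union.
    for ds in data_sets:
        if log[service_key] in ds.service_types:
            ds.add(log, service_key)
            return ds
    # Similarity path: merge only if similarity is strictly greater than the threshold.
    log_features = set(log)
    best = max(data_sets, key=lambda ds: similarity(log_features, ds.features),
               default=None)
    if best is not None and similarity(log_features, best.features) > threshold:
        best.add(log, service_key)
        return best
    # Otherwise create a new data set for this log.
    new_set = DataSet()
    new_set.add(log, service_key)
    data_sets.append(new_set)
    return new_set
```

Logs with different service types but largely overlapping features end up in the same data set, so fewer compressed packages are produced.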
Based on any of the above embodiments, optionally, as shown in Fig. 5, the apparatus further includes:
An optimization module 45, configured to form a third data set according to at least one feature contained in the network logs collected within a set time period; if the service type union of the third data set is a subset of the service type union of the first data set, replace the first data set with the third data set, where the service type union of the third data set is the union of the service types of the network logs in the third data set; and if the data sets include the first data set and the second data set, and the service type union of the third data set is a subset of the service type union of the second data set, replace the second data set with the third data set.
In the embodiments of the present invention, the methods of the embodiments shown in Fig. 1 and Fig. 2 can be implemented by a server. As shown in Fig. 6, the server includes a processor 61, an input interface 62, an output interface 63, a memory 64, and a system bus 65, where:
The processor 61 is responsible for logical operations and processing. When the server runs, the processor 61 reads the program in the memory 64 and performs the above method embodiments. Specifically, the processor 61 performs the above steps S11, S12, S13A, S13B, and S14. Optionally, the processor 61 can also perform the above step S15.
The memory 64 includes internal memory and a hard disk and can store data used by the processor 61 when performing operations (such as the first data set, the second data set, and the compressed packages obtained by compressing the data sets). The input interface 62 is used to read in data (such as network logs) under the control of the processor 61, and the output interface 63 outputs data (such as compressed packages) under the control of the processor 61.
The bus architecture may include any number of interconnected buses and bridges, which link together various circuits of one or more processors represented by the processor 61 and of the internal memory and hard disk represented by the memory 64. The bus architecture may also link together various other circuits, such as peripheral devices, voltage regulators, and power management circuits, all of which are well known in the art and are therefore not described further herein.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means, and the instruction means implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make other changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.

Claims (13)

1. A network log compression method, characterized in that the method comprises:
Parsing a collected network log to determine at least one feature contained in the network log;
If the service type union of an existing first data set does not contain a first feature of the network log, determining the similarity between the feature set of the network log and the feature set of the first data set, wherein the first feature is the feature, among the at least one feature, that represents the service type of the network log; the service type union of the first data set is the union of the service types of the network logs in the first data set; the feature set of the network log is the set formed by the features of the network log; and the feature set of the first data set is the union of the features of all network logs in the first data set;
If it is determined that the similarity between the feature set of the network log and the feature set of the first data set is greater than a set threshold, merging the network log into the first data set; if it is determined that the similarity between the feature set of the network log and the feature set of the first data set is not greater than the set threshold, creating a second data set and merging the network log into the second data set;
Compressing and storing each data set, wherein if the data sets include the first data set, the first data set is compressed and stored; if the data sets include the first data set and the second data set, the first data set and the second data set are compressed and stored separately.
2. The method of claim 1, characterized in that determining the similarity between the feature set of the network log and the feature set of the first data set comprises:
Determining a first value and a second value, wherein the first value is the number of features in the intersection of the feature set of the network log and the feature set of the first data set, and the second value is the number of features in the union of the feature set of the network log and the feature set of the first data set;
Determining, according to the first value and the second value, the similarity between the feature set of the network log and the feature set of the first data set, wherein the similarity is the ratio of the first value to the second value.
3. The method of claim 1 or 2, characterized in that, after merging the network log into the first data set, the method further comprises:
Determining the union of the feature set of the network log and the feature set of the first data set as the feature set of the first data set.
4. The method of claim 1, characterized in that compressing and storing each data set comprises:
Compressing and storing each data set after the number of stored network logs reaches a set first threshold; or
Compressing and storing each data set after the total data volume of the stored network logs reaches a set second threshold; or
Compressing and storing each data set when a set compression period arrives.
5. The method of claim 1 or 4, characterized in that compressing and storing each data set comprises:
Compressing and storing each data set in a columnar storage manner.
6. The method of claim 1, characterized in that, after determining the at least one feature contained in the network log, the method further comprises:
When it is determined, according to the first feature of the network log, that the service type union of the first data set contains the first feature, merging the network log into the first data set.
7. The method of any one of claims 1 to 4 and 6, characterized in that, after compressing and storing each data set, the method further comprises:
Forming a third data set according to at least one feature contained in the network logs collected within a set time period;
If the service type union of the third data set is a subset of the service type union of the first data set, replacing the first data set with the third data set, wherein the service type union of the third data set is the union of the service types of the network logs in the third data set;
If the data sets include the first data set and the second data set, and the service type union of the third data set is a subset of the service type union of the second data set, replacing the second data set with the third data set.
8. A network log compression apparatus, characterized in that the apparatus comprises:
A feature parsing module, configured to parse a collected network log and determine at least one feature contained in the network log;
A first processing module, configured to: if the service type union of an existing first data set does not contain a first feature of the network log, determine the similarity between the feature set of the network log and the feature set of the first data set, wherein the first feature is the feature, among the at least one feature, that represents the service type of the network log; the service type union of the first data set is the union of the service types of the network logs in the first data set; the feature set of the network log is the set formed by the features of the network log; and the feature set of the first data set is the union of the features of all network logs in the first data set;
A second processing module, configured to: if it is determined that the similarity between the feature set of the network log and the feature set of the first data set is greater than a set threshold, merge the network log into the first data set; if it is determined that the similarity is not greater than the set threshold, create a second data set and merge the network log into the second data set;
A compression module, configured to compress and store each data set, wherein if the data sets include the first data set, the first data set is compressed and stored; if the data sets include the first data set and the second data set, the first data set and the second data set are compressed and stored separately.
9. The apparatus of claim 8, characterized in that the first processing module is specifically configured to:
Determine a first value and a second value, wherein the first value is the number of features in the intersection of the feature set of the network log and the feature set of the first data set, and the second value is the number of features in the union of the feature set of the network log and the feature set of the first data set;
Determine, according to the first value and the second value, the similarity between the feature set of the network log and the feature set of the first data set, wherein the similarity is the ratio of the first value to the second value.
10. The apparatus of claim 8 or 9, characterized in that, after merging the network log into the first data set, the second processing module is further configured to:
Determine the union of the feature set of the network log and the feature set of the first data set as the feature set of the first data set.
11. The apparatus of claim 8, characterized in that the compression module is specifically configured to:
Compress and store each data set after the number of stored network logs reaches a set first threshold; or
Compress and store each data set after the total data volume of the stored network logs reaches a set second threshold; or
Compress and store each data set when a set compression period arrives.
12. The apparatus of claim 8, characterized in that the first processing module is further configured to:
When it is determined, according to the first feature of the network log, that the service type union of the first data set contains the first feature, merge the network log into the first data set.
13. The apparatus of any one of claims 8 to 12, characterized in that the apparatus further comprises:
An optimization module, configured to form a third data set according to at least one feature contained in the network logs collected within a set time period; if the service type union of the third data set is a subset of the service type union of the first data set, replace the first data set with the third data set, wherein the service type union of the third data set is the union of the service types of the network logs in the third data set; and if the data sets include the first data set and the second data set, and the service type union of the third data set is a subset of the service type union of the second data set, replace the second data set with the third data set.
CN201510728041.7A 2015-10-31 2015-10-31 Weblog compression method and device Active CN106649344B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510728041.7A CN106649344B (en) 2015-10-31 2015-10-31 Weblog compression method and device

Publications (2)

Publication Number Publication Date
CN106649344A true CN106649344A (en) 2017-05-10
CN106649344B CN106649344B (en) 2020-01-10

Family

ID=58809347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510728041.7A Active CN106649344B (en) 2015-10-31 2015-10-31 Weblog compression method and device

Country Status (1)

Country Link
CN (1) CN106649344B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897500A (en) * 2018-05-23 2018-11-27 联想图像(天津)科技有限公司 Data transmission method, device and electronic equipment
CN109189763A (en) * 2018-09-17 2019-01-11 北京锐安科技有限公司 A kind of date storage method, device, server and storage medium
CN112559618A (en) * 2020-12-23 2021-03-26 光大兴陇信托有限责任公司 External data integration method based on financial wind control service
CN113535654A (en) * 2021-06-11 2021-10-22 安徽安恒数智信息技术有限公司 Log processing method, system, electronic device and storage medium
CN113553589A (en) * 2021-07-30 2021-10-26 江苏易安联网络技术有限公司 Extraction method, device and application of malicious software propagation characteristics

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1842021A (en) * 2005-03-28 2006-10-04 华为技术有限公司 Log information storage method
WO2012031269A1 (en) * 2010-09-03 2012-03-08 Loglogic, Inc. Random access data compression
CN102541863A (en) * 2010-12-14 2012-07-04 联芯科技有限公司 Webpage compression method applied to mobile terminal
CN102609491A (en) * 2012-01-20 2012-07-25 东华大学 Column-storage oriented area-level data compression method

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897500A (en) * 2018-05-23 2018-11-27 联想图像(天津)科技有限公司 Data transmission method, device and electronic equipment
CN108897500B (en) * 2018-05-23 2021-10-26 联想图像(天津)科技有限公司 Data transmission method and device and electronic equipment
CN109189763A (en) * 2018-09-17 2019-01-11 北京锐安科技有限公司 A kind of date storage method, device, server and storage medium
CN112559618A (en) * 2020-12-23 2021-03-26 光大兴陇信托有限责任公司 External data integration method based on financial wind control service
CN112559618B (en) * 2020-12-23 2023-07-11 光大兴陇信托有限责任公司 External data integration method based on financial wind control business
CN113535654A (en) * 2021-06-11 2021-10-22 安徽安恒数智信息技术有限公司 Log processing method, system, electronic device and storage medium
CN113535654B (en) * 2021-06-11 2023-10-31 安徽安恒数智信息技术有限公司 Log processing method, system, electronic device and storage medium
CN113553589A (en) * 2021-07-30 2021-10-26 江苏易安联网络技术有限公司 Extraction method, device and application of malicious software propagation characteristics
CN113553589B (en) * 2021-07-30 2022-09-02 江苏易安联网络技术有限公司 Extraction method, device and application of malicious software propagation characteristics

Also Published As

Publication number Publication date
CN106649344B (en) 2020-01-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant