CN114625929A - Method and device for sampling and collecting message - Google Patents

Method and device for sampling and collecting message

Info

Publication number
CN114625929A
Authority
CN
China
Prior art keywords
strategy
acquisition
source port
hash table
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210267837.7A
Other languages
Chinese (zh)
Other versions
CN114625929B (en
Inventor
邵慧丽
李亚辉
肖成民
王虹
杨晓娟
万焱
李向通
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING LEADSEC TECHNOLOGY CO LTD
Beijing Venustech Cybervision Co ltd
Original Assignee
BEIJING LEADSEC TECHNOLOGY CO LTD
Beijing Venustech Cybervision Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING LEADSEC TECHNOLOGY CO LTD, Beijing Venustech Cybervision Co ltd filed Critical BEIJING LEADSEC TECHNOLOGY CO LTD
Priority to CN202210267837.7A priority Critical patent/CN114625929B/en
Publication of CN114625929A publication Critical patent/CN114625929A/en
Application granted granted Critical
Publication of CN114625929B publication Critical patent/CN114625929B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9014Indexing; Data structures therefor; Storage structures hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22Parsing or analysis of headers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and a device for sampling and collecting messages are provided, the method comprises the following steps: reading an acquisition strategy file, creating a time hash table according to acquisition time information corresponding to each acquisition strategy in the acquisition strategy file, and creating a source port hash table according to source port information corresponding to each acquisition strategy in the acquisition strategy file; matching the message to be acquired with the time hash table and the source port hash table, and selecting an acquisition strategy from the acquisition strategy file according to a matching result and putting the acquisition strategy into a pre-screening strategy set; and if the message to be collected is matched with the confirmation function corresponding to any one of the pre-screening strategy sets, collecting the message to be collected.

Description

Method and device for sampling and collecting message
Technical Field
The present application relates to the field of computer networks, and in particular, to a method and an apparatus for sampling and collecting packets.
Background
Methods for sampling and collecting messages are mainly used in the field of computer networks. In some technologies, two such methods are used. The first is direct matching: the capture time and quintuple of a captured message are matched against the first acquisition strategy; if the match succeeds, the message is collected, and if it fails, the message is matched against the next acquisition strategy, and so on until all acquisition strategies have been tried. However, this method has several drawbacks:
(1) Low matching efficiency. In a network environment, the number of messages arriving in a short time is very large. Each message needs to be matched against the time and quintuple information of all acquisition strategies, so the time complexity is high.
(2) Packet loss. When a large number of messages arrive in a short time, low matching efficiency causes subsequent messages to accumulate, which leads to packet loss.
The other method uses cascaded hash tables: the acquisition strategies are coarsely screened to narrow the range, and matching is then performed. However, cascaded hash tables occupy a large amount of memory. For example, building a hash table that cascades only the source port and the destination port requires 2^32 entries, i.e. 4G of memory. Therefore, although this method speeds up matching, it consumes a large amount of memory.
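As a rough worked comparison, the entry counts can be written out. This assumes one unit of storage per table entry, which is inferred from the "4G" figure above rather than stated in the source, and it ignores the extra nodes held in conflict chains. The second line uses the table lengths given later in this description (86400 for time, 65536 for the source port); those two tables index time and source port rather than the two ports, so the comparison is indicative only.

```latex
\begin{align*}
N_{\text{cascaded}} &= 2^{16} \times 2^{16} = 2^{32} = 4\,294\,967\,296 \ \text{entries},\\
N_{\text{separate}} &= 86400 + 65536 = 151\,936 \ \text{entries}.
\end{align*}
```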
Disclosure of Invention
The application provides a method for sampling and collecting messages, which realizes the message collecting method with low memory occupation, fast pre-screening and fast and accurate matching.
The application provides a method for sampling and collecting messages, which comprises the following steps:
reading an acquisition strategy file, creating a time hash table according to acquisition time information corresponding to each acquisition strategy in the acquisition strategy file, and creating a source port hash table according to source port information corresponding to each acquisition strategy in the acquisition strategy file;
matching the message to be acquired with the time hash table and the source port hash table, and selecting an acquisition strategy from the acquisition strategy file according to a matching result and putting the acquisition strategy into a pre-screening strategy set;
and if the message to be collected is matched with the confirmation function corresponding to any one of the pre-screening strategy set, collecting the message to be collected.
In an exemplary embodiment, the creating a time hash table according to the collection time information corresponding to each collection policy in the collection policy file includes:
establishing a time hash array with a preset length;
analyzing the acquisition strategy file into a strategy array, traversing the strategy array, and calculating a time hash value of the acquisition time range of each acquisition strategy through a time hash function;
and creating a time hash table according to the time hash value of each acquisition strategy.
In an exemplary embodiment, the creating a temporal hash table according to the temporal hash value of each acquisition policy includes:
putting the strategy array subscript into a time hash table, wherein the strategy array subscript represents an acquisition strategy stored in the corresponding position of the analyzed strategy array;
and if the strategy array subscript conflicts with the subscript in the time hash table, linking the strategy array subscript to the time hash table conflict chain.
In an exemplary embodiment, the creating a source port hash table according to the source port information corresponding to each collection policy in the collection policy file includes:
establishing a source port hash array with a preset length;
analyzing the acquisition strategy file into a strategy array, traversing the strategy array, and calculating a source port hash value for the source port range of each acquisition strategy through a source port hash function;
and creating a source port hash table according to the source port hash value of each acquisition strategy.
In an example embodiment, the creating a source port hash table according to the source port hash value of each acquisition policy includes:
putting the strategy array subscript into a source port hash table, wherein the strategy array subscript represents an acquisition strategy stored in the corresponding position of the analyzed strategy array;
and if the strategy array subscript conflicts with the subscript in the source port hash table, linking the strategy array subscript to the source port hash table conflict chain.
In an exemplary embodiment, the matching the packet to be collected with the time hash table and the source port hash table, and selecting a collection policy from the collection policy file according to a matching result and placing the collection policy into a pre-screening policy set includes:
calculating a time hash value of a message to be acquired, inquiring the time hash table, and determining a first acquisition strategy set of the time period;
calculating a source port hash value of a message to be acquired, inquiring a source port hash table, and determining a second acquisition strategy set of the source port range;
and selecting the intersection of the first collection strategy set and the second collection strategy set as a pre-screening strategy set.
In an exemplary embodiment, when an intersection of the first collection policy set and the second collection policy set is empty, no collection operation is performed.
In an exemplary embodiment, the confirmation function is generated according to each acquisition policy in the acquisition policy file, and the confirmation function names correspond to the acquisition policies one to one; the confirmation function name is generated uniquely by the collection strategy id, and the confirmation function body is generated by other collection conditions of the collection strategy.
The application also provides a device for sampling and collecting messages, which comprises: a memory and a processor; the memory is used for storing a program for collecting messages, and the processor is used for reading and executing the program for collecting messages and executing the method for sampling and collecting messages in any one of the above embodiments.
The present application also provides a storage medium having stored therein a program for collecting messages, the program being arranged to perform the method of sampling collected messages as described in any one of the above embodiments when run.
Compared with the related art, the application provides a method for sampling and collecting messages, which comprises the following steps: reading an acquisition strategy file, creating a time hash table according to acquisition time information corresponding to each acquisition strategy in the acquisition strategy file, and creating a source port hash table according to source port information corresponding to each acquisition strategy in the acquisition strategy file; matching the message to be acquired with the time hash table and the source port hash table, and selecting an acquisition strategy from the acquisition strategy file according to a matching result and putting the acquisition strategy into a pre-screening strategy set; and if the message to be collected is matched with the confirmation function corresponding to any one of the pre-screening strategy set, collecting the message to be collected. In the embodiment of the application, the acquired message is inquired by the hash table, so that the memory can be saved, and the quick pre-screening can be realized, so that the strategy matching range is reduced. And when the final matching is carried out, the execution speed of the hard code of the confirmation function is higher than that of variable indirect addressing, and the strategy matching efficiency is improved. The embodiment of the application realizes a sampling message collecting method with less memory occupation, rapid pre-screening and rapid and accurate matching.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. Other advantages of the present application may be realized and attained by the instrumentalities and combinations particularly pointed out in the specification and the drawings.
Drawings
The accompanying drawings are included to provide an understanding of the present disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the examples serve to explain the principles of the disclosure and not to limit the disclosure.
FIG. 1 is a flowchart of a method for sampling and collecting messages according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for sampling and collecting messages according to an acquisition strategy in an exemplary embodiment;
FIG. 3 is a schematic diagram of establishing a time hash table according to an acquisition strategy in an exemplary embodiment;
FIG. 4 is a schematic diagram of establishing a source port hash table according to an acquisition strategy in an exemplary embodiment;
FIG. 5 is a schematic diagram of a device for sampling and collecting messages according to an embodiment of the present application;
FIG. 6 is a flowchart of an exemplary method for sampling and collecting messages.
Detailed Description
The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Further, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other sequences of steps are possible as will be appreciated by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
The embodiment of the present disclosure provides a method for sampling and collecting a packet, as shown in fig. 1, the method includes steps S100 to S120, which are specifically as follows:
S100, reading an acquisition strategy file, creating a time hash table according to acquisition time information corresponding to each acquisition strategy in the acquisition strategy file, and creating a source port hash table according to source port information corresponding to each acquisition strategy in the acquisition strategy file;
S110, matching the message to be acquired with the time hash table and the source port hash table, and selecting an acquisition strategy from the acquisition strategy file according to a matching result and putting the acquisition strategy into a pre-screening strategy set;
S120, if the message to be collected is matched with a confirmation function corresponding to any one of the pre-screening strategy set, collecting the message to be collected.
In this embodiment, the acquisition strategy file includes a plurality of acquisition strategies; each acquisition strategy may be specified by a user and added to the acquisition strategy file. For example, the format of the acquisition strategy file is as follows, and the meaning of each field is explained after it:
id=0
src_ip=192.168.39.2-192.168.39.254
dst_ip=192.168.35.25-192.168.35.36
src_port=6000-8000
dst_port=80-8888
protocol=TCP
starttime=00:00:10
endtime=00:00:30
Here, id is the number of the acquisition strategy, and each acquisition strategy has a unique id; src_ip, dst_ip, src_port and dst_port are the quadruple information, and each of these fields may be a range; protocol is the protocol name; starttime and endtime delimit the acquisition time range.
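The range-valued fields above can be read with a few lines of C. The following is a minimal sketch, not the parser used by the application: the helper names parse_port_range and parse_ip_range are hypothetical, accepting a single port such as "80" as a one-element range is an assumption, and IPv4 addresses are packed into host-order 32-bit integers so that ranges compare numerically.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: parses "6000-8000" (or a single value such as "80")
 * into inclusive bounds. Returns 0 on success, -1 on malformed input. */
static int parse_port_range(const char *text, int *lo, int *hi)
{
    int a, b;
    int n = sscanf(text, "%d-%d", &a, &b);
    if (n == 1)
        b = a;                      /* single value: one-element range (assumption) */
    else if (n != 2)
        return -1;
    if (a < 0 || b > 65535 || a > b)
        return -1;
    *lo = a;
    *hi = b;
    return 0;
}

/* Hypothetical helper: parses "a.b.c.d-e.f.g.h" into two host-order
 * 32-bit values. Returns 0 on success, -1 on malformed input. */
static int parse_ip_range(const char *text, uint32_t *lo, uint32_t *hi)
{
    unsigned s[4], e[4];
    if (sscanf(text, "%u.%u.%u.%u-%u.%u.%u.%u",
               &s[0], &s[1], &s[2], &s[3],
               &e[0], &e[1], &e[2], &e[3]) != 8)
        return -1;
    for (int i = 0; i < 4; i++)
        if (s[i] > 255 || e[i] > 255)
            return -1;
    *lo = ((uint32_t)s[0] << 24) | ((uint32_t)s[1] << 16) | ((uint32_t)s[2] << 8) | s[3];
    *hi = ((uint32_t)e[0] << 24) | ((uint32_t)e[1] << 16) | ((uint32_t)e[2] << 8) | e[3];
    return *lo <= *hi ? 0 : -1;
}
```

Each helper takes only the text after the equals sign, so a caller would first split a line such as src_port=6000-8000 at the "=" character.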
In an exemplary embodiment, the creating a time hash table according to the collection time information corresponding to each collection policy in the collection policy file includes: establishing a time hash array with a preset length; analyzing the acquisition strategy file into a strategy array, traversing the strategy array, and calculating a time hash value of the acquisition time range of each acquisition strategy through a time hash function; and creating a time hash table according to the time hash value of each acquisition strategy.
In an exemplary embodiment, the creating a temporal hash table according to the temporal hash value of each acquisition policy includes: putting the strategy array subscript into a time hash table, wherein the strategy array subscript represents an acquisition strategy stored in the corresponding position of the analyzed strategy array; and if the strategy array subscript conflicts with the subscript in the time hash table, linking the strategy array subscript to the time hash table conflict chain.
In step S100, the time hash table may be created from the acquisition time of each acquisition strategy as follows. A hash array of length 86400 (24 × 60 × 60) is established, with one entry per second over a range of one day, and every entry is initialized to -1, meaning that no acquisition strategy applies. Assuming that the acquisition time is (x1:x2:x3), where x1 is the hour, x2 is the minute, and x3 is the second, the time hash value is calculated as y = x1 × 60 × 60 + x2 × 60 + x3. The strategy array is traversed in sequence; for each strategy, the hash values of its acquisition time range are calculated with the time hash function, and the strategy subscript is put into the hash table at those hash values. If an entry already holds a subscript, that is, a conflict occurs, the strategy subscript is recorded in a conflict chain. This completes the creation of the time hash table.
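The time hash function and the initialization to -1 can be sketched in C as follows. The names TIME_SLOTS, time_hash, time_table and time_table_init are illustrative and do not come from the application; conflict chains are left out to keep the sketch short.

```c
#define TIME_SLOTS 86400            /* 24 * 60 * 60 seconds in one day */

/* Time hash function from the text: y = x1 * 60 * 60 + x2 * 60 + x3,
 * with x1 the hour, x2 the minute and x3 the second. */
static int time_hash(int hour, int minute, int second)
{
    return hour * 60 * 60 + minute * 60 + second;
}

/* One entry per second of the day; -1 means "no acquisition strategy". */
static int time_table[TIME_SLOTS];

static void time_table_init(void)
{
    for (int i = 0; i < TIME_SLOTS; i++)
        time_table[i] = -1;
}
```

With this function, the last second of the day, 23:59:59, hashes to 86399, which is the last valid index of the array.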
In an exemplary embodiment, the creating a source port hash table according to the source port information corresponding to each collection policy in the collection policy file includes: establishing a source port hash array with a preset length; analyzing the acquisition strategy file into a strategy array, traversing the strategy array, and calculating a source port hash value for the source port range of each acquisition strategy through a source port hash function; and creating a source port hash table according to the source port hash value of each acquisition strategy.
In an example embodiment, the creating a source port hash table according to the source port hash value of each acquisition policy includes: putting the strategy array subscript into a source port hash table, wherein the strategy array subscript represents an acquisition strategy stored in the corresponding position of the analyzed strategy array; and if the policy array subscript conflicts with the subscript in the source port hash table, linking the policy array subscript to the source port hash table conflict chain.
In step S100, the source port hash table may be created as follows. A hash array of length 65536 is established over the port number range (0-65535), and every entry is initialized to -1, meaning that no acquisition strategy applies. If x is the source port number, the source port hash value is calculated as y = x. The strategy array is traversed in sequence; for each strategy, the hash values of its source port range are calculated with the source port hash function, and the strategy subscript is put into the source port hash table at those hash values. If an entry already holds a subscript, that is, a conflict occurs, the strategy subscript is recorded in a conflict chain.
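One possible layout for such a table, with a conflict chain per slot, is sketched below in C. The structure and function names are hypothetical, and storing the first subscript in the array with later ones in a singly linked list is only one reading of the description; the application does not specify the chain's data structure.

```c
#include <stdlib.h>

#define PORT_SLOTS 65536            /* one slot per port number 0-65535 */

/* One link of a conflict chain; holds a strategy array subscript. */
struct chain_node {
    int policy_index;
    struct chain_node *next;
};

struct port_table {
    int first[PORT_SLOTS];                /* first subscript, -1 if none   */
    struct chain_node *rest[PORT_SLOTS];  /* later subscripts, in order    */
};

static struct port_table *port_table_create(void)
{
    struct port_table *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    for (int i = 0; i < PORT_SLOTS; i++) {
        t->first[i] = -1;
        t->rest[i] = NULL;
    }
    return t;
}

/* Registers strategy subscript `index` for every source port in [lo, hi]. */
static int port_table_add(struct port_table *t, int lo, int hi, int index)
{
    if (lo < 0 || hi >= PORT_SLOTS || lo > hi)
        return -1;
    for (int port = lo; port <= hi; port++) {
        int slot = port;                  /* hash function y = x */
        if (t->first[slot] == -1) {
            t->first[slot] = index;
            continue;
        }
        /* Conflict: append to the end of the chain to keep insertion order. */
        struct chain_node *n = malloc(sizeof *n);
        if (n == NULL)
            return -1;
        n->policy_index = index;
        n->next = NULL;
        struct chain_node **p = &t->rest[slot];
        while (*p != NULL)
            p = &(*p)->next;
        *p = n;
    }
    return 0;
}

/* Copies the subscripts stored for `port` into out[]; returns the count. */
static int port_table_lookup(const struct port_table *t, int port,
                             int *out, int max)
{
    int n = 0;
    if (port < 0 || port >= PORT_SLOTS || t->first[port] == -1)
        return 0;
    if (n < max)
        out[n++] = t->first[port];
    for (const struct chain_node *c = t->rest[port]; c != NULL && n < max; c = c->next)
        out[n++] = c->policy_index;
    return n;
}
```

Because strategies are added in ascending subscript order and each chain is appended at its end, a lookup returns subscripts in ascending order. Releasing the chain nodes is omitted here for brevity.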
In step S100, the source port hash table and the time hash table may be created in parallel or in either order; no particular order is required.
In step S110, the message to be acquired is matched with the time hash table and the source port hash table, and acquisition strategies are selected from the acquisition strategy file according to the matching result and put into a pre-screening strategy set. That is, matching is performed for each message to be acquired, and the hash tables, once generated, can be reused for many messages. For example, if a TCP message with quadruple (192.168.39.2, 8000, 192.168.35.25, 9000) is captured at 00:00:25, the time hash value calculated with the hash function of the time hash table is 25, and the strategies stored at subscript 25 of the time hash table are queried.
In an exemplary embodiment, matching the message to be acquired with the time hash table and the source port hash table, and selecting acquisition strategies from the acquisition strategy file according to the matching result and putting them into a pre-screening strategy set, is implemented as follows: calculating a time hash value of the message to be acquired, querying the time hash table, and determining a first acquisition strategy set for that time; calculating a source port hash value of the message to be acquired, querying the source port hash table, and determining a second acquisition strategy set for that source port; and selecting the intersection of the first acquisition strategy set and the second acquisition strategy set as the pre-screening strategy set. When the intersection of the first acquisition strategy set and the second acquisition strategy set is empty, no acquisition operation is performed. For example, the time pre-screening set A is obtained by calculating the time hash value of the message's capture time with the time hash value formula and then querying the time hash table for the set of acquisition strategies that apply at that time. The source port pre-screening set B is obtained by calculating the source port hash value of the message with the source port hash value formula and then querying the source port hash table for the set of acquisition strategies that apply to that source port. Taking the intersection of the time pre-screening set A and the source port pre-screening set B yields the pre-screening strategy set C.
Because the strategy array is traversed in sequence when the hash tables are created, the subscripts in each conflict chain are in ascending order; that is, the time pre-screening set A and the source port pre-screening set B are two ordered arrays. In this embodiment, the intersection is computed with a double-pointer method, following the idea of the merge step in two-way merge sort: one pointer is placed in each set, the values they point to are compared, and the pointers are advanced accordingly until the intersection is obtained, which is recorded as the pre-screening strategy set C. Each strategy in the pre-screening strategy set C is then traversed and its confirmation function is called; if a match succeeds, the message is acquired, and if no match succeeds after the whole set has been traversed, the message is not acquired.
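The double-pointer intersection of two ascending arrays can be sketched in C as below. The function name intersect_sorted and the array-based representation of the sets are assumptions made for illustration; the application only states that the two sets are ordered.

```c
#include <stddef.h>

/* Intersects two ascending arrays of strategy subscripts. Common values are
 * written to out[], whose capacity must be at least min(na, nb); the number
 * of values written is returned. */
static size_t intersect_sorted(const int *a, size_t na,
                               const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;                    /* a is behind: advance its pointer */
        } else if (a[i] > b[j]) {
            j++;                    /* b is behind: advance its pointer */
        } else {
            out[n++] = a[i];        /* equal: part of the intersection  */
            i++;
            j++;
        }
    }
    return n;
}
```

Each element is visited at most once, so the cost is linear in the combined size of the two sets, and a return value of 0 corresponds to the case in which no acquisition is performed.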
S120, if the message to be collected is matched with a confirmation function corresponding to any one of the pre-screening strategy set, collecting the message to be collected.
In an exemplary embodiment, the confirmation function is generated according to each acquisition policy in the acquisition policy file, and the confirmation function names correspond to the acquisition policies one to one. The process of generating the confirmation function is as follows: firstly reading an acquisition strategy file, then generating a function name according to the strategy id, generating a confirmation function source code for the strategy according to the acquisition condition of the strategy, and finally generating a code for recording the confirmation function address.
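The generation step can be illustrated with a small emitter that writes the source text of a confirmation function named after the strategy id, plus a line that records the function for later lookup. This is a sketch under several assumptions: the description elsewhere says the generation is done by a script, C is used here only for illustration, the structure policy_ports and both emit functions are hypothetical, and only the port conditions are emitted to keep the example short.

```c
#include <stdio.h>

/* Hypothetical subset of a parsed strategy: id plus the two port ranges. */
struct policy_ports {
    int id;
    int sport_lo, sport_hi;
    int dport_lo, dport_hi;
};

/* Writes C source for a confirmation function named capture_<id> into buf.
 * Returns the value of snprintf, i.e. the length the text needs. */
static int emit_confirm_function(const struct policy_ports *p,
                                 char *buf, size_t size)
{
    return snprintf(buf, size,
        "bool capture_%d(int sport, int dport)\n"
        "{\n"
        "    return sport >= %d && sport <= %d\n"
        "        && dport >= %d && dport <= %d;\n"
        "}\n",
        p->id, p->sport_lo, p->sport_hi, p->dport_lo, p->dport_hi);
}

/* Writes one initializer line for a table that records function addresses. */
static int emit_table_entry(const struct policy_ports *p,
                            char *buf, size_t size)
{
    return snprintf(buf, size, "    capture_%d,\n", p->id);
}
```

Because the bounds are written into the generated text as literals, the compiled confirmation function compares against constants, which is the hard-coding that the description contrasts with indirect addressing through variables.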
Example
This example provides a method for sampling and collecting messages according to an acquisition strategy. First, the acquisition conditions of each acquisition strategy are turned into source code (for example, C program source code, although the application is not limited to C); the source code comprises two parts, namely the confirmation function and the code that records the address of the confirmation function. Hash tables are then established for the acquisition time and the source port of the acquisition strategies, respectively, so that strategies can be screened quickly. Finally, the screened acquisition strategies are matched exactly with their confirmation functions to determine whether the message is acquired. As shown in fig. 2, the whole implementation process includes the following operations:
Step 1, generating the confirmation function source code for each acquisition strategy according to the acquisition strategy file.
In step 1, suppose the content of the acquisition strategy file set by the user is as follows:
id=1
src_ip=192.168.39.2-192.168.39.254
dst_ip=192.168.35.25-192.168.35.36
src_port=6000-8000
dst_port=8999-9200
protocol=TCP
starttime=00:00:10
endtime=00:00:30
id=2
src_ip=192.168.39.2-192.168.39.254
dst_ip=192.168.35.15-192.168.35.66
src_port=1-7777
dst_port=80-7777
protocol=TCP
starttime=00:00:20
endtime=23:59:59
According to the acquisition strategy file, a script generates a confirmation function for each strategy and generates the matching function name from the strategy id, and the result is compiled into the target program. The specific process is as follows: the script reads the content of the strategy file; it first parses the id of the acquisition strategy using the field name id and the equals sign, and checks it; it then parses the minimum source IP and the maximum source IP using the equals sign and the minus sign, and checks each field; it parses the remaining fields, such as the minimum destination IP, the maximum destination IP and the minimum source port, in the same way; finally, following the syntax of the programming language, it fills the parsed values into the comparison logic of the code file. The matching function name is generated from the strategy id, the address of the matching function is recorded, and the code is compiled into the target program. Taking the acquisition strategy with id 1 in step 1 as an example, the pseudo code of the generated C language source code is as follows:
bool capture_1(sip, dip, sport, dport, proto, time)
{
    if (sip in range(192.168.39.2-192.168.39.254)
        && dip in range(192.168.35.25-192.168.35.36)
        && (sport >= 6000 && sport <= 8000)
        && (dport >= 8999 && dport <= 9200)
        && strcmp(proto, "TCP") == 0
        && time in range(00:00:10-00:00:30))
    {
        return true;
    }
    return false;
}
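A compilable counterpart of this pseudo code might look like the following. It is a sketch under stated assumptions rather than the generated code itself: the parameter types, the IPV4 macro, the parameter name time_sec and the choice to pass IPv4 addresses as host-order 32-bit integers and time as seconds since midnight are all illustrative.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Packs four octets into a host-order 32-bit value so that address ranges
 * can be compared numerically (illustrative helper, not from the source). */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Confirmation function for the strategy with id 1. time_sec is the capture
 * time in seconds since midnight, so 00:00:10-00:00:30 becomes 10..30. */
static bool capture_1(uint32_t sip, uint32_t dip, int sport, int dport,
                      const char *proto, int time_sec)
{
    return sip >= IPV4(192, 168, 39, 2)  && sip <= IPV4(192, 168, 39, 254)
        && dip >= IPV4(192, 168, 35, 25) && dip <= IPV4(192, 168, 35, 36)
        && sport >= 6000 && sport <= 8000
        && dport >= 8999 && dport <= 9200
        && strcmp(proto, "TCP") == 0
        && time_sec >= 10 && time_sec <= 30;
}
```

All bounds appear as literals in the function body, so no strategy structure has to be read at match time.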
Step 2, establishing the hash tables according to the acquisition strategy file.
In step 2, the hash tables comprise two parts, namely a time hash table and a source port hash table, and the creation process is as follows:
First, the acquisition strategy file is parsed into an array of structure objects: the structure of the acquisition strategy with id 1 is placed at array subscript 0, and the structure of the acquisition strategy with id 2 is placed at array subscript 1, so that the subscript of the strategy array identifies the acquisition strategy. The hash tables are subsequently built from these array subscripts.
Then, the time hash table is built according to the acquisition times in the strategy array. A hash array of length 86400 (24 × 60 × 60) is established, with one entry per second over a range of one day. A hash operation is performed on the time range of each acquisition strategy: the acquisition time of the strategy with id 1 is 00:00:10-00:00:30, and that of the strategy with id 2 is 00:00:20-23:59:59. According to the hash function of the time hash table, the strategy with id 1 maps to 10 (0 × 60 × 60 + 0 × 60 + 10) through 30 (0 × 60 × 60 + 0 × 60 + 30), and the strategy with id 2 maps to 20 (0 × 60 × 60 + 0 × 60 + 20) through 86399 (23 × 60 × 60 + 59 × 60 + 59). First, the array entries with hash table subscripts 10-30 are assigned 0; then the array entries with hash table subscripts 20-86399 are assigned 1. The entries with subscripts 20-30 already hold 0, so a conflict occurs there, and subscript 1 is linked to the conflict chain of the time hash table, as shown in fig. 3.
Finally, the source port hash table is established according to the source ports of the strategy array. A hash array of length 65536 is established first. A hash operation is then performed on the source port range of each acquisition strategy, for example, calculating a hash value according to the hash function of the source port hash table in a port range (6000-. First, the content of the array with the hash table subscript of (6000-. The source port hash table and the time hash table may be established concurrently or one after the other; no particular order is required between them.
Step 3, querying the time hash table to obtain a time pre-screening set A.
In step 3, suppose a TCP message with the 4-tuple (192.168.39.2, 8000, 192.168.35.25, 9000) is captured. Applying the hash function of the time hash table to the capture time gives the hash value 25. Looking up subscript 25 in the time hash table returns the strategies with subscripts (0, 1), which are the strategies to be collected at that moment; that is, the time pre-screening set A is (0, 1), and A is not empty.
Step 4, querying the source port hash table to obtain a source port pre-screening set B.
In step 4, the captured message is the same as in step 3. Applying the hash function of the source port hash table to the message's source port gives the hash value 8000. The entry at position 8000 of the source port hash table array is checked, yielding strategy subscript 0 for source port 8000; that is, the source port pre-screening set B is (0), and B is not empty.
Step 5, taking the intersection of the time pre-screening set A and the source port pre-screening set B to obtain a pre-screening strategy set C; if C is an empty set, no collection is performed.
In step 5, the time pre-screening set A (0, 1) obtained in step 3 is intersected with the source port pre-screening set B (0) obtained in step 4, giving the pre-screening strategy set C (0); C is not empty.
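Steps 3 to 5 amount to two table lookups followed by a set intersection. The following sketch reproduces the worked example. The function `prescreen` is a hypothetical name, and the two `defaultdict` objects are sparse stand-ins that contain only the slots needed here rather than the full 86400-entry and 65536-entry arrays.

```python
from collections import defaultdict


def prescreen(time_table, port_table, time_slot, src_port):
    """Return pre-screening set C: policies matching both the time and the source port."""
    set_a = time_table[time_slot]   # step 3: time pre-screening set A
    if not set_a:
        return []
    set_b = port_table[src_port]    # step 4: source port pre-screening set B
    if not set_b:
        return []
    members_of_b = set(set_b)
    return [index for index in set_a if index in members_of_b]  # step 5: A intersect B


# Sparse stand-ins for the two hash tables; missing slots read as empty.
time_table = defaultdict(list, {25: [0, 1]})
port_table = defaultdict(list, {8000: [0]})

set_c = prescreen(time_table, port_table, time_slot=25, src_port=8000)
```

Returning early when A or B is empty avoids the second lookup and the intersection for messages that cannot match any strategy.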
Step 6, traversing each strategy in the pre-screening strategy set C, calling the confirmation function of the strategy, and checking whether the message needs to be collected.
In step 6, the pre-screening strategy set C (0) is traversed. The strategy id and the confirmation function of the strategy are obtained from subscript 0, and the message is matched by that confirmation function (the confirmation function code is as shown in step 1). Every acquisition condition is satisfied, so the match succeeds and the message is collected.
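The idea of a per-strategy confirmation function whose name is derived from the strategy id and whose body hard-codes the remaining conditions can be illustrated as below. This is only a sketch under stated assumptions: the actual code of step 1 is not reproduced here, the naming scheme `confirm_policy_<id>`, the condition fields, and the field names of the message dictionary are all hypothetical, and Python source generation stands in for whatever language the generated code actually uses.

```python
def generate_confirm_source(policy):
    """Generate source text for a confirmation function with hard-coded conditions."""
    name = f"confirm_policy_{policy['id']}"  # unique name derived from the policy id
    checks = " and ".join(
        f"pkt[{field!r}] == {value!r}"
        for field, value in sorted(policy["conditions"].items())
    ) or "True"
    return name, f"def {name}(pkt):\n    return {checks}\n"


# HYPOTHETICAL policy: the conditions are illustrative, not taken from the patent.
policy = {"id": 1, "conditions": {"protocol": "TCP", "dst_port": 9000}}
name, source = generate_confirm_source(policy)

# Compile the generated source once; afterwards matching is a plain function call.
namespace = {}
exec(source, namespace)
confirm = namespace[name]

# The captured message of the worked example, with assumed field names.
packet = {
    "src_ip": "192.168.39.2", "src_port": 8000,
    "dst_ip": "192.168.35.25", "dst_port": 9000,
    "protocol": "TCP",
}
matched = confirm(packet)
```

Because the constants are written directly into the function body, the check does not have to read the conditions from a policy structure each time a message is tested.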
The embodiment of the application provides a method for sampling and collecting messages according to collection strategies. Fast strategy matching is achieved by compiling the collection conditions into source code and by pre-screening the collection strategies with hash tables. First, confirmation function code is generated for each collection strategy configured by the user. Next, separate hash tables are built from the strategy collection time and the source port. Finally, during run-time matching, the strategies are first pre-screened through the hash tables, and the pre-screened strategy set is then matched using the confirmation functions. Because independent collection time and source port hash tables are used, memory occupation is 148.3KB, far lower than that of a cascaded hash table. At run time the confirmation functions are hard-coded generated code, which executes faster than matching through indirect variable addressing. The method and system for sampling and collecting messages according to collection strategies provided by the embodiment of the application therefore offer low memory occupation, fast pre-screening, and fast, accurate matching, and overcome the drawbacks of conventional methods for sampling and collecting messages.
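The text does not say how the 148.3KB figure is derived. One reading that reproduces it, offered here only as an assumption, is one byte per slot across the two independent arrays. The comparison with a cascaded table likewise assumes that "cascaded" means one slot per (second, port) pair.

```python
TIME_SLOTS = 24 * 60 * 60   # 86400, one slot per second of the day
PORT_SLOTS = 65536          # one slot per possible source port
BYTES_PER_SLOT = 1          # ASSUMPTION: not stated in the text

# Two independent tables: the slot counts add.
separate_kib = (TIME_SLOTS + PORT_SLOTS) * BYTES_PER_SLOT / 1024

# A table keyed on both dimensions at once: the slot counts multiply.
cascaded_slots = TIME_SLOTS * PORT_SLOTS
cascaded_gib = cascaded_slots * BYTES_PER_SLOT / 1024 ** 3
```

Under these assumptions the two separate tables need 148.375 KiB, which is consistent with the stated 148.3KB, while a full cascaded table would need several GiB.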
The present application further provides a device for sampling and collecting packets, as shown in fig. 5, the device includes: a memory and a processor; the memory is used for storing a program for collecting messages, and the processor is used for reading and executing the program for collecting messages and executing the method for sampling and collecting messages in any one of the embodiments.
The present application also provides a storage medium having stored therein a program for sampling an acquisition message, the program being arranged to perform the method of sampling an acquisition message as described in any one of the above embodiments when executed.
Example
The embodiment provides a process for sampling and collecting messages according to a collection policy, as shown in fig. 6:
Step 600, capturing traffic, i.e., obtaining an acquisition policy file.
Step 601, calculating a corresponding time hash value and a corresponding port hash value according to the time information and the port information in the collection policy file.
Step 602, querying a time hash table according to the time hash value; when the time hash value matches the time hash table, go to step 603; if the set is empty, no collection is performed, and the process jumps to step 612.
Step 603, obtaining a time pre-screening set A; execution continues at step 604.
Step 604, querying a port hash table according to the port hash value; when the port hash value is matched with the port hash table, jumping to step 605; if the set is empty, no collection is performed, and the process jumps to step 612.
Step 605, obtaining a port pre-screening set B; execution continues with step 606.
Step 606, taking the intersection of the time pre-screening set A and the source port pre-screening set B to obtain a pre-screening policy set C.
Step 607, judging whether C is empty; when C is an empty set, jumping to step 612; when C is not an empty set, executing step 608.
Step 608, judging whether the length of the pre-screening policy set C is greater than i, where the initial value of i is 0; if yes, executing step 609; otherwise, executing step 612.
Step 609, finding the confirmation function according to the policy subscript.
Step 610, matching with the confirmation function; if the match succeeds, executing step 611; if the match fails, setting i to i + 1 and executing step 608 again.
Step 611, collecting the message.
Step 612, not collecting the message.
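The decision flow of steps 602 to 612 can be sketched as a single function. The name `should_collect`, the dictionary-based sparse tables, the message field names, and the two example confirmation functions are all illustrative assumptions; only the control flow follows the steps above.

```python
def should_collect(time_table, port_table, confirm_functions, time_slot, packet):
    """Return True if the message is to be collected (step 611), else False (step 612)."""
    set_a = time_table.get(time_slot, [])            # steps 602-603
    if not set_a:
        return False                                 # step 612
    set_b = port_table.get(packet["src_port"], [])   # steps 604-605
    if not set_b:
        return False                                 # step 612
    set_c = [index for index in set_a if index in set_b]  # step 606
    i = 0
    while i < len(set_c):                            # steps 607-608
        confirm = confirm_functions[set_c[i]]        # step 609
        if confirm(packet):                          # step 610
            return True                              # step 611
        i += 1                                       # match failed, try the next policy
    return False                                     # step 612


# Sparse stand-ins for the hash tables and HYPOTHETICAL confirmation functions.
time_table = {25: [0, 1]}
port_table = {8000: [0]}
confirm_functions = [lambda pkt: pkt["protocol"] == "TCP", lambda pkt: False]

packet = {"src_port": 8000, "protocol": "TCP"}
collected = should_collect(time_table, port_table, confirm_functions, 25, packet)
```

An empty set C makes the loop condition false on the first check, so steps 607 and 608 collapse into the same test in this sketch.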
In this embodiment, fast matching is achieved by pre-screening the collection strategies with hash tables: during matching, the strategies are first pre-screened through the hash tables, and the pre-screened strategy set is then matched using the confirmation functions. Because independent collection time and source port hash tables are used, memory occupation is 148.3KB, far lower than that of a cascaded hash table. The method and system for sampling and collecting messages according to collection strategies therefore offer low memory occupation, fast pre-screening, and fast, accurate matching, and overcome the drawbacks of conventional methods for sampling and collecting messages.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

Claims (10)

1. A method for sampling and collecting messages is characterized by comprising the following steps:
reading an acquisition strategy file, creating a time hash table according to acquisition time information corresponding to each acquisition strategy in the acquisition strategy file, and creating a source port hash table according to source port information corresponding to each acquisition strategy in the acquisition strategy file;
matching the message to be acquired with the time hash table and the source port hash table, and selecting an acquisition strategy from the acquisition strategy file according to a matching result and putting the acquisition strategy into a pre-screening strategy set;
and if the message to be collected is matched with the confirmation function corresponding to any one of the pre-screening strategy set, collecting the message to be collected.
2. The method for sampling packet collection according to claim 1, wherein the creating a time hash table according to the collection time information corresponding to each collection policy in the collection policy file comprises:
establishing a time hash array with a preset length;
analyzing the acquisition strategy file into strategy arrays, traversing the strategy arrays, and calculating a time hash value of the acquisition time range of each acquisition strategy through a time hash function;
and creating a time hash table according to the time hash value of each acquisition strategy.
3. The method of sampling acquisition packets according to claim 2, wherein the creating a time hash table based on the time hash value of each acquisition policy comprises:
putting the strategy array subscript into a time hash table, wherein the strategy array subscript represents the collection strategy stored in the corresponding position of the analyzed strategy array;
and if the strategy array subscript conflicts with the subscript in the time hash table, linking the strategy array subscript to the time hash table conflict chain.
4. The method of claim 1, wherein the creating a source port hash table according to the source port information corresponding to each collection policy in the collection policy file comprises:
establishing a source port hash array with a preset length;
analyzing the acquisition strategy file into a strategy array, traversing the strategy array, and calculating the hash value of the source port of the acquisition source port range of each acquisition strategy through a source port hash function;
and creating a source port hash table according to the source port hash value of each acquisition strategy.
5. The method of sampling acquisition packets as claimed in claim 4, wherein said creating a source port hash table based on the source port hash value of each acquisition policy comprises:
putting the subscript of the strategy array into a source port hash table, wherein the subscript of the strategy array represents the collection strategy stored in the corresponding position of the analyzed strategy array;
and if the strategy array subscript conflicts with the subscript in the source port hash table, linking the strategy array subscript to the source port hash table conflict chain.
6. The method according to any of claims 2-5, wherein the matching the packet to be collected with the time hash table and the source port hash table, and selecting a collection policy from the collection policy file according to the matching result and placing the collection policy into a pre-screening policy set comprises:
calculating a time hash value of a message to be acquired, inquiring the time hash table, and determining a first acquisition strategy set of the time period;
calculating a source port hash value of a message to be acquired, inquiring a source port hash table, and determining a second acquisition strategy set of the source port range;
and selecting the intersection of the first collection strategy set and the second collection strategy set as a pre-screening strategy set.
7. The method of sampling messages in accordance with claim 6, the method further comprising:
and when the intersection of the first acquisition strategy set and the second acquisition strategy set is empty, not performing acquisition operation.
8. The method of sampling messages in accordance with claim 1,
the confirmation function is generated according to each acquisition strategy in the acquisition strategy file, and the confirmation function names correspond to the acquisition strategies one to one; the confirmation function name is generated uniquely by the collection strategy id, and the confirmation function body is generated by other collection conditions of the collection strategy.
9. An apparatus for sampling packets, the apparatus comprising: a memory and a processor; the memory is used for storing a program for collecting messages, and the processor is used for reading and executing the program for collecting messages and executing the method for sampling message collection according to any one of claims 1-8.
10. A storage medium, in which a program for collecting messages is stored, which program is arranged to carry out the sample collection message method of any one of claims 1 to 8 when executed.
CN202210267837.7A 2022-03-17 2022-03-17 Method and device for sampling message Active CN114625929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210267837.7A CN114625929B (en) 2022-03-17 2022-03-17 Method and device for sampling message

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210267837.7A CN114625929B (en) 2022-03-17 2022-03-17 Method and device for sampling message

Publications (2)

Publication Number Publication Date
CN114625929A true CN114625929A (en) 2022-06-14
CN114625929B CN114625929B (en) 2024-08-13

Family

ID=81902142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210267837.7A Active CN114625929B (en) 2022-03-17 2022-03-17 Method and device for sampling message

Country Status (1)

Country Link
CN (1) CN114625929B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100070613A (en) * 2008-12-18 2010-06-28 삼성전자주식회사 Network traffic filtering method and apparatus
CN103973684A (en) * 2014-05-07 2014-08-06 北京神州绿盟信息安全科技股份有限公司 Rule compiling and matching method and device
US20140282830A1 (en) * 2013-03-15 2014-09-18 International Business Machines Corporation Firewall Packet Filtering
CN105913281A (en) * 2016-04-12 2016-08-31 宁波极动精准广告传媒有限公司 Advertisement publishing method based on classified Hash table
WO2016206389A1 (en) * 2015-06-26 2016-12-29 中兴通讯股份有限公司 Url matching method and apparatus
CN106790742A (en) * 2016-11-23 2017-05-31 北京锐安科技有限公司 A kind of method and device of IP matchings
WO2017114200A1 (en) * 2015-12-31 2017-07-06 阿里巴巴集团控股有限公司 Method and device for packet cleaning
WO2018103214A1 (en) * 2016-12-07 2018-06-14 武汉斗鱼网络科技有限公司 Scheme testing method, and server
CN111818099A (en) * 2020-09-02 2020-10-23 南京云信达科技有限公司 TCP (Transmission control protocol) message filtering method and device
CN112491901A (en) * 2020-11-30 2021-03-12 北京锐驰信安技术有限公司 Network flow fine screening device and method
CN112511441A (en) * 2020-11-18 2021-03-16 潍柴动力股份有限公司 Message processing method and device

Also Published As

Publication number Publication date
CN114625929B (en) 2024-08-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant