Probe-based automatic traffic characteristic collection method and system
Technical Field
The invention relates to the technical field of network traffic identification, in particular to a probe-based automatic traffic characteristic collection method and system.
Background
Deep traffic identification equipment, such as next-generation firewalls, is now widely deployed, and deep identification of traffic is in broad demand. At present, traffic characteristic extraction relies mainly on manual in-depth analysis of captured traffic samples and of the programs that generate the traffic. Because manual intervention is inefficient and costly, neither the timeliness nor the accuracy of the acquired network traffic characteristics can be fully guaranteed.
Disclosure of Invention
To address the above defects in the prior art, the invention provides a probe-based automatic traffic characteristic collection method and system. A probe deployed at the terminal identifies user and process behavior, and a probe deployed on the network side captures traffic. An association is established between terminal behavior and network traffic so that traffic characteristics are extracted and collected automatically; the linkage between terminal-side operations and network-side data is obtained through big data analysis; and the validity of historical network traffic characteristics is judged dynamically.
Specifically, the invention comprises:
a probe-based automatic traffic characteristic collection method, comprising:
deploying probes on the terminal side and the network side respectively;
the terminal-side probe monitors API calls in the terminal system; when a specified API call action is found, it performs the corresponding stack backtrace, obtains the corresponding calling process and calling module information, generates a specified data mark according to a predefined rule, and sends the specified data mark to the network-side probe; the specified API call actions include: requesting a connection, sending data, and ending a connection;
the terminal-side probe monitors user operations; when a specified user operation is found, it acquires the data related to that operation and stores the data in a process operation record;
the terminal-side probe monitors the running of processes in the system, acquires process operation information, and stores it in the process operation record;
the network-side probe acquires the data stream according to the received specified data mark;
and the information acquired by the terminal-side probe, the corresponding specified data marks, the process operation record, and the data stream acquired by the network-side probe are sent to a server for joint analysis to generate network traffic characteristics.
Further, generating the specified data mark according to a predefined rule and sending it to the network-side probe specifically comprises: when the terminal-side probe finds an API call action requesting a connection, it obtains the target five-tuple information of the call action, generates a unique stream mark according to the specific API and the call action, and sends the target five-tuple information and the unique stream mark to the network-side probe so that the network-side probe can prepare to acquire the target stream; when the terminal-side probe finds an API call action sending data, it generates a unique data interval mark according to the specific API and the call action and sends the unique data interval mark to the network-side probe, so that the network-side probe determines the specific data stream to be acquired and acquires it; when the terminal-side probe finds an API call action ending the connection, it sends the corresponding unique stream mark to the network-side probe again according to the specific API, so that the network-side probe ends acquisition of the corresponding data stream.
Further, acquiring the data related to the corresponding operation when a specified user operation is found specifically comprises: when the terminal-side probe finds a specified user operation, it acquires the target process and window information of that operation; wherein the specified user operations comprise clicking and keyboard input, and the window information comprises: the window name, the window attributes, whether the window contains an input box, the attributes of the input box, the input box content, the user input content, and the text data of the window content.
Further, the network-side probe acquiring the data stream according to the received specified data mark further comprises: for each acquired data stream, using the corresponding unique stream mark as the stream mark of that data stream, using the corresponding unique data interval mark as the data packet mark of that data stream, and performing protocol identification and parsing on the acquired data stream.
Further, the joint analysis further comprises: using a machine learning algorithm to perform correlation analysis on the information acquired by the terminal-side probe, the corresponding specified data marks, the process operation record, and the data stream acquired by the network-side probe; establishing the relation between the terminal-side data and the network-side data stream; classifying by data stream type; and further analyzing data streams of the same type together with their corresponding terminal-side and network-side data, to obtain, for each type of data stream, the relation among network traffic characteristics, user input, and interface functions.
Further, the method further comprises evaluating the generated network traffic characteristics, specifically: while the server dynamically and continuously performs the joint analysis, in the further analysis of data streams of the same type and their corresponding terminal-side and network-side data, judging whether the newly obtained user input information is stable relative to the historical user input information for data streams of that type; if so, the historically obtained network traffic characteristics of that type of data stream are regarded as remaining valid; otherwise, a refined analysis is performed on that type of network traffic to judge the validity of the historical network traffic characteristics; if the user input information obtained on each subsequent occasion still cannot be kept stable relative to the historical user input information, the historically obtained network traffic characteristics of that type of data stream are regarded as invalid; wherein being stable comprises: the input content is the same, the format of the input content is the same, and the type of the input content is the same (for example, both are digits, letters, Chinese characters, symbols, or the same combination thereof); and wherein the refined analysis comprises: classification analysis according to the protocol used and according to the version of the protocol used.
A probe-based automatic traffic characteristic collection system, comprising a terminal, a network end and a server, and further comprising a terminal-side probe module deployed at the terminal, a network-side probe module deployed at the network end, and a joint analysis module deployed at the server;
the terminal-side probe module further comprises an API call monitoring submodule, a user operation monitoring submodule and a process monitoring submodule;
specifically:
the API call monitoring submodule is used for monitoring API calls in the terminal system, performing the corresponding stack backtrace when a specified API call action is found, obtaining the corresponding calling process and calling module information, generating a specified data mark according to a predefined rule, and sending the specified data mark to the network-side probe module; the specified API call actions include: requesting a connection, sending data, and ending a connection;
the user operation monitoring submodule is used for monitoring user operations, acquiring the data related to a specified user operation when that operation is found, and storing the data in a process operation record;
the process monitoring submodule is used for monitoring the running of processes in the system, acquiring process operation information, and storing it in the process operation record;
the network-side probe module is used for acquiring data streams according to the received specified data marks;
and the joint analysis module is used for performing joint analysis on the information acquired by the terminal-side probe module, the corresponding specified data marks, the process operation record, and the data stream acquired by the network-side probe module to generate network traffic characteristics.
Further, generating the specified data mark according to a predefined rule and sending it to the network-side probe module specifically comprises: when the API call monitoring submodule finds an API call action requesting a connection, it obtains the target five-tuple information of the call action, generates a unique stream mark according to the specific API and the call action, and sends the target five-tuple information and the unique stream mark to the network-side probe module so that the network-side probe module can prepare to acquire the target stream; when the API call monitoring submodule finds an API call action sending data, it generates a unique data interval mark according to the specific API and the call action and sends the unique data interval mark to the network-side probe module, so that the network-side probe module determines the specific data stream to be acquired and acquires it; when the API call monitoring submodule finds an API call action ending the connection, it sends the corresponding unique stream mark to the network-side probe module again according to the specific API, so that the network-side probe module ends acquisition of the corresponding data stream.
Further, acquiring the data related to the corresponding operation when a specified user operation is found specifically comprises: when the user operation monitoring submodule finds a specified user operation, it acquires the target process and window information of that operation; wherein the specified user operations comprise clicking and keyboard input, and the window information comprises: the window name, the window attributes, whether the window contains an input box, the attributes of the input box, the input box content, the user input content, and the text data of the window content.
Further, the network-side probe module is further configured to: for each acquired data stream, use the corresponding unique stream mark as the stream mark of that data stream, use the corresponding unique data interval mark as the data packet mark of that data stream, and perform protocol identification and parsing on the acquired data stream.
Further, the joint analysis module is further configured to: use a machine learning algorithm to perform correlation analysis on the information acquired by the terminal-side probe module, the corresponding specified data marks, the process operation record, and the data stream acquired by the network-side probe module; establish the relation between the terminal data and the network-end data stream; classify by data stream type; and further analyze data streams of the same type together with their corresponding terminal and network-end data, to obtain, for each type of data stream, the relation among network traffic characteristics, user input, and interface functions.
Further, the system further comprises a network traffic characteristic evaluation module deployed at the server and specifically configured to: while the joint analysis module dynamically and continuously performs the joint analysis, in the further analysis of data streams of the same type and their corresponding terminal and network-end data, judge whether the newly obtained user input information is stable relative to the historical user input information for data streams of that type; if so, regard the historically obtained network traffic characteristics of that type of data stream as remaining valid; otherwise, perform a refined analysis on that type of network traffic and judge the validity of the historical network traffic characteristics; if the user input information obtained on each subsequent occasion still cannot be kept stable relative to the historical user input information, regard the historically obtained network traffic characteristics of that type of data stream as invalid; wherein being stable comprises: the input content is the same, the format of the input content is the same, and the type of the input content is the same (for example, both are digits, letters, Chinese characters, symbols, or the same combination thereof); and wherein the refined analysis comprises: classification analysis according to the protocol used and according to the version of the protocol used.
The beneficial effects of the invention are as follows:
the invention enables automatic collection of network traffic characteristics by combining terminal and network probes;
the invention establishes an association between terminal behavior and network traffic, obtains the linkage between terminal-side operations and network-side data through big data analysis, and dynamically judges the validity of historical network traffic characteristics, thereby dynamically collecting and maintaining the characteristic database and fully ensuring the integrity and accuracy of the characteristics.
Drawings
To illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the probe-based automatic traffic characteristic collection method according to the present invention;
FIG. 2 is a block diagram of the probe-based automatic traffic characteristic collection system according to the present invention.
Detailed Description
In order to make the technical solutions in the embodiments of the present invention better understood and make the above objects, features and advantages of the present invention more comprehensible, the technical solutions of the present invention are described in further detail below with reference to the accompanying drawings.
The invention provides an embodiment of a probe-based automatic traffic characteristic collection method which, as shown in FIG. 1, comprises the following steps:
S101: deploying probes on the terminal side and the network side respectively;
S102: the terminal-side probe monitors API calls in the terminal system; when a specified API call action is found, it performs the corresponding stack backtrace, obtains the corresponding calling process and calling module information, generates a specified data mark according to a predefined rule, and sends the specified data mark to the network-side probe; the specified API call actions include: requesting a connection, sending data, and ending a connection;
S103: the terminal-side probe monitors user operations; when a specified user operation is found, it acquires the data related to that operation and stores the data in a process operation record;
S104: the terminal-side probe monitors the running of processes in the system, acquires process operation information, and stores it in the process operation record;
S105: the network-side probe acquires the data stream according to the received specified data mark;
S106: the information acquired by the terminal-side probe, the corresponding specified data marks, the process operation record, and the data stream acquired by the network-side probe are sent to a server for joint analysis to generate network traffic characteristics.
Preferably, generating the specified data mark according to a predefined rule and sending it to the network-side probe specifically comprises: when the terminal-side probe finds an API call action requesting a connection, it obtains the target five-tuple information of the call action, generates a unique stream mark according to the specific API and the call action, and sends the target five-tuple information and the unique stream mark to the network-side probe so that the network-side probe can prepare to acquire the target stream; when the terminal-side probe finds an API call action sending data, it generates a unique data interval mark according to the specific API and the call action and sends the unique data interval mark to the network-side probe, so that the network-side probe determines the specific data stream to be acquired and acquires it; when the terminal-side probe finds an API call action ending the connection, it sends the corresponding unique stream mark to the network-side probe again according to the specific API, so that the network-side probe ends acquisition of the corresponding data stream.
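The three marking events described above can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the class name TerminalMarker, the message dictionaries, the socket handle parameter, and the hash-based mark scheme are all hypothetical, and attaching the stream mark to each data interval message is an added assumption so that the receiver can tell which stream an interval belongs to. The API hooking itself is outside the sketch.

```python
import hashlib
from itertools import count
from typing import Dict, NamedTuple


class FiveTuple(NamedTuple):
    """Target five-tuple of a connection request."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def _mark(*parts: object) -> str:
    """Derive a short hexadecimal mark from the given parts (illustrative scheme only)."""
    joined = "|".join(str(p) for p in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


class TerminalMarker:
    """Builds the messages a terminal-side probe would send to the network-side probe."""

    def __init__(self) -> None:
        self._seq = count()                 # makes every generated mark distinct
        self._streams: Dict[int, str] = {}  # hypothetical socket handle -> stream mark

    def on_connect(self, api: str, handle: int, target: FiveTuple) -> dict:
        # Connection request: a new stream mark is sent together with the five-tuple.
        stream_mark = _mark(api, "connect", handle, target, next(self._seq))
        self._streams[handle] = stream_mark
        return {"type": "stream_start", "api": api,
                "stream_mark": stream_mark, "five_tuple": target}

    def on_send(self, api: str, handle: int) -> dict:
        # Data send: a new data interval mark is generated for this call.
        stream_mark = self._streams[handle]
        interval_mark = _mark(api, "send", stream_mark, next(self._seq))
        return {"type": "data_interval", "api": api,
                "stream_mark": stream_mark, "interval_mark": interval_mark}

    def on_close(self, api: str, handle: int) -> dict:
        # Connection end: the same stream mark is sent again to stop acquisition.
        return {"type": "stream_end", "api": api,
                "stream_mark": self._streams.pop(handle)}
```

In this sketch, one connection yields one stream mark that is repeated on close, while each send call yields its own interval mark.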
Preferably, acquiring the data related to the corresponding operation when a specified user operation is found specifically comprises: when the terminal-side probe finds a specified user operation, it acquires the target process and window information of that operation; wherein the specified user operations comprise clicking and keyboard input, and the window information comprises: the window name, the window attributes, whether the window contains an input box, the attributes of the input box, the input box content, the user input content, and the text data of the window content.
Preferably, the network-side probe acquiring the data stream according to the received specified data mark further comprises: for each acquired data stream, using the corresponding unique stream mark as the stream mark of that data stream, using the corresponding unique data interval mark as the data packet mark of that data stream, and performing protocol identification and parsing on the acquired data stream.
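A minimal Python sketch of how a network-side probe might apply the received marks to packets is given below. The class NetworkProbe, the message layout, and the rule that capture begins only once a data interval mark has arrived are assumptions made for illustration; the sketch matches packets on the exact five-tuple only and leaves out packet capture, reverse-direction traffic, and protocol identification.

```python
from typing import Dict, List, Tuple

FiveTuple = Tuple[str, int, str, int, str]  # (src ip, src port, dst ip, dst port, protocol)


class NetworkProbe:
    """Labels packets with the stream mark and the current data interval mark."""

    def __init__(self) -> None:
        self._active: Dict[FiveTuple, str] = {}   # five-tuple -> stream mark
        self._interval: Dict[str, str] = {}       # stream mark -> current interval mark
        # stream mark -> list of (packet mark, payload)
        self.captured: Dict[str, List[Tuple[str, bytes]]] = {}

    def on_mark(self, msg: dict) -> None:
        """Process one mark message received from the terminal-side probe."""
        kind = msg["type"]
        if kind == "stream_start":
            # Prepare to acquire the target stream.
            self._active[msg["five_tuple"]] = msg["stream_mark"]
        elif kind == "data_interval":
            # Packets seen from now on belong to this data interval.
            self._interval[msg["stream_mark"]] = msg["interval_mark"]
        elif kind == "stream_end":
            # Stop acquiring the stream whose mark was sent again.
            mark = msg["stream_mark"]
            self._interval.pop(mark, None)
            for key in [k for k, v in self._active.items() if v == mark]:
                del self._active[key]

    def on_packet(self, five_tuple: FiveTuple, payload: bytes) -> bool:
        """Store the packet if it belongs to an active, marked interval."""
        stream_mark = self._active.get(five_tuple)
        if stream_mark is None:
            return False
        interval_mark = self._interval.get(stream_mark)
        if interval_mark is None:
            return False  # stream prepared, but no send has been announced yet
        self.captured.setdefault(stream_mark, []).append((interval_mark, payload))
        return True
```

Each stored packet carries the interval mark as its packet mark and is grouped under the stream mark, mirroring the two-level labelling described above.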
Preferably, the joint analysis further comprises: using a machine learning algorithm to perform correlation analysis on the information acquired by the terminal-side probe, the corresponding specified data marks, the process operation record, and the data stream acquired by the network-side probe; establishing the relation between the terminal-side data and the network-side data stream; classifying by data stream type; and further analyzing data streams of the same type together with their corresponding terminal-side and network-side data, to obtain, for each type of data stream, the relation among network traffic characteristics, user input, and interface functions.
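The embodiment does not specify the machine learning algorithm. As a much simpler stand-in, the following Python sketch shows only the association and classification steps: it joins terminal-side records to network-side streams on the stream mark and groups the result by stream type. The function name correlate and the record fields (process, module, user_input, protocol, payload) are hypothetical, and the identified protocol is assumed here to serve as the stream type.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def correlate(terminal_records: Iterable[dict],
              network_streams: Iterable[dict]) -> Dict[str, List[dict]]:
    """Join terminal-side and network-side data on the stream mark, grouped by stream type.

    This is a deterministic join, not the machine learning analysis itself.
    """
    by_mark = {rec["stream_mark"]: rec for rec in terminal_records}
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for stream in network_streams:
        rec = by_mark.get(stream["stream_mark"])
        if rec is None:
            continue  # no terminal-side record for this stream, so nothing to associate
        grouped[stream["protocol"]].append({
            "stream_mark": stream["stream_mark"],
            "process": rec["process"],
            "module": rec["module"],
            "user_input": rec.get("user_input"),
            "payload_size": len(stream["payload"]),
        })
    return dict(grouped)
```

The grouped records are the kind of per-type input on which a later analysis of traffic characteristics, user input, and interface functions could operate.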
Preferably, the method further comprises evaluating the generated network traffic characteristics, specifically: while the server dynamically and continuously performs the joint analysis, in the further analysis of data streams of the same type and their corresponding terminal-side and network-side data, judging whether the newly obtained user input information is stable relative to the historical user input information for data streams of that type; if so, the historically obtained network traffic characteristics of that type of data stream are regarded as remaining valid; otherwise, a refined analysis is performed on that type of network traffic to judge the validity of the historical network traffic characteristics; if the user input information obtained on each subsequent occasion still cannot be kept stable relative to the historical user input information, the historically obtained network traffic characteristics of that type of data stream are regarded as invalid; wherein being stable comprises: the input content is the same, the format of the input content is the same, and the type of the input content is the same (for example, both are digits, letters, Chinese characters, symbols, or the same combination thereof); and wherein the refined analysis comprises: classification analysis according to the protocol used and according to the version of the protocol used.
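The stability judgment can be sketched in Python as follows. The text does not state whether the three criteria must all hold or whether any one suffices, so the sketch merely reports the strongest criterion that holds and leaves the decision to the caller. The function names, the definition of "format" as the sequence of character-class runs, and the definition of "type" as the set of character classes present are assumptions introduced for illustration.

```python
from itertools import groupby
from typing import FrozenSet, List, Optional, Sequence, Tuple


def char_class(ch: str) -> str:
    """Classify one character as digit, Chinese character, letter, or symbol."""
    if ch.isdigit():
        return "digit"
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
        return "han"
    if ch.isalpha():
        return "letter"
    return "symbol"


def format_signature(text: str) -> List[Tuple[str, int]]:
    """Assumed notion of 'format': runs of character classes with their lengths."""
    return [(cls, len(list(run))) for cls, run in groupby(text, key=char_class)]


def type_signature(text: str) -> FrozenSet[str]:
    """Assumed notion of 'type': the set of character classes that occur."""
    return frozenset(char_class(ch) for ch in text)


def stability_level(new_input: str, history: Sequence[str]) -> Optional[str]:
    """Return the strongest criterion on which the new input agrees with all history."""
    if not history:
        return None
    if all(new_input == old for old in history):
        return "content"
    new_format = format_signature(new_input)
    if all(new_format == format_signature(old) for old in history):
        return "format"
    new_type = type_signature(new_input)
    if all(new_type == type_signature(old) for old in history):
        return "type"
    return None  # unstable: the refined analysis would be triggered
```

For example, two dates written as four digits, a hyphen, two digits, a hyphen, and two digits agree at the "format" level even though their content differs.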
The invention also provides an embodiment of a probe-based automatic traffic characteristic collection system which, as shown in FIG. 2, comprises a terminal, a network end and a server, and further comprises a terminal-side probe module 201 deployed at the terminal, a network-side probe module 202 deployed at the network end, and a joint analysis module 203 deployed at the server;
the terminal-side probe module 201 further comprises an API call monitoring submodule 201-1, a user operation monitoring submodule 201-2 and a process monitoring submodule 201-3;
specifically:
the API call monitoring submodule 201-1 is used for monitoring API calls in the terminal system, performing the corresponding stack backtrace when a specified API call action is found, obtaining the corresponding calling process and calling module information, generating a specified data mark according to a predefined rule, and sending the specified data mark to the network-side probe module 202; the specified API call actions include: requesting a connection, sending data, and ending a connection;
the user operation monitoring submodule 201-2 is used for monitoring user operations, acquiring the data related to a specified user operation when that operation is found, and storing the data in a process operation record;
the process monitoring submodule 201-3 is used for monitoring the running of processes in the system, acquiring process operation information, and storing it in the process operation record;
the network-side probe module 202 is configured to acquire a data stream according to the received specified data mark;
the joint analysis module 203 is configured to perform joint analysis on the information acquired by the terminal-side probe module 201, the corresponding specified data marks, the process operation record, and the data stream acquired by the network-side probe module 202, so as to generate network traffic characteristics.
Preferably, generating the specified data mark according to a predefined rule and sending it to the network-side probe module 202 specifically comprises: when the API call monitoring submodule 201-1 finds an API call action requesting a connection, it obtains the target five-tuple information of the call action, generates a unique stream mark according to the specific API and the call action, and sends the target five-tuple information and the unique stream mark to the network-side probe module 202 so that the network-side probe module 202 can prepare to acquire the target stream; when the API call monitoring submodule 201-1 finds an API call action sending data, it generates a unique data interval mark according to the specific API and the call action and sends the unique data interval mark to the network-side probe module 202, so that the network-side probe module 202 determines the specific data stream to be acquired and acquires it; when the API call monitoring submodule 201-1 finds an API call action ending the connection, it sends the corresponding unique stream mark to the network-side probe module 202 again according to the specific API, so that the network-side probe module 202 ends acquisition of the corresponding data stream.
Preferably, acquiring the data related to the corresponding operation when a specified user operation is found specifically comprises: when the user operation monitoring submodule 201-2 finds a specified user operation, it acquires the target process and window information of that operation; wherein the specified user operations comprise clicking and keyboard input, and the window information comprises: the window name, the window attributes, whether the window contains an input box, the attributes of the input box, the input box content, the user input content, and the text data of the window content.
Preferably, the network-side probe module 202 is further configured to: for each acquired data stream, use the corresponding unique stream mark as the stream mark of that data stream, use the corresponding unique data interval mark as the data packet mark of that data stream, and perform protocol identification and parsing on the acquired data stream.
Preferably, the joint analysis module 203 is further configured to: use a machine learning algorithm to perform correlation analysis on the information acquired by the terminal-side probe module 201, the corresponding specified data marks, the process operation record, and the data stream acquired by the network-side probe module 202; establish the relation between the terminal data and the network-end data stream; classify by data stream type; and further analyze data streams of the same type together with their corresponding terminal and network-end data, to obtain, for each type of data stream, the relation among network traffic characteristics, user input, and interface functions.
Preferably, the system further comprises a network traffic characteristic evaluation module, which is specifically configured to: while the joint analysis module 203 dynamically and continuously performs the joint analysis, in the further analysis of data streams of the same type and their corresponding terminal and network-end data, judge whether the newly obtained user input information is stable relative to the historical user input information for data streams of that type; if so, regard the historically obtained network traffic characteristics of that type of data stream as remaining valid; otherwise, perform a refined analysis on that type of network traffic and judge the validity of the historical network traffic characteristics; if the user input information obtained on each subsequent occasion still cannot be kept stable relative to the historical user input information, regard the historically obtained network traffic characteristics of that type of data stream as invalid; wherein being stable comprises: the input content is the same, the format of the input content is the same, and the type of the input content is the same (for example, both are digits, letters, Chinese characters, symbols, or the same combination thereof); and wherein the refined analysis comprises: classification analysis according to the protocol used and according to the version of the protocol used.
Manual collection of network traffic characteristics, as currently practiced, cannot ensure the timeliness and accuracy of the collected characteristics. To address this technical defect, the invention provides a probe-based automatic traffic characteristic collection method and system. The invention enables automatic collection of network traffic characteristics by combining terminal and network probes. It also establishes an association between terminal behavior and network traffic, obtains the linkage between terminal-side operations and network-side data through big data analysis, and dynamically judges the validity of historical network traffic characteristics, thereby dynamically collecting and maintaining the characteristic database and fully ensuring the integrity and accuracy of the characteristics.
While the present invention has been described with respect to the embodiments, those skilled in the art will appreciate that there are numerous variations and permutations of the present invention without departing from the spirit of the invention, and it is intended that the appended claims cover such variations and modifications as fall within the true spirit of the invention.