CN111222019B - Feature extraction method and device - Google Patents

Info

Publication number
CN111222019B
Authority
CN
China
Prior art keywords
application
data packet
flow
detected
network flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911304940.9A
Other languages
Chinese (zh)
Other versions
CN111222019A (en)
Inventor
张元生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hillstone Networks Co Ltd
Original Assignee
Hillstone Networks Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hillstone Networks Co Ltd
Priority to CN201911304940.9A
Publication of CN111222019A
Application granted
Publication of CN111222019B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a feature extraction method and device. The method comprises: intercepting traffic data packets of one or more applications to be detected to obtain data packet files; preprocessing the data packet files to obtain a data matrix; and performing feature extraction on the data matrix to obtain a target feature of the application to be detected, wherein the target feature is used to analyze the application traffic of the application to be detected and is the optimal feature among all features of that application. The invention solves the technical problem in the prior art that manually extracting data features from application traffic makes feature extraction inefficient.

Description

Feature extraction method and device
Technical Field
The invention relates to the field of computer networks, in particular to a method and a device for feature extraction.
Background
With the rapid development of computer network technology, and especially with the rise of the "Internet Plus" era, Internet application software in every industry has proliferated. Application identification is now the foundation of application-layer security protection on network devices and one of the key technologies for L4-L7 security, and Deep Packet Inspection (DPI) emerged to meet this need. DPI identifies the traffic generated by application software or systems by extracting data features (signatures, abbreviated sig) from application traffic, and then analyzes, controls and manages that traffic in terms of content, security, networking and so on.
To extract data features from application traffic, most engineers work manually with network analysis tools such as Wireshark, which involves a heavy workload and is error-prone. A feature extraction tool commonly used on the Internet today is the open-source per-process packet capture tool QPA. Its core feature extraction module can extract features from all messages of different lengths in the same network flow, but the process covers all types of traffic, focuses on analysis, and requires considerable human intervention. The module also frequently extracts too many features or misses features, and sometimes fails to cover most of the traffic. Moreover, the same feature extraction module is applied to standard protocol traffic, so the analysis is not targeted and the extracted features are coarse.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a method and a device for feature extraction, which at least solve the technical problem of low feature extraction efficiency caused by the adoption of a mode of manually extracting data features in application flow in the prior art.
According to an aspect of an embodiment of the present invention, there is provided a method of feature extraction, including: intercepting a flow data packet of the application to be detected to obtain a data packet file, wherein the number of the application to be detected is one or more; preprocessing a data packet file to obtain a data matrix; and extracting the characteristics of the data matrix to obtain target characteristics of the application to be detected, wherein the target characteristics are used for analyzing the application flow of the application to be detected, and the target characteristics are the optimal characteristics in all the characteristics of the application to be detected.
Optionally, the method of feature extraction further comprises: determining the interception times corresponding to the application to be detected; and intercepting the flow data packet of the application to be detected for multiple times based on the interception times to obtain a data packet file.
Optionally, in each traffic data packet intercepting process, traffic data packets corresponding to different account numbers are intercepted for the same application to be detected.
Optionally, the method of feature extraction further comprises: performing network flow filtering processing on the data packet files to obtain a preset network flow, wherein the application to be detected corresponds to a plurality of data packet files, each data packet file comprises a plurality of network flows, and the network flows are used for representing network flow sessions; arranging characters corresponding to bytes according to the byte size of a plurality of application layer loads of each preset network flow to obtain a character string sequence corresponding to each preset network flow; and grouping the preset network flow according to the character string sequence to obtain a data matrix.
Optionally, the method of feature extraction further comprises: filtering out, from the Transmission Control Protocol (TCP) network flows in the data packet file, the network flows corresponding to Hypertext Transfer Protocol (HTTP) traffic and to Hypertext Transfer Protocol Secure (HTTPS) traffic, to obtain the network flows corresponding to non-HTTP and non-HTTPS traffic; and filtering out, from the User Datagram Protocol (UDP) network flows in the data packet file, the network flows corresponding to Domain Name System (DNS) protocol traffic, to obtain the network flows corresponding to non-DNS traffic.
Optionally, the method of feature extraction further comprises: grouping the preset network flows according to the similarity of the character string sequences in the data packet files to obtain the data matrix, wherein preset network flows whose similarity is greater than a preset similarity are placed in the same group.
Optionally, the method of feature extraction further comprises: combining, in pairs, the application layer loads that have the same data flow direction within the same group, and inputting the pairs into a feature extraction module to obtain an output result of the feature extraction module; when the output result indicates that features were generated, acquiring the at least one generated candidate feature; calculating a weight value for each candidate feature; and determining the candidate feature with the highest weight value as the target feature.
Optionally, the method of feature extraction further comprises: combining, in pairs, the application layer loads that have the same data flow direction within the same group, and inputting the pairs into a feature extraction module to obtain an output result of the feature extraction module; when the output result indicates that no feature was generated, acquiring the application layer loads in the data packet file that have the same or similar character string sequences, the same data flow direction, and a preset position, and performing feature extraction on them to obtain at least one candidate feature; calculating a weight value for each candidate feature; and determining the candidate feature with the highest weight value as the target feature.
Optionally, the method of feature extraction further comprises: obtaining a first numerical value according to the length of the characteristic character string corresponding to each candidate feature; obtaining a second numerical value according to the offset correlation of the characteristic character string in the corresponding application layer load; obtaining a third numerical value according to the priority of the characteristic character string in the corresponding network flow; obtaining a fourth numerical value according to the ratio of the first quantity of the data packet files to the second quantity of the data packet files containing the characteristic character string; obtaining a fifth numerical value according to the data flow direction of the characteristic character string; and calculating the product of the first, second, third, fourth and fifth numerical values to obtain the weight value.
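The weighting step can be illustrated with a minimal Python sketch. The text only states that five numerical values are multiplied and that the candidate with the highest weight is chosen; how each value is derived from length, offset, priority, file ratio and direction is not specified here, so the `Candidate` class, its field names and the sample numbers below are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Candidate:
    """A candidate feature and its five factor values (names are hypothetical)."""
    text: bytes
    length_value: float     # first value: derived from the feature string length
    offset_value: float     # second value: derived from the offset in the payload
    priority_value: float   # third value: derived from the priority in the flow
    coverage_value: float   # fourth value: derived from the data packet file ratio
    direction_value: float  # fifth value: derived from the data flow direction

    def weight(self) -> float:
        # The weight value is the product of the five numerical values.
        return prod((self.length_value, self.offset_value, self.priority_value,
                     self.coverage_value, self.direction_value))


def select_target(candidates):
    """Return the candidate feature with the highest weight value."""
    return max(candidates, key=lambda c: c.weight())


# Illustrative values only; they are not taken from the source.
candidates = [
    Candidate(b"MAGIC", 2.0, 1.0, 1.0, 0.5, 1.0),
    Candidate(b"HELLO-V2", 3.0, 1.0, 1.0, 1.0, 1.0),
]
target = select_target(candidates)
```

Because the factors are multiplied rather than added, a candidate that scores zero on any single factor can never be selected over a candidate with a positive weight.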
Optionally, the method of feature extraction further comprises: after the data matrix is subjected to feature extraction to obtain target features of the application to be detected, the target features are sent to the internal server and/or the cloud server, and the internal server and/or the cloud server push the target features to the gateway device, so that the gateway device analyzes application traffic of the application to be detected according to the target features.
According to another aspect of the embodiments of the present invention, there is also provided an apparatus for feature extraction, including: the intercepting module is used for intercepting the flow data packet of the application to be detected to obtain a data packet file, wherein the number of the application to be detected is one or more; the processing module is used for preprocessing the data packet file to obtain a data matrix; and the extraction module is used for extracting the characteristics of the data matrix to obtain the target characteristics of the application to be detected, wherein the target characteristics are used for analyzing the application flow of the application to be detected, and the target characteristics are the optimal characteristics in all the characteristics of the application to be detected.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium including a stored program, wherein, when the program runs, the device on which the storage medium is located is controlled to execute the above-mentioned feature extraction method.
According to another aspect of the embodiments of the present invention, there is also provided a processor for running a program, wherein the program, when run, performs the above-mentioned feature extraction method.
In the embodiment of the invention, features are extracted automatically: a data packet file is obtained by intercepting the traffic data packets of an application to be detected, the data packet file is then preprocessed to obtain a data matrix, and finally feature extraction is performed on the data matrix to obtain a target feature of the application to be detected, wherein the target feature is used to analyze the application traffic of the application to be detected and is the optimal feature among all features of that application.
In the process, the number of the applications to be detected is one or more, wherein when the applications to be detected are multiple, the application can simultaneously extract the characteristics of the multiple applications to be detected so as to achieve the purpose of analyzing the application flow of the multiple applications to be detected simultaneously, and further improve the efficiency of characteristic extraction. In addition, the data packet file is preprocessed, so that the characteristics can be accurately extracted. Finally, feature extraction is carried out based on the data matrix, the optimal target feature is selected from the multiple features, the target feature is used for analyzing the application flow of the application to be detected, and the accuracy of application identification and analysis can be further improved.
Therefore, the scheme provided by the application achieves the purpose of automatically extracting the data features, the technical effect of improving the data feature extraction efficiency is achieved, and the technical problem of low feature extraction efficiency caused by the mode of manually extracting the data features in the application flow in the prior art is solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram of a method of feature extraction according to an embodiment of the invention;
FIG. 2 is a flow diagram of an optional DPI engine of a security gateway device according to an embodiment of the present invention;
FIG. 3 is a flow diagram of an optional method of feature extraction according to an embodiment of the invention;
FIG. 4 is a schematic diagram of an optional data matrix according to an embodiment of the invention;
FIG. 5 is a schematic diagram of an optional case in which features are generated according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an optional case in which no feature is generated according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of optional QQ and WeChat features according to an embodiment of the present invention;
FIG. 8 is an optional hardware schematic according to an embodiment of the invention; and
FIG. 9 is a schematic diagram of a feature extraction apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
According to an embodiment of the present invention, an embodiment of a method for feature extraction is provided. It should be noted that, in this embodiment, a detection platform on which the application to be detected is installed may serve as the execution subject. Additionally, the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and, although a logical order is illustrated in the flowcharts, in some cases the steps may be performed in an order different from the one shown here.
FIG. 1 is a flowchart of a method of feature extraction according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step S102: intercepting the traffic data packets of the applications to be detected to obtain a data packet file, wherein the number of applications to be detected is one or more.
In step S102, one or more applications to be detected are installed on the detection platform. When there are multiple applications to be detected, the detection platform may intercept their traffic data packets simultaneously to obtain the data packet files, so that multiple applications are processed at the same time and the efficiency of feature extraction is improved.
It should be noted that the detection platform may intercept the traffic data packets through the network card, but packets captured this way are susceptible to interference from various system services and system applications. Intercepting the traffic data packets with a purpose-developed program instead avoids this interference and improves the efficiency and accuracy of feature extraction.
Step S104: preprocessing the data packet file to obtain a data matrix.
In step S104, so that the detection platform can extract accurate features, the intercepted data packet file needs to be preprocessed. The preprocessing operations include filtering the data packet file and organizing the filtered data packet file. Optionally, the data matrix is obtained after the data packet files are organized.
It should be noted that the data matrix is a matrix formed by a plurality of application layer loads, where each application layer load is identified by its data flow direction, the number of the data packet file to which it belongs, and its position in the application layer load sequence of that flow direction.
In addition, it should be noted that the detection platform preprocesses the data packet file to ensure accurate extraction of the features.
Step S106: performing feature extraction on the data matrix to obtain a target feature of the application to be detected, wherein the target feature is used for analyzing the application traffic of the application to be detected, and the target feature is the optimal feature among all the features of the application to be detected.
In step S106, the detection platform may perform feature extraction on the data matrix through a feature extraction module, which uses a common dynamic programming algorithm or a correlation analysis data mining algorithm. Feature extraction on the data matrix usually yields a plurality of features. The detection platform then selects the optimal feature from among them as the target feature and uses it to analyze the application traffic of the application to be detected, which further improves the accuracy of application identification and analysis.
It should be noted that the application identification technology plays an important role in network security equipment, and is the basis for implementing application layer processing by various security functions and application functions, and the effectiveness and fine granularity of other functional modules are determined by the application identification effect, and the application identification effect mainly comes from the feature quality of content detection, and the level of the feature quality also affects the performance of a DPI detection engine.
Optionally, FIG. 2 shows a flow diagram of an optional DPI engine of a security gateway device. In FIG. 2, each feature (SIG in FIG. 2) is written as a regular expression (REGEX in FIG. 2) following the syntax of PCRE (Perl Compatible Regular Expressions) and then compiled into a regular expression engine (DFA in FIG. 2), that is, the DPI engine. The regular expression engine inspects each network data packet (PAK in FIG. 2) and identifies it as a specific application (APPID in FIG. 2), thereby identifying the application to be detected.
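The SIG to REGEX to engine to APPID pipeline can be approximated with a short Python sketch. This is an illustration only: Python's `re` module is a backtracking engine rather than a compiled DFA, its syntax is close to but not identical to PCRE, and the signatures and application identifiers below are invented for the example.

```python
import re

# Hypothetical signatures: application id -> regular expression over raw payload bytes.
SIGNATURES = {
    "app_a": rb"^\x02\x00.{2}HELLO",
    "app_b": rb"^GETKEY\x00",
}

# Compile once, as a DPI engine would do when signatures are loaded.
COMPILED = {app_id: re.compile(pattern, re.DOTALL)
            for app_id, pattern in SIGNATURES.items()}


def identify(payload: bytes):
    """Return the id of the first application whose signature matches, else None."""
    for app_id, pattern in COMPILED.items():
        if pattern.search(payload):
            return app_id
    return None


first = identify(b"\x02\x00\x10\x20HELLO world")
second = identify(b"GETKEY\x00abc")
unknown = identify(b"nothing to see")
```

A production engine would typically merge all signatures into a single automaton so that each packet is scanned once; the loop above trades that efficiency for readability.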
Based on the schemes defined in the above steps S102 to S106, it can be known that a data packet file is obtained by intercepting a traffic data packet of an application to be detected in an automatic feature extraction manner, then the data packet file is preprocessed to obtain a data matrix, and finally the data matrix is subjected to feature extraction to obtain a target feature of the application to be detected, where the target feature is used to analyze the application traffic of the application to be detected.
It is easy to notice that in the above process, the number of the applications to be detected is one or more, wherein when the number of the applications to be detected is multiple, the application can simultaneously perform feature extraction on multiple applications to be detected, so as to achieve the purpose of simultaneously analyzing the application traffic of the multiple applications to be detected, and further improve the efficiency of feature extraction. In addition, the data packet file is preprocessed, so that the characteristics can be accurately extracted. Finally, feature extraction is carried out based on the data matrix, the optimal target feature is selected from the multiple features, the target feature is used for analyzing the application flow of the application to be detected, and the accuracy of application identification and analysis can be further improved.
Therefore, the scheme provided by the application achieves the purpose of automatically extracting the data features, the technical effect of improving the data feature extraction efficiency is achieved, and the technical problem of low feature extraction efficiency caused by the mode of manually extracting the data features in the application flow in the prior art is solved.
In an optional embodiment, FIG. 3 shows a flowchart of an optional feature extraction method. As can be seen from FIG. 3, the method includes four parts: acquisition of the data packet file, preprocessing, feature generation, and feature synchronization. In the acquisition stage, the detection platform intercepts the traffic data packets of an application to be detected to obtain a data packet file. Specifically, the detection platform first determines the number of interceptions corresponding to the application to be detected, and then intercepts the traffic data packets of the application that many times to obtain the data packet files. For the same application to be detected, each interception captures the traffic data packets of a different account.
It should be noted that, in the foregoing process, each application to be detected needs to be intercepted multiple times. The number of interceptions may differ between applications, and the default number of interceptions is 6. In addition, for the same application to be detected, the detection platform logs in to the application with a different account for each interception, so as to obtain traffic data packets corresponding to different accounts. For example, for the WeChat application, a different account is used to log in to WeChat each time traffic data packets are intercepted.
Optionally, as shown in FIG. 3, in the preprocessing stage, the detection platform first filters the data packet file through the traffic filter to obtain the preset network flows (for example, the TCP class and the UDP class in FIG. 3), and then organizes the preset network flows to obtain the data matrix. In FIG. 3, the TCP class consists of non-HTTP and non-HTTPS flows, and the UDP class consists of non-DNS flows.
Specifically, the detection platform performs network flow filtering on the data packet file to obtain the preset network flows, then arranges the characters corresponding to the byte sizes of the application layer loads of each preset network flow to obtain a character string sequence for that flow, and groups the preset network flows according to their character string sequences to obtain the data matrix. The application to be detected corresponds to a plurality of data packet files, each data packet file includes a plurality of network flows, and a network flow represents a network flow session. Optionally, a network flow is a session whose packets share the same five-tuple: source IP address, destination IP address, source port, destination port and protocol type.
In an optional embodiment, the detection platform filters out, from the Transmission Control Protocol (TCP) network flows in the data packet file, the network flows corresponding to Hypertext Transfer Protocol (HTTP) traffic and to Hypertext Transfer Protocol Secure (HTTPS) traffic, leaving the network flows corresponding to non-HTTP and non-HTTPS traffic. It then filters out, from the User Datagram Protocol (UDP) network flows in the data packet file, the network flows corresponding to Domain Name System (DNS) protocol traffic, leaving the network flows corresponding to non-DNS traffic.
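A minimal Python sketch of such a traffic filter follows. The source does not say how HTTP, HTTPS and DNS flows are recognized, so the heuristics used here (HTTP method prefix, TLS handshake record header, UDP port 53) and the `Flow` type are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Assumed heuristic: plaintext HTTP requests start with a method token.
HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ", b"OPTIONS ")


@dataclass
class Flow:
    protocol: str         # "TCP" or "UDP"
    dst_port: int
    first_payload: bytes  # first application layer load in the request direction


def is_http(flow: Flow) -> bool:
    return flow.protocol == "TCP" and flow.first_payload.startswith(HTTP_METHODS)


def is_https(flow: Flow) -> bool:
    # Assumed heuristic: a TLS handshake record starts with 0x16 0x03.
    return flow.protocol == "TCP" and flow.first_payload[:2] == b"\x16\x03"


def is_dns(flow: Flow) -> bool:
    # Assumed heuristic: DNS over UDP uses destination port 53.
    return flow.protocol == "UDP" and flow.dst_port == 53


def filter_flows(flows):
    """Keep only the preset network flows: non-HTTP/non-HTTPS TCP and non-DNS UDP."""
    return [f for f in flows if not (is_http(f) or is_https(f) or is_dns(f))]


flows = [
    Flow("TCP", 80, b"GET /index.html HTTP/1.1\r\n"),
    Flow("TCP", 443, b"\x16\x03\x01\x02\x00"),
    Flow("UDP", 53, b"\x12\x34\x01\x00"),
    Flow("TCP", 8000, b"\x02\x00\x10\x20HELLO"),
    Flow("UDP", 4000, b"\x02\x36\x25"),
]
kept = filter_flows(flows)
```

Only the last two flows survive, which matches the TCP class and UDP class that the preprocessing stage passes on.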
It should be noted that, because protocols such as HTTP, HTTPS and DNS have open protocol structures, dedicated techniques already exist that extract data features from the corresponding network flows more effectively; the present application therefore does not process the network flows corresponding to such protocol traffic. Through the above process, the detection platform can screen out network flows such as HTTP, SSL (Secure Sockets Layer)/TLS (Transport Layer Security), DNS and advertisement traffic when analyzing the data packet file.
Optionally, after the preset network flows are obtained, the detection platform groups them according to the similarity of the character string sequences in the multiple data packet files to obtain the data matrix, where preset network flows whose similarity is greater than the preset similarity are placed in the same group.
Optionally, all network flows in each data packet file may be organized in the structure "session -> flow0 & flow1", where a session is a network flow sharing the same five-tuple information described above, and flow0 and flow1 denote the two data flow directions of that network flow: the direction in which the first request packet is sent is flow0, and the direction of the reply packets is flow1. The structure can also store the number of bytes and the number of messages of each network flow.
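The "session -> flow0 & flow1" organization can be sketched in Python as follows. The packet tuple layout, the `Session` class and the choice to count payload bytes are assumptions for illustration; the text only fixes the session key (the five-tuple) and the rule that the side sending the first packet defines flow0.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Session:
    initiator: Optional[tuple] = None          # endpoint that sent the first packet
    flow0: list = field(default_factory=list)  # payloads in the request direction
    flow1: list = field(default_factory=list)  # payloads in the reply direction
    byte_count: int = 0                        # payload bytes seen (assumed meaning)
    packet_count: int = 0                      # messages seen


def build_sessions(packets):
    """packets: iterable of (src_ip, src_port, dst_ip, dst_port, proto, payload)."""
    sessions = {}
    for src_ip, src_port, dst_ip, dst_port, proto, payload in packets:
        src, dst = (src_ip, src_port), (dst_ip, dst_port)
        # Direction-independent key so both directions land in one session.
        key = (proto,) + tuple(sorted((src, dst)))
        session = sessions.setdefault(key, Session(initiator=src))
        session.packet_count += 1
        session.byte_count += len(payload)
        if payload:  # only messages carrying an application layer load are stored
            (session.flow0 if src == session.initiator else session.flow1).append(payload)
    return sessions


packets = [
    ("10.0.0.2", 50000, "1.2.3.4", 8000, "TCP", b"hello"),
    ("1.2.3.4", 8000, "10.0.0.2", 50000, "TCP", b"world"),
    ("10.0.0.2", 50000, "1.2.3.4", 8000, "TCP", b""),
]
sessions = build_sessions(packets)
only = next(iter(sessions.values()))
```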
Optionally, the above structure may further store a character string sequence PL_LEN_SEQ for each of the flow0 and flow1 directions of every network flow, formed from the byte sizes of the first K application layer loads in that direction (K is 4 by default). For example, if the application layer load sizes of the first four messages in the flow1 direction of a certain torpedo download network flow are 008B, 005B, 0165 and 05A8, then PL_LEN_SEQ is 008B005B016505A8.
It should be noted that each byte size is expressed as a two-byte hexadecimal value, and only messages that carry an application layer load are counted. In addition, because network flows generated by the same application behavior have identical or similar application layer load character string sequences, grouping the preset network flows by the similarity of these sequences across multiple data packet files avoids the interference caused by retransmitted and abnormal messages.
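The construction of PL_LEN_SEQ can be reproduced with a few lines of Python. The function name `pl_len_seq` is hypothetical, and the sketch assumes every application layer load is at most 65535 bytes so that its size fits in two bytes.

```python
def pl_len_seq(payloads, k=4):
    """Concatenate the sizes of the first k non-empty payloads as 4-digit hex."""
    # Messages without an application layer load are skipped, as the text requires.
    sizes = [len(p) for p in payloads if p][:k]
    return "".join(f"{size:04X}" for size in sizes)


# Payload sizes from the worked example: 0x008B, 0x005B, 0x0165, 0x05A8 bytes.
# An empty message (for example a bare acknowledgement) is included to show skipping.
example_payloads = [b"\x00" * 139, b"", b"\x00" * 91, b"\x00" * 357, b"\x00" * 1448]
seq = pl_len_seq(example_payloads)
```

Here `seq` equals `008B005B016505A8`, the value given in the example above.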
Optionally, the above structure may further store the first M application layer loads in the flow0 and flow1 directions of each network flow, and the first N bytes of each application layer load, where M and N are adjustable, for example according to the scan length of the DPI detection engine. In addition, if several network flows in the same data packet file have an identical PL_LEN_SEQ, only one of them is stored.
Optionally, the above structure may also store the position of each application layer load within flow0 or flow1.
In addition, for the traffic of each application, the detection platform groups the network flows with the same (or similar) PL_LEN_SEQ (distinguishing flow0 and flow1) across the Y application data packet files into one group; a flow0 or flow1 with the same PL_LEN_SEQ is not necessarily present in every application data packet file. For example, P(i, j, k) represents a single application layer load in each flow, where i represents the flow direction (0 for flow0, 1 for flow1), j represents the number of the data packet file to which the application layer load belongs, and k represents the position number of the application layer load within its flow (such as the application layer load sequence number in fig. 4). Fig. 4 shows a schematic diagram of an alternative data matrix.
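The grouping step can be illustrated with a short sketch. The text does not specify how similarity between two PL_LEN_SEQ strings is measured, so this example assumes the ratio computed by Python's difflib.SequenceMatcher; the function name group_by_seq, the tuple layout, and the threshold values are likewise hypothetical.

```python
from difflib import SequenceMatcher


def group_by_seq(flows, threshold=1.0):
    """Group (file_no, direction, pl_len_seq) tuples by sequence similarity.

    Flows join a group only if they share its direction (flow0 vs flow1) and
    their sequence is at least `threshold` similar to the group's first member.
    """
    groups = []
    for flow in flows:
        for group in groups:
            ref = group[0]
            ratio = SequenceMatcher(None, ref[2], flow[2]).ratio()
            if ref[1] == flow[1] and ratio >= threshold:
                group.append(flow)
                break
        else:
            # No existing group matched, so this flow starts a new one.
            groups.append([flow])
    return groups


flows = [
    (1, 0, "008B005B016505A8"),
    (2, 0, "008B005B016505A8"),
    (3, 0, "008B005B016505A0"),  # differs in the last hex digit only
    (1, 1, "0010002000300040"),  # opposite direction, kept apart
]
for group in group_by_seq(flows, threshold=0.9):
    print(group)
```

A threshold of 1.0 restricts groups to identical sequences, while a lower value also admits similar ones, which is the case the text relies on to tolerate retransmitted or abnormal messages.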
In an alternative embodiment, as shown in fig. 3, in the feature generation stage, the detection platform first extracts features, and then performs feature selection from the extracted features to obtain target features. The feature extraction includes two modes, namely, transverse comparison and longitudinal comparison.
The transverse (lateral) comparison is also called cross-packet comparison: the detection platform preferentially selects the flow0 or flow1 with the same (or similar) PL_LEN_SEQ in different data packet files under the same application traffic, and inputs the above data matrix into the feature extraction module so that the first application layer loads of all flow0 and flow1 in the same group are compared algorithmically. Specifically, in the transverse comparison mode, the detection platform combines the application layer loads with the same data flow direction in the same group in pairs and inputs them into the feature extraction module to obtain an output result of the feature extraction module. When the output result indicates that a feature is generated, the detection platform obtains at least one generated feature to be selected, calculates a weight value corresponding to each feature to be selected, and finally determines the feature to be selected with the highest weight value as the target feature.
For example, the first application layer loads of all flow0 are P(0,1,1), P(0,2,1), P(0,3,1) and P(0,4,1). The detection platform first combines P(0,1,1) with P(0,2,1) and P(0,3,1) with P(0,4,1) in pairs and inputs them into the feature extraction module. If all pairs generate features, the generated features are combined in pairs and input into the feature extraction module again. If a pair cannot generate a feature, the pairwise combination is readjusted, for example P(0,1,1) with P(0,3,1) and P(0,2,1) with P(0,4,1), and the new pairs are input into the feature extraction module until features are generated; the application layer loads that cannot generate features are marked, as is the group of application layer loads that do generate features. Feature extraction is then performed in turn on all second application layer loads, all third application layer loads, and so on up to all P-th application layer loads (the value of P is adjustable), yielding multiple groups of application layer load features, with the byte as the minimum feature unit, until the final features are generated. Optionally, fig. 5 shows a schematic diagram of the case where a feature can be generated, and fig. 6 shows the case where a feature cannot be generated; in fig. 5 and fig. 6, SEM is the feature extraction module and sig is a feature, and the dashed box in fig. 6 indicates that the feature extraction module does not generate a feature.
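The pairwise procedure above can be sketched as follows. The text does not disclose the algorithm inside the feature extraction module (SEM), so this example substitutes a longest-common-byte-substring search as a stand-in; the names extract, pair_round and transverse, the MIN_LEN threshold, and the sample loads are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

MIN_LEN = 4  # assumed minimum feature length in bytes


def extract(a: bytes, b: bytes):
    """Stand-in for the SEM: longest common byte substring, or None."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size] if m.size >= MIN_LEN else None


def pair_round(items):
    """Try successive pairings; return the features of the first pairing in
    which every pair yields a feature, or None if no pairing succeeds.
    With an odd number of items, the trailing item of a pairing is left out."""
    first, rest = items[0], items[1:]
    for shift in range(len(rest)):
        order = [first] + rest[shift:] + rest[:shift]
        sigs = [extract(order[i], order[i + 1])
                for i in range(0, len(order) - 1, 2)]
        if all(s is not None for s in sigs):
            return sigs
    return None


def transverse(payloads):
    """Reduce same-position loads pair by pair until one feature remains."""
    level = list(payloads)
    while level is not None and len(level) > 1:
        level = pair_round(level)
    return level[0] if level else None


# Hypothetical first loads P(0,1,1) .. P(0,4,1) from four data packet files.
payloads = [
    b"\x01\x02LOGINv1 alice",
    b"\x09\x09LOGINv1 bob",
    b"\x07LOGINv1 carol!",
    b"LOGINv1 dave\x00",
]
print(transverse(payloads))
```

For four loads, pair_round tries the pairings (1,2)(3,4), then (1,3)(4,2), then (1,4)(2,3), which mirrors the readjustment described in the text. A real feature extraction module would presumably return richer features than a single substring.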
It should be noted that in most cases the transverse comparison can generate features, but the longitudinal comparison can still be used when the transverse comparison generates no usable features, so as to further improve the yield of application traffic feature extraction. Specifically, in the longitudinal comparison mode, the detection platform combines the application layer loads with the same data flow direction in the same group in pairs and inputs them into the feature extraction module to obtain an output result of the feature extraction module. When the output result indicates that no feature is generated, the detection platform acquires the application layer loads in the data packet file that have the same or similar character string sequences, have the same data flow direction, and are located at preset positions, performs feature extraction on them to obtain at least one feature to be selected, calculates a weight value corresponding to each feature to be selected, and finally determines the feature to be selected with the highest weight value as the target feature.
As can be seen from the above, in the longitudinal comparison mode, the detection platform preferentially selects the flow0 and flow1 with the same (or similar) PL_LEN_SEQ within each data packet file and performs feature extraction on the first P application layer loads in each flow0 and flow1, for example mutual feature extraction among the application layer loads P(0,1,1), P(0,1,2), P(0,1,3) and P(0,1,4). The comparison method is the same as in the transverse comparison, with the byte as the minimum feature unit, until the final features are generated.
It should be noted that both the transverse comparison and the longitudinal comparison usually generate multiple groups of application layer load features (i.e., features to be selected), so one or more target features need to be selected from them. Specifically, the detection platform obtains a first numerical value according to the character string length of the feature character string corresponding to each feature to be selected, obtains a second numerical value according to the offset correlation of the feature character string in the corresponding application layer load, obtains a third numerical value according to the priority of the feature character string in the corresponding network flow, obtains a fourth numerical value according to the ratio of the number of data packet files containing the feature character string to the total number of data packet files, obtains a fifth numerical value according to the data flow direction of the feature character string, and finally calculates the product of the first, second, third, fourth and fifth numerical values to obtain the weight value.
Optionally, the weight value may satisfy the following formula:
s=a*b*c*d*e
In the above formula, s is the weight value, a is the first value, b is the second value, c is the third value, d is the fourth value, and e is the fifth value.
Optionally, the value a reflects the length of the continuous feature character string: when the continuous character string length is greater than 6 (or a value L), a is 1, and when the continuous character string length is 4, a is 0.8. The value b is the offset correlation of the feature character string in each application layer load: if the offsets of the feature character string are the same, b is 1; if they differ, b is determined according to the similarity of the feature character string. The value c reflects the message position: a feature character string in the first message carrying an application layer load in the preset network flow has higher priority and takes the value 1, and one in the second message takes the value 0.8. The value d is X/Y, that is, X of the Y application data packet files contain the current feature character string. The value e reflects the data flow direction: a feature character string in the flow1 direction has higher priority and takes the value 1, and one in the flow0 direction takes the value 0.6.
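The weighting and selection step can be sketched as below. The Candidate class, the select_target function, and the two sample candidates are hypothetical; only the formula s = a*b*c*d*e and the example factor values (1, 0.8, 0.6, X/Y) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sig: bytes   # feature character string
    a: float     # string-length score
    b: float     # offset-correlation score
    c: float     # message-position priority
    d: float     # X / Y coverage across data packet files
    e: float     # data flow direction score

    @property
    def weight(self) -> float:
        # s = a * b * c * d * e
        return self.a * self.b * self.c * self.d * self.e


def select_target(candidates):
    """Return the candidate with the highest weight value."""
    return max(candidates, key=lambda cand: cand.weight)


candidates = [
    # Long string, same offset, first message, in 6 of 6 files, flow1 direction.
    Candidate(b"LOGINv1 ", a=1.0, b=1.0, c=1.0, d=6 / 6, e=1.0),
    # Length-4 string, second message, in 4 of 6 files, flow0 direction.
    Candidate(b"\x00\x01\x02\x03", a=0.8, b=1.0, c=0.8, d=4 / 6, e=0.6),
]
best = select_target(candidates)
print(best.sig, best.weight)
```

Because the factors are multiplied, a single weak factor (for example a feature present in only a few files) lowers the whole weight, which favors features that are long, stably positioned, early in the flow, and widely observed.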
In an alternative embodiment, as shown in fig. 3, in the feature synchronization stage, the detection platform synchronizes the target features to the internal server and/or the cloud server. Specifically, after feature extraction is performed on the data matrix to obtain the target features of the application to be detected, the detection platform sends the target features to the internal server and/or the cloud server, and the internal server and/or the cloud server pushes the target features to the gateway device, so that the gateway device analyzes the application traffic of the application to be detected according to the target features.
The following description takes the analysis of QQ and WeChat under the Windows platform as an example. Each application sub-behavior (such as QQ login and chat, WeChat login and chat, and the like) is simulated on QQ and WeChat to generate network communication traffic, and the cycle of opening the software, operating it, and closing it is repeated six times for QQ and for WeChat, generating six data packet files for each application. The data packet files generated by the same application software can be stored in the same application software folder, and the application data packet files can be named in the form "process name_number ID", for example WeChat_1.pcap, WeChat_2.pcap, WeChat_3.pcap, WeChat_4.pcap, WeChat_5.pcap, WeChat_6.pcap and QQ_1.pcap, QQ_2.pcap, QQ_3.pcap, QQ_4.pcap, QQ_5.pcap, QQ_6.pcap.
Then, the detection platform performs preliminary processing on the data packet files and filters out invalid network flows, for example network flows with three or fewer messages, as well as HTTP/HTTPS/DNS, advertisement and other such traffic.
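A minimal sketch of this filtering step follows. The text does not say how HTTP, HTTPS and DNS traffic is recognized, so the example assumes a simple match on well-known destination ports (80, 443, 53); the FlowRecord class and the keep function are hypothetical, and advertisement traffic is not handled here.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    proto: str      # "TCP" or "UDP"
    dport: int      # destination (server) port
    pkt_count: int  # number of messages in the network flow


# Assumption: protocols are recognized by well-known port only.
FILTERED = {("TCP", 80), ("TCP", 443), ("UDP", 53)}


def keep(flow: FlowRecord) -> bool:
    """Return True if the network flow survives the preliminary filtering."""
    if flow.pkt_count <= 3:
        return False  # too few messages to be useful
    return (flow.proto, flow.dport) not in FILTERED


flows = [
    FlowRecord("TCP", 443, 40),   # HTTPS, dropped
    FlowRecord("UDP", 53, 6),     # DNS, dropped
    FlowRecord("TCP", 8080, 3),   # three messages, dropped
    FlowRecord("UDP", 8000, 25),  # kept
]
print([f for f in flows if keep(f)])
```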
Then, the detection platform defines a Flow class and initializes seven instance attributes: protocol, source IP address, destination IP address, source port, destination port, application layer load data set, and packet count (proto, sip, dip, port, dport, applyer_dataset, pkt_count). It further defines a session class and initializes five instance attributes: flow0, flow1, total packet byte count, total packet count, and application layer load size sequence (flow0, flow1, pkt_bytes_sum, pkt_count_sum, pkt_len_seq). When parsing the network flows of each data packet file, after the data of each packet is parsed into an object instance of the Flow class, that instance is assigned to the flow0 or flow1 of the corresponding session object according to the session five-tuple information, where the direction is judged by a defined method Flow_1_Flow(). Then, the detection platform reads the network flows in all data packet files into a dictionary session_fact, keeping at most the first N bytes of each packet (default N is 1024; M and N are configurable), and groups them using group() according to the pkt_len_seq of each value in the dictionary session_fact, so that network flows with the same pkt_len_seq value are placed in one group. The grouped network flows are written into the dictionary function_session_fact. Next, the messages at the same positions of flow0 and flow1 in each functional network flow set of the dictionary function_session_fact are organized into the dictionary flow_pool_fact in the transverse mode or the longitudinal mode. Finally, the application layer load data in each functional network flow data cluster of QQ and WeChat in the dictionary flow_pool_fact is input in turn into the feature extraction module and the feature selection module according to the transverse comparison mode of this technical solution, and the resulting features of QQ and WeChat are shown schematically in FIG. 7.
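The two classes and the direction assignment can be sketched as follows. This is an illustrative reading of the description, not the actual code: the attribute names follow the text where it gives them (the source port is written sport here, whereas the text lists it as port), while Session, session_key, add_packet and the sample addresses are hypothetical.

```python
from dataclasses import dataclass, field

N = 1024  # bytes kept per application layer load (default given in the text)
M = 3     # application layer loads kept per direction


@dataclass
class Flow:
    proto: str
    sip: str
    dip: str
    sport: int
    dport: int
    applyer_dataset: list = field(default_factory=list)
    pkt_count: int = 0


@dataclass
class Session:
    flow0: Flow              # direction of the first request packet
    flow1: Flow              # reply direction
    pkt_bytes_sum: int = 0
    pkt_count_sum: int = 0
    pkt_len_seq: str = ""


def session_key(proto, sip, sport, dip, dport):
    """Direction-independent five-tuple key: both directions map to one session."""
    a, b = sorted([(sip, sport), (dip, dport)])
    return (proto, a, b)


def add_packet(sessions, proto, sip, sport, dip, dport, payload, size):
    """Assign one parsed packet to flow0 or flow1 of its session."""
    key = session_key(proto, sip, sport, dip, dport)
    s = sessions.get(key)
    if s is None:
        # The first packet seen for a session defines the flow0 direction.
        s = sessions[key] = Session(
            flow0=Flow(proto, sip, dip, sport, dport),
            flow1=Flow(proto, dip, sip, dport, sport),
        )
    same_dir = (sip, sport) == (s.flow0.sip, s.flow0.sport)
    flow = s.flow0 if same_dir else s.flow1
    flow.pkt_count += 1
    s.pkt_count_sum += 1
    s.pkt_bytes_sum += size
    # Keep at most M loads per direction and at most N bytes of each load.
    if payload and len(flow.applyer_dataset) < M:
        flow.applyer_dataset.append(payload[:N])


sessions = {}
add_packet(sessions, "TCP", "10.0.0.2", 50000, "192.0.2.10", 8000, b"hello", 59)
add_packet(sessions, "TCP", "192.0.2.10", 8000, "10.0.0.2", 50000, b"world!", 60)
print(sessions)
```

Keying the dictionary on a sorted endpoint pair is one way to make request and reply packets land in the same session; the text only states that the assignment follows the session five-tuple information.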
It should be noted that, in the above process, the detection platform may write each application layer load into a flow of each network flow by defining a network flow class method add_flow_applyerdata(), where each network flow includes flow0 and flow1, and each flow holds at most the first M data packets. Optionally, the value of M may be 3, and the maximum value is 10.
It should be noted that the method provided by the present application may be applied in a complex external network environment, and two typical application scenarios are described below according to the hardware schematic diagram shown in fig. 8.
Application scenario one: the method provided by the application is applied to security gateway devices. An enterprise network administrator can generate high-quality application features for self-defined applications; meanwhile, enterprise A uploads the self-defined application features to the cloud server, and the cloud server selects application features with high credibility through a specific screening algorithm and pushes them to enterprise B or to more security gateway devices (such as the internet personal users in fig. 8).
Application scenario two: the method provided by the application is applied to a cloud server. The cloud server provides an application data packet upload entry for internal engineers, internet users and gateway security devices in order to generate high-quality application features; subsequently, as in scenario one, the cloud server selects application features with high credibility through a specific screening algorithm and pushes them to enterprise B or to more security gateway devices (such as the internet personal users in fig. 8).
It should be noted that the two application scenarios both break through the limitation of the traditional feature library, share the high-quality features to the whole network according to the instant detection result through the cloud server, reduce the maintenance technical threshold of the internal research and development engineers and the enterprise network administrator, reduce the maintenance burden of the internal research and development engineers and the enterprise network administrator, and can meet the client application identification requirement in time and improve the support quantity and coverage of application identification.
According to the above, the method provided by the application can analyze a plurality of target application software simultaneously, which greatly improves the accuracy and efficiency of feature generation. In addition, the method can be used for extracting application data features under the Windows platform, and has reference significance for extracting application traffic features under other platforms such as Linux and Android and for applying cloud identification technology.
Example 2
According to an embodiment of the present invention, there is further provided an embodiment of a feature extraction apparatus, where fig. 9 is a schematic diagram of a feature extraction apparatus according to an embodiment of the present invention, and as shown in fig. 9, the apparatus includes: an intercepting module 901, a processing module 903, and an extraction module 905.
The intercepting module 901 is configured to intercept a traffic data packet of an application to be detected to obtain a data packet file, where the number of the application to be detected is one or more; a processing module 903, configured to preprocess the data packet file to obtain a data matrix; the extraction module 905 is configured to perform feature extraction on the data matrix to obtain a target feature of the application to be detected, where the target feature is used to analyze the application traffic of the application to be detected, and the target feature is an optimal feature of all features of the application to be detected.
It should be noted here that the above intercepting module 901, the processing module 903 and the extracting module 905 correspond to steps S102 to S106 of the above embodiment, and the three modules are the same as the examples and application scenarios realized by the corresponding steps, but are not limited to the disclosure of the above embodiment.
In an alternative embodiment, the intercepting module comprises a first determining module and an intercepting submodule. The first determining module is used for determining the interception times corresponding to the application to be detected; and the intercepting submodule is used for intercepting the flow data packet of the application to be detected for multiple times based on the interception times to obtain a data packet file.
Optionally, in the process of intercepting the traffic data packet each time, the traffic data packets corresponding to different account numbers are intercepted for the same application to be detected.
In an alternative embodiment, the processing module comprises: the device comprises a filtering module, an arranging module and a grouping module. The filtering module is used for performing network flow filtering processing on the data packet files to obtain a preset network flow, wherein the application to be detected corresponds to a plurality of data packet files, each data packet file comprises a plurality of network flows, and the network flows are used for representing network flow sessions; the arrangement module is used for arranging the characters corresponding to the bytes according to the byte size of the multiple application layer loads of each preset network flow to obtain a character string sequence corresponding to each preset network flow; and the grouping module is used for grouping the preset network flow according to the character string sequence to obtain a data matrix.
In an alternative embodiment, the filtering module comprises a first filtering module and a second filtering module. The first filtering module is used for filtering out the network flow corresponding to the hypertext transfer protocol traffic and the network flow corresponding to the hypertext transfer security protocol traffic in the transmission control protocol network flow in the data packet file, to obtain the network flow corresponding to traffic that is neither hypertext transfer protocol traffic nor hypertext transfer security protocol traffic; and the second filtering module is used for filtering out the network flow corresponding to the domain name system protocol traffic in the user datagram protocol network flow in the data packet file, to obtain the network flow corresponding to the non-domain name system protocol traffic.
In an alternative embodiment, the grouping module comprises: and the grouping submodule is used for grouping the preset network streams according to the similarity of the character string sequences in the plurality of data packet files to obtain a data matrix, wherein the preset network streams with the similarity larger than the preset similarity are grouped into one group.
In an alternative embodiment, the extraction module comprises: the device comprises a first combination module, a first acquisition module, a first calculation module and a second determination module. The first combination module is used for combining the application layer loads with the same data flow direction in the same group in pairs and inputting the combination into the feature extraction module to obtain an output result of the feature extraction module; the first acquisition module is used for acquiring at least one generated feature to be selected under the condition that the output result indicates the generated feature; the first calculation module is used for calculating a weight value corresponding to each feature to be selected; and the second determining module is used for determining the candidate feature with the highest weight value as the target feature.
In an alternative embodiment, the extraction module comprises: the device comprises a second combination module, a second acquisition module, a second calculation module and a third determination module. The second combination module is used for combining the application layer loads with the same data flow direction in the same group in pairs, inputting the combined application layer loads into the feature extraction module and obtaining the output result of the feature extraction module; the second obtaining module is used for obtaining the application layer loads which have the same or similar character string sequences and have the same data stream direction and are positioned at the preset position in the data packet file for feature extraction under the condition that the output result indicates that the features are not generated, so as to obtain at least one feature to be selected; the second calculation module is used for calculating a weight value corresponding to each feature to be selected; and the third determining module is used for determining the candidate feature with the highest weight value as the target feature.
In an optional embodiment, the apparatus for feature extraction further comprises a first processing module, a second processing module, a third processing module, a fourth processing module, a fifth processing module and a third calculation module. The first processing module is used for obtaining a first numerical value according to the character string length of the characteristic character string corresponding to each feature to be selected; the second processing module is used for obtaining a second numerical value according to the offset correlation of the character string length in the corresponding application layer load; the third processing module is used for obtaining a third numerical value according to the priority of the characteristic character string in the corresponding network flow; the fourth processing module is used for obtaining a fourth numerical value according to the ratio of the first quantity of the data packet files to the second quantity of the data packet files containing the characteristic character strings; the fifth processing module is used for obtaining a fifth numerical value according to the data flow direction of the characteristic character string; and the third calculation module is used for calculating the product of the first numerical value, the second numerical value, the third numerical value, the fourth numerical value and the fifth numerical value to obtain the weight value.
In an alternative embodiment, the feature extraction apparatus further includes: the sending module is used for sending the target characteristics to the internal server and/or the cloud server after the characteristic extraction is carried out on the data matrix to obtain the target characteristics of the application to be detected, and pushing the target characteristics to the gateway equipment by the internal server and/or the cloud server so that the gateway equipment can analyze the application flow of the application to be detected according to the target characteristics.
Example 3
According to another aspect of the embodiments of the present invention, there is also provided a storage medium including a stored program, wherein the apparatus in which the storage medium is located is controlled to perform the method of feature extraction in embodiment 1 described above when the program runs.
Example 4
According to another aspect of the embodiments of the present invention, there is also provided a processor for running a program, where the program executes the method for feature extraction in embodiment 1.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (7)

1. A method of feature extraction, comprising:
intercepting a flow data packet of applications to be detected to obtain a data packet file, wherein the number of the applications to be detected is one or more;
preprocessing the data packet file to obtain a data matrix;
extracting the characteristics of the data matrix to obtain target characteristics of the application to be detected, wherein the target characteristics are used for analyzing the application flow of the application to be detected, and the target characteristics are the optimal characteristics of all the characteristics of the application to be detected;
preprocessing the data packet file to obtain a data matrix, wherein the preprocessing comprises the following steps: performing network flow filtering processing on the data packet files to obtain a preset network flow, wherein the application to be detected corresponds to a plurality of data packet files, each data packet file comprises a plurality of network flows, and the network flows are used for representing network flow sessions; arranging the characters corresponding to the bytes according to the byte sizes of the multiple application layer loads of each preset network flow to obtain a character string sequence corresponding to each preset network flow; grouping the preset network flow according to the character string sequence to obtain the data matrix;
and performing network flow filtering processing on the data packet file to obtain a preset network flow, wherein the method comprises the following steps: filtering out the network flow corresponding to the hypertext transfer protocol flow and the network flow corresponding to the hypertext transfer security protocol flow in the transmission control protocol network flow in the data packet file to obtain the network flow corresponding to flow that is neither hypertext transfer protocol flow nor hypertext transfer security protocol flow; filtering out the network flow corresponding to the domain name system protocol flow in the user datagram protocol network flow in the data packet file to obtain the network flow corresponding to the non-domain name system protocol flow;
grouping the preset network flow according to the character string sequence to obtain the data matrix, wherein the grouping process comprises the following steps: grouping the preset network flows according to the similarity of the character string sequences in the data packet files to obtain the data matrix, wherein the preset network flows with the similarity larger than the preset similarity are divided into a group;
performing feature extraction on the data matrix to obtain the target features of the application to be detected, including: combining application layer loads with the same data flow direction in the same group in pairs, and inputting the application layer loads into a feature extraction module to obtain an output result of the feature extraction module; under the condition that the output result indicates that the features are generated, acquiring at least one generated feature to be selected; calculating a weight value corresponding to each feature to be selected; and determining the candidate feature with the highest weight value as the target feature.
2. The method of claim 1, wherein intercepting a traffic data packet of an application to be detected to obtain a data packet file comprises:
determining the interception times corresponding to the application to be detected;
and intercepting the flow data packet of the application to be detected for multiple times based on the interception times to obtain the data packet file.
3. The method according to claim 2, characterized in that in each traffic data packet intercepting process, traffic data packets corresponding to different account numbers are intercepted for the same application to be detected.
4. The method according to claim 1, wherein performing feature extraction on the data matrix to obtain the target feature of the application to be detected comprises:
combining the application layer loads with the same data flow direction in the same group in pairs, and inputting the combined application layer loads into a feature extraction module to obtain an output result of the feature extraction module;
under the condition that the output result indicates that the features are not generated, acquiring application layer loads which have the same or similar character string sequences and have the same data stream direction and are positioned at a preset position in the data packet file to extract the features to obtain at least one feature to be selected;
calculating a weight value corresponding to each feature to be selected;
and determining the candidate feature with the highest weight value as the target feature.
5. The method according to claim 1 or 4, wherein calculating the weight value corresponding to each candidate feature comprises:
obtaining a first numerical value according to the character string length of the characteristic character string corresponding to each feature to be selected;
obtaining a second numerical value according to the offset correlation of the character string length in the corresponding application layer load;
obtaining a third numerical value according to the priority of the characteristic character string in the corresponding network flow;
obtaining a fourth numerical value according to the ratio of the first quantity of the data packet files to the second quantity of the data packet files containing the characteristic character strings;
obtaining a fifth numerical value according to the data flow direction of the characteristic character string;
and calculating the product of the first numerical value, the second numerical value, the third numerical value, the fourth numerical value and the fifth numerical value to obtain the weight value.
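As a rough illustration of this weighting, the sketch below multiplies five per-candidate values and keeps the candidate with the largest product. How each value is derived from length, offset, priority, packet-file ratio and direction is not specified here, so the five scores are taken as given inputs; the class name `Candidate`, its field names, and `select_target` are hypothetical.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass(frozen=True)
class Candidate:
    feature: bytes
    length_score: float     # first value: from the feature string length
    offset_score: float     # second value: from the offset correlation
    priority_score: float   # third value: from the priority within the network flow
    coverage_score: float   # fourth value: from the ratio of packet-file counts
    direction_score: float  # fifth value: from the data flow direction

    @property
    def weight(self) -> float:
        # The weight is the product of the five values.
        return prod((
            self.length_score,
            self.offset_score,
            self.priority_score,
            self.coverage_score,
            self.direction_score,
        ))


def select_target(candidates: List[Candidate]) -> Candidate:
    """Return the candidate with the highest weight value."""
    return max(candidates, key=lambda c: c.weight)


a = Candidate(b"MYAPP", 2.0, 1.0, 0.5, 1.0, 1.0)
b = Candidate(b"v1", 4.0, 0.5, 0.25, 1.0, 1.0)
print(select_target([b, a]).feature)
```

Because the values are multiplied rather than summed, a candidate that scores near zero on any single criterion is effectively ruled out regardless of its other scores.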
6. The method according to claim 1, wherein after performing feature extraction on the data matrix to obtain the target features of the application to be detected, the method further comprises:
sending the target features to an internal server and/or a cloud server, the internal server and/or the cloud server pushing the target features to a gateway device so that the gateway device analyzes the application flow of the application to be detected according to the target features.
7. An apparatus for feature extraction, comprising:
the intercepting module is used for intercepting a flow data packet of an application to be detected to obtain a data packet file, wherein there are one or more applications to be detected;
the processing module is used for preprocessing the data packet file to obtain a data matrix;
the extraction module is used for performing feature extraction on the data matrix to obtain the target features of the application to be detected, wherein the target features are used for analyzing the application flow of the application to be detected, and the target features are the optimal features among all the features of the application to be detected;
wherein the processing module comprises: the filtering module is used for performing network flow filtering processing on the data packet files to obtain preset network flows, wherein the application to be detected corresponds to a plurality of data packet files, each data packet file comprises a plurality of network flows, and the network flows are used for representing network flow sessions; the arrangement module is used for arranging the characters corresponding to the bytes according to the byte size of the application layer load of each preset network flow, to obtain a character string sequence corresponding to each preset network flow; the grouping module is used for grouping the preset network flows according to the character string sequences to obtain the data matrix;
the filtering module includes: the first filtering module is used for filtering out, from the Transmission Control Protocol (TCP) network flows in the data packet file, the network flows corresponding to Hypertext Transfer Protocol (HTTP) traffic and the network flows corresponding to Hypertext Transfer Protocol Secure (HTTPS) traffic, to obtain the network flows corresponding to traffic that is neither HTTP nor HTTPS; the second filtering module is used for filtering out, from the User Datagram Protocol (UDP) network flows in the data packet file, the network flows corresponding to Domain Name System (DNS) protocol traffic, to obtain the network flows corresponding to non-DNS traffic;
the grouping module includes: the grouping submodule is used for grouping the preset network streams according to the similarity of the character string sequences in the data packet files to obtain the data matrix, wherein the preset network streams with the similarity larger than the preset similarity are divided into a group;
the extraction module comprises: the first combination module is used for combining the application layer loads with the same data flow direction in the same group in pairs and inputting the combinations into the feature extraction module to obtain an output result of the feature extraction module; the first acquisition module is used for acquiring at least one generated feature to be selected under the condition that the output result indicates that features are generated; the first calculation module is used for calculating a weight value corresponding to each feature to be selected; and determining the candidate feature with the highest weight value as the target feature.
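The filtering and grouping performed by the processing module might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `Flow` record, the protocol labels, the greedy comparison against the first member of each group, and the use of `difflib.SequenceMatcher.ratio()` as the similarity measure are all hypothetical. The character string sequence of each flow is taken as an input, since the arranging step is not detailed in this section.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class Flow:
    transport: str     # "tcp" or "udp"
    app_protocol: str  # e.g. "http", "https", "dns", "unknown" (labels assumed)
    sequence: str      # character string sequence derived from the flow


def keep_flow(flow: Flow) -> bool:
    """Keep TCP flows that are neither HTTP nor HTTPS, and UDP flows that are not DNS."""
    if flow.transport == "tcp":
        return flow.app_protocol not in ("http", "https")
    if flow.transport == "udp":
        return flow.app_protocol != "dns"
    return False


def group_flows(flows: List[Flow], threshold: float) -> List[List[Flow]]:
    """Greedy grouping: a flow joins the first group whose representative is similar enough."""
    groups: List[List[Flow]] = []
    for flow in flows:
        for group in groups:
            matcher = SequenceMatcher(None, group[0].sequence, flow.sequence, autojunk=False)
            if matcher.ratio() > threshold:
                group.append(flow)
                break
        else:
            groups.append([flow])  # no similar group found, start a new one
    return groups


flows = [
    Flow("tcp", "http", "AABBC"),
    Flow("tcp", "unknown", "AABBC"),
    Flow("tcp", "unknown", "AABBD"),
    Flow("udp", "dns", "AABBC"),
    Flow("udp", "unknown", "XYZ"),
]
kept = [f for f in flows if keep_flow(f)]
groups = group_flows(kept, threshold=0.7)
print([[f.sequence for f in g] for g in groups])
```

Comparing only against each group's first member keeps the sketch short; a real grouping step could compare against every member or use a proper clustering method.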
CN201911304940.9A 2019-12-17 2019-12-17 Feature extraction method and device Active CN111222019B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911304940.9A CN111222019B (en) 2019-12-17 2019-12-17 Feature extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911304940.9A CN111222019B (en) 2019-12-17 2019-12-17 Feature extraction method and device

Publications (2)

Publication Number Publication Date
CN111222019A (en) 2020-06-02
CN111222019B (en) 2022-09-06

Family

ID=70810554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911304940.9A Active CN111222019B (en) 2019-12-17 2019-12-17 Feature extraction method and device

Country Status (1)

Country Link
CN (1) CN111222019B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112866289B (en) * 2021-03-02 2022-09-30 恒为科技(上海)股份有限公司 Method and system for extracting feature rule
CN113315721B (en) * 2021-05-26 2023-01-17 恒安嘉新(北京)科技股份公司 Network data feature processing method, device, equipment and storage medium
CN114221816B (en) * 2021-12-17 2024-05-03 恒安嘉新(北京)科技股份公司 Flow detection method, device, equipment and storage medium
CN115632995B (en) * 2022-12-19 2023-03-17 北京安帝科技有限公司 Data feature extraction method, equipment and computer medium for industrial control network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101753622A (en) * 2009-12-25 2010-06-23 青岛朗讯科技通讯设备有限公司 Method for extracting characteristics of application layer protocols
CN104270392A (en) * 2014-10-24 2015-01-07 中国科学院信息工程研究所 Method and system for network protocol recognition based on tri-classifier cooperative training learning
US10187401B2 (en) * 2015-11-06 2019-01-22 Cisco Technology, Inc. Hierarchical feature extraction for malware classification in network traffic
CN109327357A (en) * 2018-11-29 2019-02-12 杭州迪普科技股份有限公司 Feature extracting method, device and the electronic equipment of application software

Also Published As

Publication number Publication date
CN111222019A (en) 2020-06-02

Similar Documents

Publication Publication Date Title
CN111222019B (en) Feature extraction method and device
CN111865815B (en) Flow classification method and system based on federal learning
CN110011931B (en) Encrypted flow type detection method and system
CN106815112B (en) Massive data monitoring system and method based on deep packet inspection
US10084713B2 (en) Protocol type identification method and apparatus
US7840664B2 (en) Automated characterization of network traffic
US8676729B1 (en) Network traffic classification using subspace clustering techniques
US9473380B1 (en) Automatic parsing of binary-based application protocols using network traffic
US11095670B2 (en) Hierarchical activation of scripts for detecting a security threat to a network using a programmable data plane
CN110417729B (en) Service and application classification method and system for encrypted traffic
US10498618B2 (en) Attributing network address translation device processed traffic to individual hosts
CN106789242B (en) Intelligent identification application analysis method based on mobile phone client software dynamic feature library
CN102739457A (en) Network flow recognition system and method based on DPI (Deep Packet Inspection) and SVM (Support Vector Machine) technology
CN111865996A (en) Data detection method and device and electronic equipment
CN101635720A (en) Filtering method of unknown flow rate and bandwidth management equipment
CN110099138A (en) A kind of method and system handling the DHCP data with VLAN TAG
CN117240560A (en) GAN-based high-simulation honeypot implementation method and system
Langthasa et al. Classification of network traffic in LAN
CN115664739B (en) User identity attribute active detection method and system based on flow characteristic matching
CN111835720A (en) VPN flow WEB fingerprint identification method based on feature enhancement
Qin et al. Behavior spectrum: An effective method for user's web access behavior monitoring and measurement
KR101605187B1 (en) Apparatus and method for collecting unknown traffic flow to analysis application traffic
RU2697698C2 (en) Method of processing network traffic using firewall method
CN115589362B (en) Method for generating and identifying device type fingerprint, device and medium
RU181257U1 (en) Data Clustering Firewall

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant