CN112953852A - Application identification method based on TCP protocol payload characteristics - Google Patents
Application identification method based on TCP protocol payload characteristics
- Publication number
- CN112953852A
- Authority
- CN
- China
- Prior art keywords
- node
- matching
- successful
- payload
- flow
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2483—Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/16—Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
- H04L69/163—In-band adaptation of TCP data exchange; In-band control procedures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/02—Capturing of monitoring data
- H04L43/028—Capturing of monitoring data by filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/22—Parsing or analysis of headers
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention discloses an application identification method based on TCP protocol payload characteristics, which comprises the following steps: S1, obtaining a data message; S2, extracting the payload field; S3, performing dictionary tree (trie) feature matching; and S4, judging whether the matching is successful, marking the flow as unidentified when the matching fails and as identified when the matching succeeds. The invention avoids serious system performance loss during the identification process.
Description
Technical Field
The invention relates to the technical field of deep packet inspection (DPI), and in particular to an application identification method based on TCP protocol payload characteristics.
Background
DPI is a traditional application traffic identification technology. Its basic principle is to identify an application by analyzing the payload of data messages and combining the result with the fingerprint characteristics of various applications. An application traffic identification system based on DPI generally includes two parts: a packet detection and identification engine, and an application "fingerprint" feature library. The application feature library is a text file generated from a series of application features in a defined format; the packet detection and identification engine is mainly responsible for parsing data messages, extracting feature fields, and matching them against the feature library. The DPI-based application traffic identification process can therefore be simplified into extracting the feature fields of data messages and matching them against the application feature library.
The TCP protocol payload is the data segment of the TCP protocol layer. The payload fields of data packets from the same application often contain "fingerprint" information unique to that application, from which the application that a data stream belongs to can be conveniently identified. A payload feature is written in the form "pos:value", where pos is the position of the value within the payload and value is the numerical value at that position (1 byte, in hexadecimal). A single application may have multiple payload features, and a single payload feature may contain multiple "pos:value" entries. The following is an example of application feature information:
00:0x17|02:0x00|03:0x03
The payload feature above is equivalent to: 17*0003 (hexadecimal prefix omitted; * is a wildcard that stands for the unspecified byte at position 01).
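The "pos:value" notation can be illustrated with a minimal Python sketch. It assumes that pos is a decimal byte offset and that any position not listed is a wildcard; the function names `parse_feature` and `render` are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Optional


def parse_feature(feature: str) -> List[Optional[int]]:
    """Turn "pos:value|pos:value|..." into a dense byte pattern.

    Assumption: pos is a decimal offset; unlisted positions become None,
    which this sketch uses to represent a wildcard byte.
    """
    pairs: Dict[int, int] = {}
    for item in feature.split("|"):
        pos_text, value_text = item.split(":")
        pairs[int(pos_text, 10)] = int(value_text, 16)  # base 16 accepts "0x17"
    length = max(pairs) + 1
    return [pairs.get(i) for i in range(length)]


def render(pattern: List[Optional[int]]) -> str:
    """Print the pattern in the compact form used in the text ('*' = wildcard)."""
    return "".join("*" if b is None else f"{b:02x}" for b in pattern)


pattern = parse_feature("00:0x17|02:0x00|03:0x03")
print(pattern)          # [23, None, 0, 3]
print(render(pattern))  # 17*0003
```

The dense list form makes the wildcard at position 01 explicit, which is convenient when the pattern is later inserted into a tree structure one position at a time.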
However, in the prior art, nDPI traverses each protocol parser in turn until one parses the traffic successfully or all of them fail. The time complexity is O(n), where n is the number of protocol parsers, so parsing and identification efficiency declines linearly as n increases.
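The linear cost described above can be made concrete with a small Python sketch. The parsers here are hypothetical stand-ins (simple prefix checks), not the parsers of any real DPI library; the point is only that the number of attempts grows with the number of registered parsers.

```python
from typing import Callable, List, Optional, Tuple

Parser = Callable[[bytes], bool]


def make_prefix_parser(prefix: bytes) -> Parser:
    """Hypothetical parser: succeeds when the payload starts with `prefix`."""
    return lambda payload: payload.startswith(prefix)


def identify_linear(
    payload: bytes, parsers: List[Tuple[str, Parser]]
) -> Tuple[Optional[str], int]:
    """Try every parser in order; return (application name, attempts made)."""
    attempts = 0
    for name, parser in parsers:
        attempts += 1
        if parser(payload):
            return name, attempts
    return None, attempts


# 200 made-up applications, each recognised by a distinct two-byte prefix.
parsers = [(f"app{i}", make_prefix_parser(bytes([i, 0x00]))) for i in range(200)]

print(identify_linear(bytes([199, 0x00, 0x01]), parsers))  # ('app199', 200)
print(identify_linear(b"\xff\xff", parsers))               # (None, 200)
```

Both the last-registered application and unrecognised traffic cost n attempts, which is the worst case the text refers to.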
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide an application identification method based on TCP protocol payload characteristics that solves the problem of serious system performance loss during the identification process. To achieve this object and other advantages, the invention provides an application identification method based on TCP protocol payload characteristics, comprising the following steps:
S1, obtaining a data message;
S2, extracting the payload field;
S3, performing dictionary tree feature matching;
and S4, judging whether the matching is successful, marking the flow as unidentified when the matching fails and as identified when the matching succeeds.
Preferably, step S3 includes the following steps:
S31, judging whether a wildcard node exists in the payload data;
S32, when it is judged that the payload data has no wildcard node, judging whether the node matching is successful;
and S33, when it is judged that the node matching is successful, judging whether the node is a tail node.
Preferably, when it is judged that the payload data has a wildcard node, the current node moves to the next layer and the method returns to step S31.
Preferably, when the node matching is judged to be unsuccessful, the matching has failed.
Preferably, when the node is judged to be a tail node, the matching is successful; when the node is judged not to be a tail node, the current node moves to the next layer and the method returns to step S31.
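Steps S31 to S33 can be sketched in Python as a single loop over the payload bytes. This is a minimal illustration under stated assumptions, not the patented implementation: the `Node` layout, the `app` field on tail nodes, and the rule that a wildcard at a layer takes priority without backtracking are all choices made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    children: Dict[int, "Node"] = field(default_factory=dict)  # exact-byte edges
    wildcard: Optional["Node"] = None  # edge that accepts any byte
    app: Optional[str] = None          # set only on tail nodes


def match(root: Node, payload: bytes) -> Optional[str]:
    """Walk the tree one payload byte per layer; return the app name or None."""
    node = root
    for byte in payload:
        if node.wildcard is not None:    # S31: wildcard node at this layer
            node = node.wildcard         # skip the byte, move to the next layer
            continue
        child = node.children.get(byte)  # S32: does the node match?
        if child is None:
            return None                  # node matching failed
        if child.app is not None:        # S33: tail node reached
            return child.app
        node = child                     # not a tail node: next layer
    return None                          # payload ended before a tail node


# Hand-built tree for the example feature 17*0003 ("example-app" is made up).
tail = Node(app="example-app")
root = Node(children={0x17: Node(wildcard=Node(children={0x00: Node(children={0x03: tail})}))})

print(match(root, bytes([0x17, 0x03, 0x00, 0x03, 0xAA])))  # example-app
print(match(root, bytes([0x17, 0x03, 0x01, 0x03])))        # None
```

A tree in which one layer has both a wildcard edge and exact-byte edges would need backtracking or edge priorities; the steps above do not describe that case, so the sketch leaves it out.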
Compared with the prior art, the invention has the following beneficial effects: to avoid the performance loss caused by traversal, the fingerprint information of applications is stored in a dictionary tree (trie) structure, so the feature library does not need to be traversed during feature matching. In addition, the high space complexity of the dictionary tree structure is optimized. The time complexity of feature matching is close to O(1), and the efficiency of application identification is not limited by the size of the feature library.
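The claim that matching cost does not depend on the size of the feature library can be illustrated with a small Python sketch. It assumes a nested-dictionary tree in which only the edges that actually occur are stored (one possible way to limit space use; the text does not say how the space optimization is done). The names `WILDCARD`, `TAIL`, `insert` and `lookup` are hypothetical.

```python
from typing import List, Optional, Tuple

WILDCARD = -1  # hypothetical edge key meaning "any byte"
TAIL = "app"   # hypothetical key that marks a tail node and holds the app name


def insert(root: dict, pattern: List[Optional[int]], app: str) -> None:
    """Add one feature; None entries in the pattern become wildcard edges."""
    node = root
    for b in pattern:
        node = node.setdefault(WILDCARD if b is None else b, {})
    node[TAIL] = app


def lookup(root: dict, payload: bytes) -> Tuple[Optional[str], int]:
    """Return (app name or None, number of layers visited)."""
    node, steps = root, 0
    for byte in payload:
        steps += 1
        if WILDCARD in node:        # wildcard layer: accept any byte
            node = node[WILDCARD]
            continue
        nxt = node.get(byte)
        if nxt is None:
            return None, steps      # no edge for this byte: matching fails
        if TAIL in nxt:
            return nxt[TAIL], steps # tail node: matching succeeds
        node = nxt
    return None, steps


small, large = {}, {}
insert(small, [3, 231, None, 0x03], "app999")
for i in range(1000):  # 1000 made-up features sharing one tree
    insert(large, [i >> 8, i & 0xFF, None, 0x03], f"app{i}")

payload = bytes([3, 231, 0x42, 0x03])
print(lookup(small, payload))  # ('app999', 4)
print(lookup(large, payload))  # ('app999', 4)
```

The lookup visits four layers in both trees: the work is bounded by the length of the feature, not by the number of features stored.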
Drawings
FIG. 1 is a flowchart of payload feature recognition in the application identification method based on TCP protocol payload characteristics according to the present invention;
FIG. 2 is a flowchart of dictionary tree structure matching in the application identification method based on TCP protocol payload characteristics according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-2, an application identification method based on TCP protocol payload characteristics includes the following steps: S1, obtaining a data message;
S2, extracting the payload field;
S3, performing dictionary tree feature matching;
and S4, judging whether the matching is successful, marking the flow as unidentified when the matching fails and as identified when the matching succeeds.
Further, step S3 includes the following steps:
S31, judging whether a wildcard node exists in the payload data;
S32, when it is judged that the payload data has no wildcard node, judging whether the node matching is successful;
and S33, when it is judged that the node matching is successful, judging whether the node is a tail node.
Further, when it is judged that the payload data has a wildcard node, the current node moves to the next layer and the method returns to step S31.
Further, when the node matching is judged to be unsuccessful, the matching has failed.
Further, when the node is judged to be a tail node, the matching is successful; when the node is judged not to be a tail node, the current node moves to the next layer and the method returns to step S31.
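The outer flow S1 to S4 can be sketched in Python as follows. The sketch assumes the data message arrives as a raw IPv4 packet and takes the feature matcher of step S3 as a callable; `extract_tcp_payload`, `classify`, `make_packet` and the stand-in matcher are hypothetical names used only for illustration.

```python
import struct
from typing import Callable, Optional


def extract_tcp_payload(ip_packet: bytes) -> Optional[bytes]:
    """S2: return the TCP payload of an IPv4 packet, or None if not IPv4/TCP."""
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        return None
    ihl = (ip_packet[0] & 0x0F) * 4           # IPv4 header length in bytes
    if ip_packet[9] != 6:                     # protocol number 6 = TCP
        return None
    total_len = struct.unpack("!H", ip_packet[2:4])[0]
    tcp = ip_packet[ihl:total_len]
    if len(tcp) < 20:
        return None
    data_offset = (tcp[12] >> 4) * 4          # TCP header length in bytes
    return tcp[data_offset:]


def classify(ip_packet: bytes, matcher: Callable[[bytes], Optional[str]]) -> str:
    """S1-S4: take a message, extract the payload, match it, mark the flow."""
    payload = extract_tcp_payload(ip_packet)
    app = matcher(payload) if payload else None   # S3 is delegated to `matcher`
    return f"identified:{app}" if app else "unidentified"


def make_packet(payload: bytes) -> bytes:
    """Build a minimal IPv4+TCP packet for the demo (checksums left at zero)."""
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40 + len(payload), 0, 0, 64, 6, 0,
                     bytes(4), bytes(4))
    tcp = struct.pack("!HHIIBBHHH", 1234, 443, 0, 0, 0x50, 0x18, 65535, 0, 0)
    return ip + tcp + payload


# Stand-in for the dictionary tree matcher: only checks the first byte.
def matcher(p: bytes) -> Optional[str]:
    return "example-app" if p[:1] == b"\x17" else None


print(classify(make_packet(b"\x17\x03\x00\x03"), matcher))  # identified:example-app
print(classify(make_packet(b"GET /"), matcher))             # unidentified
```

Passing the matcher in as a parameter keeps payload extraction and flow marking independent of how the feature library is stored.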
On the basis of traditional five-tuple detection, the invention obtains the type of a service flow by analyzing the packet content of upper-layer protocols (above the IP layer), either by searching for data feature words or by using behavior statistics of the service. By storing and matching features in a dictionary tree structure, it mitigates the system performance loss caused by the traditional traversal of the feature library or parsers.
The number of devices and the scale of the processes described herein are intended to simplify the description of the invention, and applications, modifications and variations of the invention will be apparent to those skilled in the art.
While embodiments of the invention have been described above, the invention is not limited to the applications set forth in the description and the embodiments; it is fully applicable in the various fields to which it pertains, and further modifications may readily be made by those skilled in the art. The invention is therefore not limited to the details shown and described herein, provided there is no departure from the general concept defined by the appended claims and their equivalents.
Claims (5)
1. An application identification method based on TCP protocol payload characteristics, characterized by comprising the following steps:
S1, obtaining a data message;
S2, extracting the payload field;
S3, performing dictionary tree feature matching;
and S4, judging whether the matching is successful, marking the flow as unidentified when the matching fails and as identified when the matching succeeds.
2. The application identification method based on TCP protocol payload characteristics as claimed in claim 1, wherein step S3 comprises the following steps:
S31, judging whether a wildcard node exists in the payload data;
S32, when it is judged that the payload data has no wildcard node, judging whether the node matching is successful;
and S33, when it is judged that the node matching is successful, judging whether the node is a tail node.
3. The method as claimed in claim 2, wherein when it is judged that the payload data has a wildcard node, the current node moves to the next layer and the method returns to step S31.
4. The method as claimed in claim 2, wherein when the node matching is judged to be unsuccessful, the matching has failed.
5. The method as claimed in claim 2, wherein when the node is judged to be a tail node, the matching is successful, and when the node is judged not to be a tail node, the current node moves to the next layer and the method returns to step S31.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110112860.4A CN112953852A (en) | 2021-01-27 | 2021-01-27 | Application identification method based on TCP protocol payload characteristics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110112860.4A CN112953852A (en) | 2021-01-27 | 2021-01-27 | Application identification method based on TCP protocol payload characteristics |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112953852A true CN112953852A (en) | 2021-06-11 |
Family
ID=76238029
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110112860.4A Pending CN112953852A (en) | 2021-01-27 | 2021-01-27 | Application identification method based on TCP protocol payload characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112953852A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105373601A (en) * | 2015-11-09 | 2016-03-02 | 国家计算机网络与信息安全管理中心 | Keyword word frequency characteristic-based multimode matching method |
CN106295366A (en) * | 2016-08-15 | 2017-01-04 | 北京奇虎科技有限公司 | Sensitive data recognition methods and device |
CN106815112A (en) * | 2015-11-27 | 2017-06-09 | 大唐软件技术股份有限公司 | A kind of mass data monitoring system and method based on deep-packet detection |
CN112100361A (en) * | 2020-11-12 | 2020-12-18 | 南京中孚信息技术有限公司 | Character string multimode fuzzy matching method based on AC automaton |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7512634B2 (en) | Systems and methods for processing regular expressions | |
CN109063745B (en) | Network equipment type identification method and system based on decision tree | |
RU2608464C2 (en) | Device, method and network server for detecting data structures in data stream | |
CN102647414B (en) | Protocol analysis method, protocol analysis device and protocol analysis system | |
US20030204584A1 (en) | Apparatus and method for pattern matching in text based protocol | |
US20060085389A1 (en) | Method for transformation of regular expressions | |
US9064032B2 (en) | Blended match mode DFA scanning | |
US20120290736A1 (en) | Systems and Methods for Processing Regular Expressions | |
WO2003023548A2 (en) | High speed data stream pattern recognition | |
CN102195977A (en) | Network protocol identification method and device | |
CN111585832A (en) | Industrial control protocol reverse analysis method based on semantic pre-mining | |
CN111988231A (en) | Mask five-tuple rule matching method and device | |
CN113411290A (en) | Packet header parsing method and device | |
CN112953852A (en) | Application identification method based on TCP protocol payload characteristics | |
CN111371649B (en) | Deep packet detection method and device | |
CN111950000A (en) | Access access control method and device | |
EP2122503B1 (en) | A method of filtering sections of a data stream | |
CN115168755A (en) | Abnormal data processing method and system based on URL (Uniform resource locator) characteristics | |
CN115994210A (en) | Method and device for quickly searching text in OFD document and electronic equipment | |
CN112887280B (en) | Network protocol metadata extraction system and method based on automaton | |
CN111353018B (en) | Data processing method and device based on deep packet inspection and network equipment | |
CN102130956A (en) | Method and system for identifying application layer protocols | |
CN109688043B (en) | IMAP protocol multi-link association analysis method and system | |
CN114422622B (en) | Engineering mechanical equipment working condition data analysis method | |
CN111753150B (en) | Graph search method-based method and system for accelerating epsilon closure computation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210611 |