CN112953852A - Application identification method based on TCP protocol payload characteristics - Google Patents


Info

Publication number
CN112953852A
Authority
CN
China
Prior art keywords
node
matching
successful
payload
flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110112860.4A
Other languages
Chinese (zh)
Inventor
王玉其
林喆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sunmi Technology Group Co Ltd
Shanghai Sunmi Technology Co Ltd
Shenzhen Michelangelo Technology Co Ltd
Original Assignee
Shanghai Sunmi Technology Group Co Ltd
Shenzhen Michelangelo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-01-27
Filing date
2021-01-27
Publication date
2021-06-11
Application filed by Shanghai Sunmi Technology Group Co Ltd and Shenzhen Michelangelo Technology Co Ltd
Priority to CN202110112860.4A
Publication of CN112953852A
Current legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/163 In-band adaptation of TCP data exchange; In-band control procedures
    • H04L69/22 Parsing or analysis of headers
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/028 Capturing of monitoring data by filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses an application identification method based on TCP protocol payload characteristics, comprising the following steps: S1, receiving a data packet; S2, extracting the payload field; S3, performing dictionary tree feature matching; and S4, determining whether the matching succeeds, marking the flow as an unidentified flow if it does not and as an identified flow if it does. The invention avoids the severe system performance loss otherwise incurred during identification.

Description

Application identification method based on TCP protocol payload characteristics
Technical Field
The invention relates to the field of deep packet inspection (DPI), and in particular to an application identification method based on TCP protocol payload characteristics.
Background
DPI is a traditional application traffic identification technology. Its basic principle is to analyze the payload of data packets and judge the application comprehensively by combining the fingerprint features of various applications. An application traffic identification system built on DPI generally consists of two parts: a packet detection and identification engine, and an application "fingerprint" feature library. The feature library is a text file in which a series of application features are written in a given format; the engine is mainly responsible for parsing data packets, extracting feature fields, and matching them against the feature library. The DPI-based identification process can therefore be reduced to extracting feature fields from data packets and matching them against the application feature library.
The TCP payload is the data segment carried at the TCP layer. The payload fields of packets from the same application often contain "fingerprint" information unique to that application, and these features make it straightforward to identify the application behind a data stream. A payload feature is written as "pos:value", where pos is the position (byte offset) of the value in the payload and value is the byte at that position (1 byte, in hexadecimal). A single application may have multiple payload features, and a single payload feature may contain multiple "pos:value" pairs separated by "|". The following is an example application feature:
00:0x17|02:0x00|03:0x03
The payload feature above is equivalent to 17 * 00 03 (hexadecimal prefix omitted; * is a wildcard that matches any single byte).
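To make the "pos:value" notation concrete, here is a minimal sketch that turns a feature string into a byte pattern, filling unlisted offsets with wildcards. It assumes positions are decimal offsets and values are single hexadecimal bytes; `parse_feature` and `render` are hypothetical helper names, not part of the patent.

```python
from typing import Dict, List, Optional


def parse_feature(feature: str) -> List[Optional[int]]:
    """Turn "pos:value|pos:value" into a byte pattern; None marks a wildcard."""
    pairs: Dict[int, int] = {}
    for item in feature.split("|"):
        pos_text, value_text = item.split(":")
        pos = int(pos_text)          # assumption: positions are decimal offsets
        value = int(value_text, 16)  # "0x17" -> 0x17; int() accepts the 0x prefix
        if not 0 <= value <= 0xFF:
            raise ValueError(f"value is not a single byte: {item!r}")
        pairs[pos] = value
    # Offsets with no listed value become wildcards.
    return [pairs.get(i) for i in range(max(pairs) + 1)]


def render(pattern: List[Optional[int]]) -> str:
    """Show a pattern in the document's shorthand, e.g. "17 * 00 03"."""
    return " ".join("*" if b is None else f"{b:02x}" for b in pattern)


pattern = parse_feature("00:0x17|02:0x00|03:0x03")
print(pattern)          # [23, None, 0, 3]
print(render(pattern))  # 17 * 00 03
```

The gap at offset 1 is exactly what becomes the wildcard in the shorthand form.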
However, in the prior art, nDPI tries each protocol parser in turn until one succeeds or all of them fail, which has O(n) time complexity, where n is the number of protocol parsers. As n grows, parsing and identification efficiency declines linearly with the number of parsers.
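The linear cost can be illustrated with a toy dispatcher that tries parsers one after another and counts how many it called. This is only a sketch of the traversal pattern being criticized; the parsers and `identify_linear` are hypothetical and do not reflect nDPI's actual API.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical parsers: each inspects a payload and answers yes or no.
Parser = Callable[[bytes], bool]


def identify_linear(payload: bytes,
                    parsers: List[Tuple[str, Parser]]) -> Tuple[Optional[str], int]:
    """Try every parser in order; return (name, parsers_tried)."""
    tried = 0
    for name, parser in parsers:
        tried += 1
        if parser(payload):
            return name, tried
    return None, tried  # worst case: all n parsers were tried


parsers = [
    ("tls_like", lambda p: len(p) >= 3 and p[0] == 0x16 and p[1] == 0x03),
    ("http_get", lambda p: p.startswith(b"GET ")),
]
print(identify_linear(b"GET / HTTP/1.1", parsers))  # ('http_get', 2)
print(identify_linear(b"\x00\x01", parsers))        # (None, 2)
```

An unidentified payload always costs n parser calls, so adding parsers slows down exactly the traffic that is hardest to classify.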
Disclosure of Invention
In view of these shortcomings of the prior art, the invention aims to provide an application identification method based on TCP protocol payload characteristics that avoids severe system performance loss during identification. To achieve the above objects and other advantages, the invention provides an application identification method based on TCP protocol payload characteristics, comprising the following steps:
S1, receiving a data packet;
S2, extracting the payload field;
S3, performing dictionary tree feature matching;
S4, determining whether the matching succeeds, marking the flow as an unidentified flow if it does not and as an identified flow if it does.
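The text does not say how the payload field is located in step S2. A minimal sketch, assuming raw IPv4 packets with no link-layer header: it reads the IPv4 header length (IHL) and the TCP data offset and returns the bytes after the TCP header. IPv6, IP fragments, and TCP reassembly are not handled, and `extract_tcp_payload` is a hypothetical name.

```python
import struct
from typing import Optional

IPPROTO_TCP = 6


def extract_tcp_payload(packet: bytes) -> Optional[bytes]:
    """Return the TCP payload of a raw IPv4 packet, or None if it is not IPv4/TCP."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None                               # too short, or not IPv4
    ihl = (packet[0] & 0x0F) * 4                  # IPv4 header length in bytes
    (total_length,) = struct.unpack("!H", packet[2:4])
    if packet[9] != IPPROTO_TCP or ihl < 20:
        return None
    # Trust the IP total length unless the capture is truncated or the field is bogus;
    # this also drops any link-layer padding after the IP datagram.
    end = total_length if ihl <= total_length <= len(packet) else len(packet)
    tcp = packet[ihl:end]
    if len(tcp) < 20:
        return None                               # truncated TCP header
    data_offset = (tcp[12] >> 4) * 4              # TCP header length in bytes
    if data_offset < 20 or data_offset > len(tcp):
        return None
    return tcp[data_offset:]                      # S2: the payload field
```

The returned bytes are what step S3 would feed into the dictionary tree; flows whose packets yield `None` never reach matching.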
Preferably, step S3 comprises the following steps:
S31, determining whether a wildcard node exists at the current position of the payload data;
S32, when no wildcard node exists, determining whether the node matches;
S33, when the node matches, determining whether the node is the tail node.
Preferably, when a wildcard node exists at the current position of the payload data, matching moves to the node on the next layer and returns to step S31.
Preferably, when the node does not match, the matching fails.
Preferably, when the node is the tail node, the matching succeeds; when it is not the tail node, matching moves to the next layer and returns to step S31.
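Steps S31 to S33 can be sketched as a trie with one layer per payload offset, where each node has exact-byte children, an optional wildcard child, and an application name on tail nodes. This is an illustration under stated assumptions, not the patent's implementation: `Node`, `insert`, and `match` are hypothetical, and one case the flow above leaves open is decided here by assumption. When a layer holds both a wildcard child and an exact child (from different features), the sketch tries the wildcard branch first, as S31 does, but falls back to the exact child if that branch fails.

```python
from typing import Dict, List, Optional


class Node:
    """One trie layer per payload offset (hypothetical structure)."""

    def __init__(self) -> None:
        self.children: Dict[int, "Node"] = {}   # exact-byte edges
        self.wildcard: Optional["Node"] = None  # edge that accepts any byte
        self.app: Optional[str] = None          # set only on tail nodes


def insert(root: Node, pattern: List[Optional[int]], app: str) -> None:
    """Add a feature such as [0x17, None, 0x00, 0x03]; None is a wildcard."""
    node = root
    for byte in pattern:
        if byte is None:
            if node.wildcard is None:
                node.wildcard = Node()
            node = node.wildcard
        else:
            child = node.children.get(byte)
            if child is None:
                child = node.children[byte] = Node()
            node = child
    node.app = app  # mark the tail node


def match(node: Node, payload: bytes, depth: int = 0) -> Optional[str]:
    """Return the application name of the first feature that matches, else None."""
    if node.app is not None:
        return node.app   # S33: tail node reached, matching succeeds
    if depth >= len(payload):
        return None       # payload shorter than the feature
    # S31: a wildcard node moves to the next layer without comparing the byte.
    if node.wildcard is not None:
        found = match(node.wildcard, payload, depth + 1)
        if found is not None:
            return found
    # Assumption: if the wildcard branch failed, fall back to S32 and
    # compare the payload byte with an exact child.
    child = node.children.get(payload[depth])
    if child is None:
        return None       # node matching failed
    return match(child, payload, depth + 1)


root = Node()
insert(root, [0x17, None, 0x00, 0x03], "app_a")
insert(root, [0x17, 0x03, 0x03], "app_b")
print(match(root, bytes([0x17, 0xAA, 0x00, 0x03, 0x99])))  # app_a
print(match(root, bytes([0x17, 0x03, 0x03])))              # app_b
```

With these two features, following the wildcard branch alone would fail on the payload 17 03 03 at the third byte; the fallback is what lets `app_b` be found.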
Compared with the prior art, the invention has the following beneficial effects: to avoid the performance loss caused by traversal, the applications' fingerprint information is stored in a dictionary tree (trie), so feature matching requires no traversal of the feature library. The high space complexity of the trie structure is also optimized. Feature-matching time complexity approaches O(1) with respect to the number of features (the cost depends on the length of a feature rather than on how many features there are), so identification efficiency is not limited by the size of the feature library.
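The text does not say how the trie's space cost is reduced. One common option, assumed here, is to keep each node's children in a hash map keyed by byte instead of a 256-slot array, so a node pays only for the edges it actually uses. The sketch below (exact bytes only, no wildcards, hypothetical names) also counts the nodes a lookup visits, to show that the cost tracks feature length rather than library size.

```python
import random
from typing import Dict, Optional, Tuple


class Node:
    __slots__ = ("children", "app")

    def __init__(self) -> None:
        self.children: Dict[int, "Node"] = {}  # sparse: only bytes actually used
        self.app: Optional[str] = None         # set only on tail nodes


def insert(root: Node, pattern: bytes, app: str) -> None:
    node = root
    for byte in pattern:
        child = node.children.get(byte)
        if child is None:
            child = node.children[byte] = Node()
        node = child
    node.app = app


def lookup(root: Node, payload: bytes) -> Tuple[Optional[str], int]:
    """Return (app, nodes_visited); visits never exceed the longest feature."""
    node, visited = root, 0
    for byte in payload:
        node = node.children.get(byte)
        if node is None:
            return None, visited
        visited += 1
        if node.app is not None:
            return node.app, visited
    return None, visited


def build(n_features: int, seed: int = 0) -> Node:
    """Hypothetical library: n random 4-byte features plus one known target."""
    rng = random.Random(seed)
    root = Node()
    for i in range(n_features):
        insert(root, bytes(rng.randrange(256) for _ in range(4)), f"app{i}")
    insert(root, b"\x17\x03\x00\x03", "target")
    return root


for n in (10, 10_000):
    print(n, lookup(build(n), b"\x17\x03\x00\x03payload"))
```

Whether the library holds 10 or 10,000 features, the lookup for the target payload visits exactly 4 nodes, one per byte of the feature.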
Drawings
FIG. 1 is a flowchart of payload feature identification in the application identification method based on TCP protocol payload characteristics according to the present invention;
FIG. 2 is a flowchart of dictionary tree matching in the application identification method based on TCP protocol payload characteristics according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-2, an application identification method based on TCP protocol payload characteristics includes the following steps:
S1, receiving a data packet;
S2, extracting the payload field;
S3, performing dictionary tree feature matching;
S4, determining whether the matching succeeds, marking the flow as an unidentified flow if it does not and as an identified flow if it does.
Further, step S3 comprises the following steps:
S31, determining whether a wildcard node exists at the current position of the payload data;
S32, when no wildcard node exists, determining whether the node matches;
S33, when the node matches, determining whether the node is the tail node.
Further, when a wildcard node exists at the current position of the payload data, matching moves to the node on the next layer and returns to step S31.
Further, when the node does not match, the matching fails.
Further, when the node is the tail node, the matching succeeds; when it is not the tail node, matching moves to the next layer and returns to step S31.
Building on traditional five-tuple inspection, the invention determines the type of a service flow by analyzing the packet content of upper-layer protocols (above the IP layer), either by searching for characteristic data strings or from statistics on service behavior. By storing and matching features in a dictionary tree, it eliminates the system performance loss caused by the traditional approach of traversing the feature library or the protocol parsers one by one.
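One way the S4 marking might sit on top of five-tuple inspection is a flow table keyed by the five-tuple that caches each flow's verdict, so later packets of an identified flow skip matching. This is a minimal sketch under assumptions: `FiveTuple`, `FlowState`, `FlowTable`, and the `max_inspect` limit are hypothetical, the matcher is any function returning an application name or `None` (such as a trie lookup), and the two directions of a connection are treated as separate keys.

```python
from dataclasses import dataclass
from typing import Callable, Dict, NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


@dataclass
class FlowState:
    app: Optional[str] = None
    packets_inspected: int = 0

    @property
    def identified(self) -> bool:
        return self.app is not None


class FlowTable:
    """Hypothetical flow table that caches the S4 verdict per five-tuple."""

    def __init__(self, matcher: Callable[[bytes], Optional[str]],
                 max_inspect: int = 4) -> None:
        self.matcher = matcher          # e.g. a trie lookup returning an app name or None
        self.max_inspect = max_inspect  # assumption: stop trying after this many payloads
        self.flows: Dict[FiveTuple, FlowState] = {}

    def on_packet(self, key: FiveTuple, payload: bytes) -> FlowState:
        state = self.flows.setdefault(key, FlowState())
        if state.identified or state.packets_inspected >= self.max_inspect or not payload:
            return state                # already identified, out of attempts, or no payload
        state.packets_inspected += 1
        state.app = self.matcher(payload)  # S4: None keeps the flow unidentified
        return state


table = FlowTable(lambda p: "app_a" if p.startswith(b"\x17") else None)
key = FiveTuple("10.0.0.1", "10.0.0.2", 40000, 443, 6)
print(table.on_packet(key, b"\x01\x02").identified)  # False
print(table.on_packet(key, b"\x17\x03").app)         # app_a
```

Caching per flow means the trie is consulted only for the first few payload-bearing packets of each connection.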
The number of devices and the scale of the processes described herein are intended to simplify the description of the invention, and applications, modifications and variations of the invention will be apparent to those skilled in the art.
Although embodiments of the invention have been described above, the invention is not limited to the applications set forth in the description and the embodiments; it is fully applicable to the various fields to which it pertains, and further modifications can readily be made by those skilled in the art. Accordingly, the invention is not limited to the specific details shown and described herein, provided there is no departure from the general concept defined by the appended claims and their equivalents.

Claims (5)

1. An application identification method based on TCP protocol payload characteristics, characterized by comprising the following steps:
S1, receiving a data packet;
S2, extracting the payload field;
S3, performing dictionary tree feature matching;
S4, determining whether the matching succeeds, marking the flow as an unidentified flow if it does not and as an identified flow if it does.
2. The application identification method based on TCP protocol payload characteristics as claimed in claim 1, wherein step S3 comprises the following steps:
S31, determining whether a wildcard node exists at the current position of the payload data;
S32, when no wildcard node exists, determining whether the node matches;
S33, when the node matches, determining whether the node is the tail node.
3. The method as claimed in claim 2, wherein when a wildcard node exists at the current position of the payload data, matching moves to the node on the next layer and returns to step S31.
4. The method as claimed in claim 2, wherein when the node does not match, the matching fails.
5. The method as claimed in claim 2, wherein when the node is the tail node, the matching succeeds, and when the node is not the tail node, matching moves to the next layer and returns to step S31.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110112860.4A CN112953852A (en) 2021-01-27 2021-01-27 Application identification method based on TCP protocol payload characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110112860.4A CN112953852A (en) 2021-01-27 2021-01-27 Application identification method based on TCP protocol payload characteristics

Publications (1)

Publication Number Publication Date
CN112953852A 2021-06-11

Family

ID=76238029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110112860.4A Pending CN112953852A (en) 2021-01-27 2021-01-27 Application identification method based on TCP protocol payload characteristics

Country Status (1)

Country Link
CN (1) CN112953852A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105373601A (en) * 2015-11-09 2016-03-02 国家计算机网络与信息安全管理中心 Keyword word frequency characteristic-based multimode matching method
CN106295366A (en) * 2016-08-15 2017-01-04 北京奇虎科技有限公司 Sensitive data recognition methods and device
CN106815112A (en) * 2015-11-27 2017-06-09 大唐软件技术股份有限公司 A kind of mass data monitoring system and method based on deep-packet detection
CN112100361A (en) * 2020-11-12 2020-12-18 南京中孚信息技术有限公司 Character string multimode fuzzy matching method based on AC automaton

Similar Documents

Publication Publication Date Title
US7512634B2 (en) Systems and methods for processing regular expressions
CN109063745B (en) Network equipment type identification method and system based on decision tree
RU2608464C2 (en) Device, method and network server for detecting data structures in data stream
CN102647414B (en) Protocol analysis method, protocol analysis device and protocol analysis system
US20030204584A1 (en) Apparatus and method for pattern matching in text based protocol
US20060085389A1 (en) Method for transformation of regular expressions
US9064032B2 (en) Blended match mode DFA scanning
US20120290736A1 (en) Systems and Methods for Processing Regular Expressions
WO2003023548A2 (en) High speed data stream pattern recognition
CN102195977A (en) Network protocol identification method and device
CN111585832A (en) Industrial control protocol reverse analysis method based on semantic pre-mining
CN111988231A (en) Mask five-tuple rule matching method and device
CN113411290A (en) Packet header parsing method and device
CN112953852A (en) Application identification method based on TCP protocol payload characteristics
CN111371649B (en) Deep packet detection method and device
CN111950000A (en) Access access control method and device
EP2122503B1 (en) A method of filtering sections of a data stream
CN115168755A (en) Abnormal data processing method and system based on URL (Uniform resource locator) characteristics
CN115994210A (en) Method and device for quickly searching text in OFD document and electronic equipment
CN112887280B (en) Network protocol metadata extraction system and method based on automaton
CN111353018B (en) Data processing method and device based on deep packet inspection and network equipment
CN102130956A (en) Method and system for identifying application layer protocols
CN109688043B (en) IMAP protocol multi-link association analysis method and system
CN114422622B (en) Engineering mechanical equipment working condition data analysis method
CN111753150B (en) Graph search method-based method and system for accelerating epsilon closure computation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20210611)