CN114598616A - Efficient mode matching method for solving real-time mass data - Google Patents

Info

Publication number
CN114598616A
Authority
CN
China
Prior art keywords
data
keyword
submodule
real
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210496478.2A
Other languages
Chinese (zh)
Inventor
杨贻宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Feiqi Network Technology Co ltd
Original Assignee
Shanghai Feiqi Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Feiqi Network Technology Co ltd
Priority to CN202210496478.2A
Publication of CN114598616A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/028 Capturing of monitoring data by filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/026 Capturing of monitoring data using flow identification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/74 Address processing for routing
    • H04L45/745 Address table lookup; Address filtering
    • H04L45/7453 Address table lookup; Address filtering using hashing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the technical field of data processing, and in particular to an efficient pattern matching method for real-time mass data. The method comprises the following steps: data is transmitted to a long and short packet filtering module through an interface adaptation module; an analysis submodule determines the encapsulation type of the data; a keyword extraction submodule extracts keywords; a BF (Bloom filter) engine filters the keyword data; a keyword compression submodule reduces the length of the keywords; and the extracted keyword data is sent out through a table lookup submodule. The chip-level pattern matching engine designed in this way overcomes the small capacity of the FPGA, the high power consumption and price of the TCAM (ternary content addressable memory), and the low matching precision of Bloom filters. It combines the advantages of the FPGA and the TCAM, improves data matching performance from the K level to the G level, reduces power consumption by 30 percent, and reduces manufacturing cost by 20 percent.

Description

Efficient mode matching method for solving real-time mass data
Technical Field
The invention relates to the technical field of data processing, and in particular to an efficient pattern matching method for real-time mass data.
Background
Multi-pattern matching of multi-source heterogeneous real-time data is the first step of transmission resource acquisition and analysis, and also a crucial one. The transmission rate of the backbone network currently reaches 40 Gbps, and as technology develops, networks of 100 Gbps and even Tbps rates may appear in the future. How to acquire and analyze transmission resources accurately and in real time at such rates, without affecting the existing transmission services in the network, is a difficult problem in the field of information transmission networks. To solve the problem of fast matching, the AC (Aho-Corasick) multi-pattern matching algorithm was proposed; it is mainly used in network intrusion detection systems. Because each transition of the automaton processes only one character, transitions are very frequent, and external memory must be accessed continuously to read the corresponding transition information, which causes large access delays and seriously limits matching speed. To further improve the efficiency of the AC algorithm, an improved AC-based K-step state machine algorithm was proposed. At present, software implementations are simple and common, but their real-time data processing speed reaches only 100 KBps. With the arrival of the multi-source heterogeneous data era, more and more data must be processed and data flows keep growing, so software-implemented matching can no longer keep up. Hardware pattern matching engines are now widely studied, chiefly FPGA-based multi-pattern matching. However, FPGA resources are limited: it is difficult to store many patterns, complex scenarios cannot be handled, and additional ROM or SDRAM is inevitably required, which reduces the speed and real-time performance of mass data acquisition and analysis.
To address this problem, multi-pattern matching based on an FPGA and a TCAM has been proposed. The TCAM provides pattern storage and parallel access, which improves matching speed, but it consumes excessive power, and its price and cost rise as its capacity grows, which runs counter to the low-carbon economy currently advocated. To reduce TCAM power consumption, a Bloom filter plus FPGA matching scheme is commonly adopted, but the Bloom filter produces false positives, which lowers matching precision.
At present, transmission resource acquisition and analysis mainly relies on software to implement multi-pattern matching. To keep pace with the development of multi-source heterogeneous transmission resource acquisition and analysis, this project takes the characteristics of various hardware chips into account and designs a two-stage multi-pattern matching structure, BF-TCAM (Bloom Filter-Ternary Content Addressable Memory), which combines the respective advantages of the TCAM and the Bloom filter to achieve high-speed, accurate multi-pattern matching while keeping the power consumption of the whole system low.
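By way of illustration only, the one-character-per-transition behaviour of the AC automaton discussed in the background can be sketched in software as follows. This is a minimal textbook Aho-Corasick sketch, not the implementation of any system named in this document; the function names `build_automaton` and `search` and the sample patterns are hypothetical. The `steps` counter shows that the automaton performs one state lookup per input character, which is the access pattern the background identifies as the bottleneck when the transition table lives in external memory.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables for a set of patterns."""
    goto, fail, out = [{}], [0], [[]]
    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Phase 2: breadth-first pass that fills in the failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Patterns ending at the failure target also end here.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (list of (start index, pattern), number of input characters consumed)."""
    goto, fail, out = automaton
    state, hits, steps = 0, [], 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        steps += 1  # exactly one character is consumed per iteration
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits, steps


hits, steps = search("ushers", build_automaton(["he", "she", "his", "hers"]))
print(hits, steps)
```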
Disclosure of Invention
The invention aims to provide an efficient mode matching method for solving real-time mass data so as to solve the problems in the background technology.
In order to achieve the above object, the present invention provides an efficient pattern matching method for solving real-time mass data, comprising the following steps:
S1, transmitting the data to a long and short packet filtering module through an interface adaptation module, filtering out the overlong and overly short data packets, and transmitting the remaining data packets to an analysis submodule;
S2, determining the encapsulation type of the data through the analysis submodule, parsing the data according to its encapsulation type, and extracting the content;
S3, extracting keywords from the extracted data packet through a keyword extraction submodule;
S4, filtering the keyword data through a BF engine filtering submodule;
S5, transmitting the keyword data to a keyword compression submodule to reduce the length of the keywords;
S6, sending out the extracted keyword data through a table lookup submodule, so that the FPGA sends the keywords exactly according to the operating specification of the TCAM chip and reads the hit address returned on the result bus, which ensures fast and accurate matching.
As a further improvement of the technical solution, in S1, the long and short packet filtering module specifically performs the following steps: sending the overlong and overly short data packets to an upper layer, where the upper-layer configuration determines whether they are discarded or forwarded, and sending the other data packets to the analysis submodule for further analysis.
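The length filtering of S1 can be sketched in software as follows. This is an illustration under stated assumptions, not the hardware module itself: the thresholds `min_len=64` and `max_len=1518` (typical Ethernet frame bounds) and the flag `upper_layer_drops` are hypothetical, since the document does not specify the length limits or the form of the upper-layer configuration.

```python
def length_filter(packets, min_len=64, max_len=1518, upper_layer_drops=True):
    """Split packets by length.

    Returns (normal, forwarded, dropped_count):
      normal     packets within [min_len, max_len], passed to the analysis submodule
      forwarded  abnormal packets that the upper layer chose to keep forwarding
      dropped_count  abnormal packets that the upper layer chose to discard
    The thresholds and the boolean policy are assumptions for illustration.
    """
    normal, forwarded, dropped = [], [], 0
    for pkt in packets:
        if min_len <= len(pkt) <= max_len:
            normal.append(pkt)
        elif upper_layer_drops:
            dropped += 1
        else:
            forwarded.append(pkt)
    return normal, forwarded, dropped


sample = [bytes(10), bytes(100), bytes(2000)]
print(length_filter(sample))
```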
As a further improvement of the technical solution, in S2, the analysis submodule first determines the encapsulation type of the incoming data, namely whether it is PPP, a directly encapsulated IP packet, a non-IP packet, or a data packet with 1, 2, 3 or 4 layers of MPLS encapsulation, and then parses the data and extracts the content.
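A possible software sketch of the encapsulation check in S2 is given below. It assumes Ethernet framing, which the document does not state, and it treats PPP as PPPoE session traffic purely for illustration; the function name `parse_encapsulation` and the label `"unsupported"` are hypothetical. The EtherType values (0x0800 for IPv4, 0x8847 for MPLS unicast, 0x8864 for PPPoE session) and the position of the MPLS bottom-of-stack bit are standard.

```python
import struct

ETH_IPV4, ETH_MPLS, ETH_PPPOE = 0x0800, 0x8847, 0x8864
MAX_MPLS_DEPTH = 4  # the document covers 1 to 4 layers of MPLS encapsulation


def parse_encapsulation(frame: bytes):
    """Return (encapsulation label, payload) for an Ethernet frame (assumed framing)."""
    if len(frame) < 14:
        return "unsupported", b""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    offset = 14  # destination MAC + source MAC + EtherType
    if ethertype == ETH_IPV4:
        return "IP", frame[offset:]
    if ethertype == ETH_PPPOE:
        # 6-byte PPPoE header followed by the 2-byte PPP protocol field.
        return "PPP", frame[offset + 8:]
    if ethertype == ETH_MPLS:
        for depth in range(1, MAX_MPLS_DEPTH + 1):
            if offset + 4 > len(frame):
                break
            (entry,) = struct.unpack_from("!I", frame, offset)
            offset += 4
            if (entry >> 8) & 1:  # bottom-of-stack bit set: last label
                return f"MPLS{depth}", frame[offset:]
        return "unsupported", b""
    return "non-IP", frame[offset:]
```

Bounding the label loop at four entries mirrors the four MPLS layers listed in the text and keeps a malformed label stack from being walked indefinitely.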
As a further improvement of the technical solution, in S3, the keyword extraction submodule specifically extracts the data field to be searched from the data packet, distinguishes whether the incoming data is of TCP or UDP type, and extracts keywords for each type accordingly.
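The TCP/UDP distinction in S3 can be sketched as follows. The document does not say which fields form the keyword, so this sketch assumes, for illustration only, that the keyword is the IPv4 five-tuple (source address, destination address, source port, destination port, protocol); the function name `extract_keyword` is hypothetical. The IPv4 header offsets and the protocol numbers 6 (TCP) and 17 (UDP) are standard.

```python
import struct

PROTO_TCP, PROTO_UDP = 6, 17


def extract_keyword(ip_packet: bytes):
    """Return (transport type, 13-byte keyword) or None if not TCP/UDP.

    The five-tuple keyword layout is an assumption made for this sketch.
    """
    if len(ip_packet) < 20:
        return None
    header_len = (ip_packet[0] & 0x0F) * 4  # IHL field, in 32-bit words
    proto = ip_packet[9]
    if proto not in (PROTO_TCP, PROTO_UDP) or len(ip_packet) < header_len + 4:
        return None
    src, dst = ip_packet[12:16], ip_packet[16:20]
    # TCP and UDP both start with source port then destination port.
    sport, dport = struct.unpack_from("!HH", ip_packet, header_len)
    kind = "TCP" if proto == PROTO_TCP else "UDP"
    return kind, src + dst + struct.pack("!HHB", sport, dport, proto)
```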
As a further improvement of the technical solution, in S4, the BF engine filtering submodule is a two-stage multi-pattern matching architecture consisting of a Bloom filter and a TCAM. It combines the respective advantages of the TCAM and the Bloom filter, achieving high-speed, accurate multi-pattern matching while keeping the power consumption of the whole system low. The first stage, filtering matching, is implemented by the Bloom filter, which keeps matching fast and power consumption low. Only after the first stage matches is the second stage, exact matching, carried out, and it is implemented by the TCAM. Because the first stage screens the keywords beforehand, this design greatly reduces the number of second-stage matches executed and therefore the power consumption; at the same time, the second-stage exact matching eliminates the false positives of the Bloom filter and guarantees matching precision. To address the high cost of large-capacity TCAMs, the engine adopts a mechanism that dynamically loads matching patterns, so that matching is performed dynamically according to service requirements, which reduces dependence on expensive TCAMs and greatly lowers the manufacturing cost of the whole system.
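The two-stage idea can be sketched in software as follows. This is only an illustration: a Python dictionary stands in for the TCAM, the Bloom filter parameters (`m_bits=1024`, `k=3`) and the use of SHA-256 to derive bit positions are assumptions, and the class names are hypothetical. The counter `tcam_lookups` shows how the first stage keeps most non-matching keys away from the second stage.

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter; sizes and hash choice are illustrative assumptions."""

    def __init__(self, m_bits=1024, k=3):
        self.m, self.k, self.bits = m_bits, k, 0

    def _positions(self, key: bytes):
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class TwoStageMatcher:
    """Stage 1: Bloom filter pre-filter. Stage 2: exact lookup (dict as TCAM stand-in)."""

    def __init__(self, patterns):
        self.bloom = BloomFilter()
        self.table = {}        # exact-match table: pattern -> hit address
        self.tcam_lookups = 0  # how often stage 2 was actually exercised
        for addr, pattern in enumerate(patterns):
            self.bloom.add(pattern)
            self.table[pattern] = addr

    def match(self, key: bytes):
        if not self.bloom.might_contain(key):
            return None  # rejected by stage 1, no second-stage access
        self.tcam_lookups += 1
        return self.table.get(key)  # stage 2 removes Bloom false positives


matcher = TwoStageMatcher([b"alpha", b"beta"])
print(matcher.match(b"alpha"), matcher.match(b"gamma"), matcher.tcam_lookups)
```

A Bloom filter never rejects a stored key, so the pre-filter cannot cause missed matches; it can only let through an occasional non-matching key, which the exact stage then rejects.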
As a further improvement of the technical solution, in S5, the keyword compression submodule uses a HASH-TCAM algorithm to reduce the length of the keyword to be matched through hash compression, thereby increasing matching speed.
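The document does not describe the internals of the HASH-TCAM algorithm, so the following is only a sketch of the general idea of hash compression: a long keyword is reduced to a short fixed-width value that serves as the lookup key. The choice of CRC-32, the `width_bits` parameter, the function names, and the full-key check on a hit are all assumptions made for illustration.

```python
import zlib


def compress_key(key: bytes, width_bits: int = 32) -> int:
    """Compress a keyword of any length to a width_bits-wide integer (CRC-32 based)."""
    return zlib.crc32(key) & ((1 << width_bits) - 1)


def build_table(patterns, width_bits: int = 32):
    """Map compressed key -> (full pattern, hit address).

    Colliding patterns overwrite each other in this simplified sketch.
    """
    return {compress_key(p, width_bits): (p, addr) for addr, p in enumerate(patterns)}


def lookup(table, key: bytes, width_bits: int = 32):
    """Return the hit address, or None. The full key is compared to rule out collisions."""
    entry = table.get(compress_key(key, width_bits))
    if entry is not None and entry[0] == key:
        return entry[1]
    return None


table = build_table([b"a-rather-long-keyword", b"another-long-keyword"])
print(lookup(table, b"another-long-keyword"))
```

The shorter the compressed key, the narrower the table entries, at the price of more collisions; comparing the stored full key on a hit is one way to keep the result exact.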
As a further improvement of the present technical solution, in S6, the keyword data is sent to the service control module through the table lookup sub-module.
Compared with the prior art, the invention has the following beneficial effects:
In the efficient pattern matching method for real-time mass data, the chip-level pattern matching engine overcomes the small capacity of the FPGA, the high power consumption and price of the TCAM, and the low matching precision of the Bloom filter. It combines the advantages of the FPGA and the TCAM, improves data matching performance from the K level to the G level, reduces power consumption by 30 percent, and reduces manufacturing cost by 20 percent. In addition, the chip-level pattern matching engine is modularized in hardware: multiple groups of data processing modules can be deployed on the physical data processing module according to the rate requirements and the number of interfaces of the actual resource acquisition, which further increases the concurrent processing capacity and speed of the acquisition equipment and provides support for software-defined matching.
Drawings
FIG. 1 is an overall flow diagram of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment 1: an efficient pattern matching method for real-time mass data, comprising:
1. The data is transmitted to the long and short packet filtering module through the interface adaptation module. Specifically, overlong and overly short data packets are sent to an upper layer, where the upper-layer configuration determines whether they are discarded or forwarded, and the other data packets are sent to the analysis submodule for further analysis;
2. The analysis submodule first determines the encapsulation type of the incoming data, namely whether it is PPP, a directly encapsulated IP packet, a non-IP packet, or a data packet with 1, 2, 3 or 4 layers of MPLS encapsulation, and then parses the data packet and extracts the content;
3. Keywords are extracted from the extracted data packet through the keyword extraction submodule. Specifically, the data field to be searched is extracted from the data packet, the incoming data is distinguished as TCP or UDP type, and keywords are then extracted for each type accordingly;
4. The keyword data passes through the BF engine filtering submodule, which is a two-stage multi-pattern matching architecture consisting of a Bloom filter and a TCAM. It combines the respective advantages of the TCAM and the Bloom filter, achieving high-speed, accurate multi-pattern matching while keeping the power consumption of the whole system low. The first stage, filtering matching, is implemented by the Bloom filter, which keeps matching fast and power consumption low. Only after the first stage matches is the second stage, exact matching, carried out, and it is implemented by the TCAM. Because the first stage screens the keywords beforehand, this design greatly reduces the number of second-stage matches executed and therefore the power consumption; at the same time, the second-stage exact matching eliminates the false positives of the Bloom filter and guarantees matching precision. To address the high cost of large-capacity TCAMs, the engine adopts a mechanism that dynamically loads matching patterns, so that matching is performed dynamically according to service requirements, which reduces dependence on expensive TCAMs and greatly lowers the manufacturing cost of the whole system;
5. The keyword data is transmitted to the keyword compression submodule to reduce the keyword length. This module uses a HASH-TCAM algorithm to reduce the length of the keyword to be matched through hash compression, thereby increasing matching speed;
6. The extracted keyword data is sent to the service control module through the table lookup submodule, so that the FPGA sends the keywords exactly according to the operating specification of the TCAM chip and reads the hit address returned on the result bus, which ensures fast and accurate matching.
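The order of the six steps above can be summarized in a compact software sketch. It is an illustration under several assumptions and not the hardware engine: packets are represented by a hypothetical `Packet` tuple that already carries the results of parsing, the length thresholds are assumed, non-IP and non-TCP/UDP packets are simply skipped (the document does not say what happens to them), a one-hash bitmap stands in for the Bloom filter, and a dictionary keyed by CRC-32 stands in for the TCAM.

```python
import zlib
from typing import NamedTuple


class Packet(NamedTuple):
    length: int         # frame length in bytes
    encapsulation: str  # "PPP", "IP", "MPLS1".."MPLS4" or "non-IP"
    transport: str      # "TCP", "UDP" or anything else
    field: bytes        # data field to be searched


def run_pipeline(packets, patterns, min_len=64, max_len=1518):
    """Return (hit addresses sent to service control, packets sent to the upper layer)."""
    tcam = {zlib.crc32(p): addr for addr, p in enumerate(patterns)}
    bloom_bits = 0
    for p in patterns:
        bloom_bits |= 1 << (zlib.adler32(p) % 256)

    hits, to_upper_layer = [], []
    for pkt in packets:
        # Step 1: long and short packet filtering.
        if not min_len <= pkt.length <= max_len:
            to_upper_layer.append(pkt)
            continue
        # Step 2: encapsulation check (assumed: non-IP traffic is not matched).
        if pkt.encapsulation == "non-IP":
            continue
        # Step 3: keyword extraction for TCP and UDP only.
        if pkt.transport not in ("TCP", "UDP"):
            continue
        key = pkt.field
        # Step 4: first-stage pre-filter.
        if not (bloom_bits >> (zlib.adler32(key) % 256)) & 1:
            continue
        # Step 5: hash compression of the keyword.
        short_key = zlib.crc32(key)
        # Step 6: table lookup; the hit address goes to service control.
        addr = tcam.get(short_key)
        if addr is not None:
            hits.append(addr)
    return hits, to_upper_layer
```

Because the lookup in this sketch uses only the compressed key, two different keywords with the same CRC-32 would be indistinguishable; a real design would need to handle such collisions.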
The foregoing shows and describes the general principles, essential features, and advantages of the invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above; the embodiments and the description merely illustrate preferred embodiments of the invention and are not intended to limit it. The scope of the invention is defined by the appended claims and their equivalents.

Claims (7)

1. An efficient pattern matching method for real-time mass data, characterized by comprising the following steps:
S1, transmitting the data to a long and short packet filtering module through an interface adaptation module, filtering out the overlong and overly short data packets, and transmitting the remaining data packets to an analysis submodule;
S2, determining the encapsulation type of the data through the analysis submodule, parsing the data according to its encapsulation type, and extracting the content;
S3, extracting keywords from the extracted data packet through a keyword extraction submodule;
S4, filtering the keyword data through a BF engine filtering submodule;
S5, transmitting the keyword data to a keyword compression submodule to reduce the length of the keywords;
S6, sending out the extracted keyword data through a table lookup submodule.
2. The efficient pattern matching method for real-time mass data according to claim 1, characterized in that: in S1, the long and short packet filtering module specifically performs the following steps: sending the overlong and overly short data packets to an upper layer, where the upper-layer configuration determines whether they are discarded or forwarded, and sending the other data packets to the analysis submodule for further analysis.
3. The efficient pattern matching method for real-time mass data according to claim 1, characterized in that: in S2, the analysis submodule first determines the encapsulation type of the incoming data, namely whether it is PPP, a directly encapsulated IP packet, a non-IP packet, or a data packet with 1, 2, 3 or 4 layers of MPLS encapsulation, and then parses the incoming data and extracts the content.
4. The efficient pattern matching method for real-time mass data according to claim 1, characterized in that: in S3, the keyword extraction submodule specifically extracts the data field to be searched from the data packet, distinguishes whether the incoming data is of TCP or UDP type, and extracts keywords for each type accordingly.
5. The efficient pattern matching method for real-time mass data according to claim 1, characterized in that: in S4, the BF engine filtering submodule is a two-stage multi-pattern matching architecture consisting of a Bloom filter and a TCAM; the first stage, filtering matching, is implemented by the Bloom filter, and after the first stage matches, the second stage, exact matching, is carried out and is implemented by the TCAM.
6. The efficient pattern matching method for real-time mass data according to claim 1, characterized in that: in S5, the keyword compression submodule uses a HASH-TCAM algorithm to reduce the length of the keyword to be matched through hash compression.
7. The efficient pattern matching method for real-time mass data according to claim 1, characterized in that: in S6, the keyword data is sent to the service control module through the table lookup submodule.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210496478.2A CN114598616A (en) 2022-05-09 2022-05-09 Efficient mode matching method for solving real-time mass data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210496478.2A CN114598616A (en) 2022-05-09 2022-05-09 Efficient mode matching method for solving real-time mass data

Publications (1)

Publication Number Publication Date
CN114598616A true CN114598616A (en) 2022-06-07

Family

ID=81811565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210496478.2A Pending CN114598616A (en) 2022-05-09 2022-05-09 Efficient mode matching method for solving real-time mass data

Country Status (1)

Country Link
CN (1) CN114598616A (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070115986A1 (en) * 2005-11-01 2007-05-24 Udaya Shankara Method to perform exact string match in the data plane of a network processor
CN101359325A (en) * 2007-08-01 2009-02-04 北京启明星辰信息技术有限公司 Multi-key-word matching method for rapidly analyzing content
CN101321163A (en) * 2008-07-03 2008-12-10 江苏华丽网络工程有限公司 Integrated hardware implementing method for multi-layer amalgamation and parallel processing network access equipment
CN101478447A (en) * 2009-01-08 2009-07-08 中国人民解放军信息工程大学 Method and apparatus for deep packet detection
CN101848222A (en) * 2010-05-28 2010-09-29 武汉烽火网络有限责任公司 Inspection method and device of Internet deep packet
US20120275466A1 (en) * 2010-10-21 2012-11-01 Texas Instruments Incorporated System and method for classifying packets
US9424366B1 (en) * 2013-02-11 2016-08-23 Marvell International Ltd. Reducing power consumption in ternary content addressable memory (TCAM)
CN104866502A (en) * 2014-02-25 2015-08-26 深圳市中兴微电子技术有限公司 Data matching method and device
US20170046395A1 (en) * 2014-04-30 2017-02-16 Hewlett Packard Enterprise Development Lp Partitionable ternary content addressable memory (tcam) for use with a bloom filter
CN105515997A (en) * 2015-12-07 2016-04-20 刘航天 BF_TCAM (Bloom Filter-Ternary Content Addressable Memory)-based high-efficiency range matching method for realizing zero range expansion
CN105553850A (en) * 2015-12-10 2016-05-04 北京浩瀚深度信息技术股份有限公司 URL blocking method based on FPGA and TCAM
US20180063084A1 (en) * 2016-09-01 2018-03-01 Hewlett Packard Enterprise Development Lp Filtering of packets for packet types at network devices
CN111241138A (en) * 2020-01-14 2020-06-05 北京恒光信息技术股份有限公司 Data matching method and device
CN114297368A (en) * 2021-12-08 2022-04-08 无锡宏创盛安科技有限公司 Efficient keyword filtering method realized in FPGA (field programmable Gate array) way

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
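赵睿 et al.: "Research on pattern matching algorithms in deep packet inspection", 《现代电子技术》 (Modern Electronics Technique) *
金冬成: "Protocol analysis in a P2P detection and control system", 《中国新通信》 (China New Telecommunications) *
陈正虎 et al.: "A TCAM service identification algorithm based on Bloom-filter table entry compression", 《电子与信息学报》 (Journal of Electronics & Information Technology) *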

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220607