CN114598616A - Efficient mode matching method for solving real-time mass data - Google Patents
Efficient mode matching method for solving real-time mass data
- Publication number
- CN114598616A (application CN202210496478.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- keyword
- submodule
- real
- matching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/02—Capturing of monitoring data
- H04L43/028—Capturing of monitoring data by filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/02—Capturing of monitoring data
- H04L43/026—Capturing of monitoring data using flow identification
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/74—Address processing for routing
- H04L45/745—Address table lookup; Address filtering
- H04L45/7453—Address table lookup; Address filtering using hashing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention relates to the technical field of data processing, and in particular to an efficient pattern matching method for solving real-time mass data. The method comprises the following steps: data is transmitted to a long and short packet filtering module through an interface adaptation module; an analysis submodule determines the encapsulation type of the data; a keyword extraction submodule extracts keywords; a BF engine filters the keyword data; a keyword compression submodule reduces the length of the keywords; and a table look-up submodule sends out the extracted keyword data. The designed chip-level pattern matching engine overcomes the small capacity of the FPGA, the high power consumption and price of the TCAM (ternary content addressable memory), and the low matching precision of Bloom filters. It combines the advantages of the FPGA and the TCAM, improves data matching performance from the K level to the G level, reduces power consumption by 30 percent, and reduces manufacturing cost by 20 percent.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to an efficient mode matching method for solving real-time mass data.
Background
Multi-pattern matching of multi-source heterogeneous real-time data is the first, and a crucial, step of transmission resource acquisition and analysis. Backbone network transmission rates currently reach 40 Gbps, and networks of 100 Gbps and even Tbps rates may appear as technology develops. How to acquire and analyze transmission resources accurately and in real time at such rates, without affecting existing transmission services in the network, is a difficult problem in the field of information transmission networks. To achieve fast matching, the AC (Aho-Corasick) multi-pattern matching algorithm was proposed, mainly for network intrusion detection systems. Because each transition of the automaton processes only one character, transitions are very frequent, and external memory must be accessed continually to read the corresponding transition information; the resulting access latency severely limits matching speed. To improve the efficiency of the AC algorithm, an AC-based K-step state machine algorithm was proposed. Software implementations are currently simple and common, but their real-time data processing speed reaches only 100 KBps. With the arrival of the multi-source heterogeneous data era, data volumes and data flows keep growing, and software-implemented matching can no longer keep up. Hardware pattern matching engines are now widely studied, chiefly FPGA-based multi-pattern matching. However, FPGA resources are limited, so it is difficult to store many patterns or to cope with complex scenarios; additional ROM or SDRAM is inevitably required, which reduces the speed and real-time performance of mass data acquisition and analysis.
To address this problem, multi-pattern matching based on an FPGA and a TCAM has been proposed. The TCAM provides pattern storage and parallel access, which improves matching speed, but its power consumption is excessive and its price and cost rise with capacity, which is at odds with the low-carbon economy currently advocated. To reduce TCAM power consumption, a Bloom filter + FPGA scheme is commonly adopted, but Bloom filters produce false positives, which reduces matching precision.
At present, transmission resource acquisition and analysis relies mainly on software-implemented multi-pattern matching. To keep pace with the trend toward multi-source heterogeneous transmission resource acquisition and analysis, the invention takes the characteristics of the various hardware chips into account and designs a two-stage multi-pattern matching structure, BF-TCAM (Bloom Filter-Ternary Content Addressable Memory), which combines the respective advantages of the TCAM and the Bloom filter, achieves high-speed, accurate multi-pattern matching, and keeps the power consumption of the whole system at a low level.
Disclosure of Invention
The invention aims to provide an efficient mode matching method for solving real-time mass data so as to solve the problems in the background technology.
In order to achieve the above object, the present invention provides an efficient pattern matching method for solving real-time mass data, comprising the following steps:
S1, transmitting the data to a long and short packet filtering module through an interface adaptation module, filtering out the overlong and undersized data packets, and transmitting the remaining data packets to an analysis submodule;
S2, determining the encapsulation type of the data through the analysis submodule, parsing the data according to that type, and extracting the content;
S3, extracting keywords from the extracted data packet content through a keyword extraction submodule;
S4, filtering the keyword data through a BF engine filtering submodule;
S5, transmitting the keyword data to a keyword compression submodule to reduce the length of the keywords;
S6, sending out the extracted keyword data through a table look-up submodule, which ensures that the FPGA issues the keywords according to the operation specification of the TCAM chip and reads the hit address returned on the result bus, so that matching is fast and accurate.
As a further improvement of the technical solution, in S1, the long and short packet filtering module specifically performs the following steps: overlong and undersized data packets are sent to an upper layer, whose configuration determines whether they are discarded or forwarded; all other data packets are sent to the analysis submodule for further analysis.
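The length filter of S1 can be illustrated with a minimal Python sketch. The thresholds `MIN_LEN` and `MAX_LEN` and the `upper_layer` callback are hypothetical; the source does not state concrete length limits or an upper-layer interface.

```python
from typing import Callable, Optional

# Hypothetical thresholds (assumption): the source gives no concrete limits.
MIN_LEN = 64
MAX_LEN = 1518


def filter_packet(packet: bytes,
                  upper_layer: Callable[[bytes], None]) -> Optional[bytes]:
    """Pass in-range packets on; divert overlong/undersized ones upward.

    Returns the packet when it should go to the analysis submodule,
    or None when it was handed to the upper layer, whose configuration
    decides whether to discard or forward it.
    """
    if len(packet) < MIN_LEN or len(packet) > MAX_LEN:
        upper_layer(packet)
        return None
    return packet
```

Keeping the discard-or-forward decision in the callback mirrors the description, where the filter itself only separates the traffic and the upper layer holds the policy.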
As a further improvement of the technical solution, in S2, the analysis submodule first determines the encapsulation type of the incoming data, that is, whether it is PPP, a directly encapsulated IP packet, a non-IP packet, or a data packet with 1, 2, 3, or 4 layers of MPLS encapsulation, and then parses the data packet and extracts the content.
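One way to picture the encapsulation check of S2 is the following Python sketch. It assumes Ethernet framing, with PPP carried as a PPPoE session, which the source does not specify; the function name `classify` and the returned labels are hypothetical. The EtherType values and the position of the MPLS bottom-of-stack bit are the standard ones.

```python
import struct
from typing import Tuple

ETH_HDR = 14                # destination MAC + source MAC + EtherType
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_MPLS = 0x8847     # MPLS unicast
ETHERTYPE_PPPOE = 0x8864    # PPPoE session stage (assumed PPP carrier)
MPLS_BOTTOM_OF_STACK = 0x100  # S bit of a 32-bit label stack entry


def classify(frame: bytes) -> Tuple[str, int]:
    """Return (encapsulation label, offset of the encapsulated content)."""
    if len(frame) < ETH_HDR:
        return ("non-ip", 0)
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETHERTYPE_IPV4:
        return ("ip", ETH_HDR)
    if ethertype == ETHERTYPE_PPPOE:
        # Skip the 6-byte PPPoE header; offset points at the PPP protocol field.
        return ("ppp", ETH_HDR + 6)
    if ethertype == ETHERTYPE_MPLS:
        offset = ETH_HDR
        for depth in range(1, 5):          # the method handles 1 to 4 layers
            if offset + 4 > len(frame):
                break
            (entry,) = struct.unpack_from("!I", frame, offset)
            offset += 4
            if entry & MPLS_BOTTOM_OF_STACK:
                return (f"mpls{depth}", offset)
        return ("unsupported", offset)     # deeper stack or truncated frame
    return ("non-ip", ETH_HDR)
```

The returned offset lets the next stage start parsing without re-walking the headers.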
As a further improvement of the technical solution, in S3, the keyword extraction submodule specifically performs the following steps: extracting the data field to be searched from the data packet, determining whether the incoming data is of TCP type or UDP type, and extracting the keywords accordingly.
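As an illustration of S3, the sketch below extracts a keyword from an IPv4 packet after distinguishing TCP from UDP. The choice of the 5-tuple (addresses, ports, protocol) as the "data field to be searched" is an assumption; the source does not say which fields form the keyword. `extract_key` is a hypothetical name.

```python
from typing import Optional, Tuple

PROTO_TCP = 6
PROTO_UDP = 17


def extract_key(ip_packet: bytes) -> Optional[Tuple[str, bytes]]:
    """Return ("tcp" | "udp", 13-byte key) or None for other traffic.

    Key layout (assumed): src IP (4) + dst IP (4) + src port (2)
    + dst port (2) + protocol (1).
    """
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        return None                          # not IPv4
    ihl = (ip_packet[0] & 0x0F) * 4          # IPv4 header length in bytes
    proto = ip_packet[9]
    if ihl < 20 or proto not in (PROTO_TCP, PROTO_UDP):
        return None
    if len(ip_packet) < ihl + 4:
        return None                          # ports are truncated
    addresses = ip_packet[12:20]             # source and destination address
    ports = ip_packet[ihl:ihl + 4]           # same offsets for TCP and UDP
    kind = "tcp" if proto == PROTO_TCP else "udp"
    return kind, addresses + ports + bytes([proto])
```

Because TCP and UDP both place the source and destination ports in the first four bytes of their headers, one slice serves both branches; a real design might extract further protocol-specific fields.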
As a further improvement of the technical solution, in S4, the BF engine filtering submodule is a two-stage multi-pattern matching architecture consisting of a Bloom filter and a TCAM. It combines the respective advantages of the two, achieving high-speed, accurate multi-pattern matching while keeping the power consumption of the whole system at a low level. The first stage, filtering matching, is implemented by the Bloom filter, which matches quickly at low power. Only after a first-stage match succeeds is the second stage, exact matching, carried out by the TCAM. Because the first stage greatly reduces the number of second-stage matches, power consumption is reduced; and because the second stage matches exactly, Bloom filter false positives are eliminated and matching precision is guaranteed. To address the high cost of large-capacity TCAMs, the engine adopts a mechanism that loads matching patterns dynamically according to service requirements, which reduces dependence on expensive TCAMs and greatly reduces the manufacturing cost of the whole system.
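The two-stage idea can be modelled in software as follows. This is only a sketch under stated assumptions: the Bloom filter parameters, the SHA-256-derived hash functions, and the dictionary standing in for the TCAM are all illustrative choices, not details from the source.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Optional


class BloomFilter:
    """Stage 1: cheap membership test that may give false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # Python int used as a bit array

    def _positions(self, key: bytes) -> Iterator[int]:
        # Derive independent hashes by prefixing a one-byte index (assumption).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits >> pos & 1 for pos in self._positions(key))


class TwoStageMatcher:
    """Bloom filter first; exact table (TCAM stand-in) only on a hit."""

    def __init__(self, patterns: Iterable[bytes]) -> None:
        self.bloom = BloomFilter()
        self.table: Dict[bytes, int] = {}   # pattern -> hit address
        self.second_stage_lookups = 0
        for address, pattern in enumerate(patterns):
            self.bloom.add(pattern)
            self.table[pattern] = address

    def match(self, key: bytes) -> Optional[int]:
        if not self.bloom.might_contain(key):
            return None                      # rejected without a TCAM access
        self.second_stage_lookups += 1
        return self.table.get(key)           # exact check removes false positives
```

The `second_stage_lookups` counter makes the power argument visible: keys rejected by the Bloom filter never reach the expensive second stage, while any false positive is still resolved correctly by the exact lookup.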
As a further improvement of the technical solution, in S5, the keyword compression submodule uses a HASH-TCAM algorithm to shorten the keywords to be matched through hash compression, thereby increasing the matching speed.
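The source does not define the HASH-TCAM algorithm, so the following Python sketch only illustrates the general idea of hash compression: mapping a long keyword to a fixed, shorter width before lookup. CRC-32 is used here purely as a stand-in hash, and `compress_key` and `width_bits` are hypothetical names.

```python
import zlib


def compress_key(key: bytes, width_bits: int = 32) -> int:
    """Compress a keyword of any length to `width_bits` bits (1 to 32).

    CRC-32 is a stand-in (assumption); the actual hash used by the
    HASH-TCAM algorithm is not given in the source.
    """
    if not 1 <= width_bits <= 32:
        raise ValueError("width_bits must be between 1 and 32")
    return zlib.crc32(key) & ((1 << width_bits) - 1)
```

A shorter key means narrower table entries and faster comparison, at the price of possible collisions between distinct keywords; a practical design would need some way to resolve them, which the source does not describe.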
As a further improvement of the present technical solution, in S6, the keyword data is sent to the service control module through the table lookup sub-module.
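The table look-up of S6, in which a keyword is issued to the TCAM and a hit address is read back from the result bus, can be modelled with a small software TCAM. This is a behavioural sketch only; the class `SoftTcam`, its methods, and the lowest-address-wins priority rule are assumptions for illustration, not the operation specification of any particular TCAM chip.

```python
from typing import List, Optional, Tuple


class SoftTcam:
    """Software model of a ternary CAM with lowest-address priority."""

    def __init__(self, width_bits: int) -> None:
        self.width_mask = (1 << width_bits) - 1
        self.entries: List[Tuple[int, int]] = []   # (value, care mask)

    def write(self, value: int, mask: int) -> int:
        """Store an entry; mask bit 1 = compare, 0 = don't care.

        Returns the address at which the entry was stored.
        """
        mask &= self.width_mask
        self.entries.append((value & mask, mask))
        return len(self.entries) - 1

    def lookup(self, key: int) -> Optional[int]:
        """Return the hit address of the first matching entry, or None."""
        for address, (value, mask) in enumerate(self.entries):
            if key & mask == value:
                return address
        return None
```

A hardware TCAM compares the key against all entries in parallel; the loop here reproduces only the result, not the timing. The returned address is what would be passed on to the service control module.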
Compared with the prior art, the invention has the following beneficial effects:
In the efficient pattern matching method for solving real-time mass data, the designed chip-level pattern matching engine overcomes the small capacity of the FPGA, the high power consumption and price of the TCAM, and the low matching precision of the Bloom filter. It combines the advantages of the FPGA and the TCAM, improves data matching performance from the K level to the G level, reduces power consumption by 30 percent, and reduces manufacturing cost by 20 percent. In addition, the chip-level pattern matching engine is modularized in hardware: multiple groups of data processing modules can be deployed on the physical data processing module according to the rate requirements and number of interfaces of the actual resource acquisition, which further increases the concurrent processing capacity and speed of the acquisition equipment and provides support for software-defined matching.
Drawings
FIG. 1 is an overall flow diagram of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment 1: an efficient pattern matching method for solving real-time mass data, comprising:
1. the data is transmitted to the long and short packet filtering module through the interface adaptation module, with the following specific steps: overlong and undersized data packets are sent to an upper layer, whose configuration determines whether they are discarded or forwarded, and all other data packets are sent to the analysis submodule for further analysis;
2. the analysis submodule first determines the encapsulation type of the incoming data, that is, whether it is PPP, a directly encapsulated IP packet, a non-IP packet, or a data packet with 1, 2, 3, or 4 layers of MPLS encapsulation, and then parses the data packet and extracts the content;
3. keywords are extracted from the extracted data packet content through the keyword extraction submodule, with the following specific steps: the data field to be searched is extracted from the data packet, the incoming data is identified as TCP type or UDP type, and the keywords are extracted accordingly;
4. the keyword data passes through the BF engine filtering submodule, a two-stage multi-pattern matching architecture consisting of a Bloom filter and a TCAM, which combines the respective advantages of the two to achieve high-speed, accurate multi-pattern matching while keeping the power consumption of the whole system at a low level; the first stage, filtering matching, is implemented by the Bloom filter, which matches quickly at low power; only after a first-stage match succeeds is the second stage, exact matching, carried out by the TCAM; because the first stage greatly reduces the number of second-stage matches, power consumption is reduced, and because the second stage matches exactly, Bloom filter false positives are eliminated and matching precision is guaranteed. To address the high cost of large-capacity TCAMs, the engine adopts a mechanism that loads matching patterns dynamically according to service requirements, which reduces dependence on expensive TCAMs and greatly reduces the manufacturing cost of the whole system;
5. the keyword data is transmitted to the keyword compression submodule, which uses a HASH-TCAM algorithm to shorten the keywords to be matched through hash compression, thereby increasing the matching speed;
6. the extracted keyword data is sent to the service control module through the table look-up submodule, which ensures that the FPGA issues the keywords according to the operation specification of the TCAM chip and reads the hit address returned on the result bus, so that matching is fast and accurate.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and the preferred embodiments of the present invention are described in the above embodiments and the description, and are not intended to limit the present invention. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (7)
1. An efficient pattern matching method for solving real-time mass data, characterized by comprising the following steps:
S1, transmitting the data to a long and short packet filtering module through an interface adaptation module, filtering out the overlong and undersized data packets, and transmitting the remaining data packets to an analysis submodule;
S2, determining the encapsulation type of the data through the analysis submodule, parsing the data according to that type, and extracting the content;
S3, extracting keywords from the extracted data packet content through a keyword extraction submodule;
S4, filtering the keyword data through a BF engine filtering submodule;
S5, transmitting the keyword data to a keyword compression submodule to reduce the length of the keywords;
S6, sending out the extracted keyword data through a table look-up submodule.
2. The efficient pattern matching method for solving real-time mass data according to claim 1, characterized in that: in S1, the long and short packet filtering module specifically performs the following steps: overlong and undersized data packets are sent to an upper layer, whose configuration determines whether they are discarded or forwarded, and all other data packets are sent to the analysis submodule for further analysis.
3. The efficient pattern matching method for solving real-time mass data according to claim 1, characterized in that: in S2, the analysis submodule first determines the encapsulation type of the incoming data, that is, whether it is PPP, a directly encapsulated IP packet, a non-IP packet, or a data packet with 1, 2, 3, or 4 layers of MPLS encapsulation, and then parses the data packet and extracts the content.
4. The efficient pattern matching method for solving real-time mass data according to claim 1, characterized in that: in S3, the keyword extraction submodule specifically performs the following steps: extracting the data field to be searched from the data packet, determining whether the incoming data is of TCP type or UDP type, and extracting the keywords accordingly.
5. The efficient pattern matching method for solving real-time mass data according to claim 1, characterized in that: in S4, the BF engine filtering submodule is a two-stage multi-pattern matching architecture consisting of a Bloom filter and a TCAM; the first stage, filtering matching, is implemented by the Bloom filter, and after the first-stage filtering matching succeeds, the second stage, exact matching, is carried out by the TCAM.
6. The efficient pattern matching method for solving real-time mass data according to claim 1, characterized in that: in S5, the keyword compression submodule uses a HASH-TCAM algorithm to shorten the keywords to be matched through hash compression.
7. The efficient pattern matching method for solving real-time mass data according to claim 1, characterized in that: in S6, the keyword data is sent to the service control module through the table look-up submodule.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210496478.2A CN114598616A (en) | 2022-05-09 | 2022-05-09 | Efficient mode matching method for solving real-time mass data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210496478.2A CN114598616A (en) | 2022-05-09 | 2022-05-09 | Efficient mode matching method for solving real-time mass data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114598616A (en) | 2022-06-07 |
Family
ID=81811565
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210496478.2A Pending CN114598616A (en) | 2022-05-09 | 2022-05-09 | Efficient mode matching method for solving real-time mass data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114598616A (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070115986A1 (en) * | 2005-11-01 | 2007-05-24 | Udaya Shankara | Method to perform exact string match in the data plane of a network processor |
CN101321163A (en) * | 2008-07-03 | 2008-12-10 | 江苏华丽网络工程有限公司 | Integrated hardware implementing method for multi-layer amalgamation and parallel processing network access equipment |
CN101359325A (en) * | 2007-08-01 | 2009-02-04 | 北京启明星辰信息技术有限公司 | Multi-key-word matching method for rapidly analyzing content |
CN101478447A (en) * | 2009-01-08 | 2009-07-08 | 中国人民解放军信息工程大学 | Method and apparatus for deep packet detection |
CN101848222A (en) * | 2010-05-28 | 2010-09-29 | 武汉烽火网络有限责任公司 | Inspection method and device of Internet deep packet |
US20120275466A1 (en) * | 2010-10-21 | 2012-11-01 | Texas Instruments Incorporated | System and method for classifying packets |
CN104866502A (en) * | 2014-02-25 | 2015-08-26 | 深圳市中兴微电子技术有限公司 | Data matching method and device |
CN105515997A (en) * | 2015-12-07 | 2016-04-20 | 刘航天 | BF_TCAM (Bloom Filter-Ternary Content Addressable Memory)-based high-efficiency range matching method for realizing zero range expansion |
CN105553850A (en) * | 2015-12-10 | 2016-05-04 | 北京浩瀚深度信息技术股份有限公司 | URL blocking method based on FPGA and TCAM |
US9424366B1 (en) * | 2013-02-11 | 2016-08-23 | Marvell International Ltd. | Reducing power consumption in ternary content addressable memory (TCAM) |
US20170046395A1 (en) * | 2014-04-30 | 2017-02-16 | Hewlett Packard Enterprise Development Lp | Partitionable ternary content addressable memory (tcam) for use with a bloom filter |
US20180063084A1 (en) * | 2016-09-01 | 2018-03-01 | Hewlett Packard Enterprise Development Lp | Filtering of packets for packet types at network devices |
CN111241138A (en) * | 2020-01-14 | 2020-06-05 | 北京恒光信息技术股份有限公司 | Data matching method and device |
CN114297368A (en) * | 2021-12-08 | 2022-04-08 | 无锡宏创盛安科技有限公司 | Efficient keyword filtering method realized in FPGA (field programmable Gate array) way |
-
2022
- 2022-05-09 CN CN202210496478.2A patent/CN114598616A/en active Pending
Non-Patent Citations (3)
Title |
---|
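Zhao Rui et al.: "Research on pattern matching algorithms in deep packet inspection", 《现代电子技术》 * |
Jin Dongcheng: "Protocol analysis in a P2P detection and control system", 《中国新通信》 * |
Chen Zhenghu et al.: "A TCAM service identification algorithm based on Bloom-filter table entry compression", 《电子与信息学报》 * |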
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US8060546B2 (en) | Positionally dependent pattern checking in character strings using deterministic finite automata | |
KR20030053038A (en) | A method of improving the lookup performance of tree-type knowledge base searches | |
US20100131935A1 (en) | System and method for compiling and matching regular expressions | |
CN111600796B (en) | Flow identification device and method based on configurable analysis field | |
JP2003092598A (en) | Packet transferring processor | |
WO2015184706A1 (en) | Statistical counting device and implementation method therefor, and system having statistical counting device | |
CN105099957B (en) | A kind of data packet forwarding method based on software checking book | |
CN103281257A (en) | Method and device for processing protocol message | |
CN101060482B (en) | A route search method and forwarding system | |
CN113411380A (en) | Processing method, logic circuit and equipment based on FPGA (field programmable gate array) programmable session table | |
CN110324204B (en) | High-speed regular expression matching engine and method implemented in FPGA (field programmable Gate array) | |
CN101650718A (en) | Method and device for matching character strings | |
CN114422617A (en) | Message processing method, system and computer readable storage medium | |
CN114598616A (en) | Efficient mode matching method for solving real-time mass data | |
US6661792B1 (en) | Apparatus for processing data packet of ethernet switch system and method thereof | |
CN117640510B (en) | Efficient forwarding method and device for space terahertz network packet | |
SE531947C2 (en) | Procedure, device and system for multi-field classification in a data communication network | |
CN101599910A (en) | The method and apparatus that message sends | |
CN112187935B (en) | Information identification method and read-only memory | |
US20240056393A1 (en) | Packet forwarding method and device, and computer readable storage medium | |
CN114827030A (en) | Flow classification device based on folded SRAM and table entry compression method | |
CN114610958B (en) | Processing method and device of transmission resources and electronic equipment | |
CN112214429A (en) | Data transmission device and method based on SRIO | |
CN116015696A (en) | Firewall system, malicious software detection method and device | |
CN107426180B (en) | Detection apparatus for data frame coverage of ethernet |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20220607 |