CN105786981A - Hash-table-based host and URL keyword strategy matching method - Google Patents
Hash-table-based host and URL keyword strategy matching method
- Publication number
- CN105786981A
- Authority
- CN
- China
- Prior art keywords
- url
- key
- designated
- bytes
- search
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9566—URL specific, e.g. using aliases, detecting broken or misspelled links
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9014—Indexing; Data structures therefor; Storage structures hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1466—Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Computer And Data Communications (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a hash-table-based host and URL keyword strategy matching method. The method manages seven fields: source IP, destination IP, source port, destination port, protocol number, host and URL. These fields are handled in two groups: the source IP, destination IP, source port, destination port, protocol number and host form one unit, and the URL forms another. To make very long URLs easier to store and look up, the URL keyword is further divided into segments, and each stored segment is linked to the part stored before it, so that a single, unified exact lookup and match is achieved. The method reduces memory requirements, extracts very long URLs segment by segment to obtain segmented keys that fit system constraints, and takes the source IP, destination IP, source port, destination port and protocol number into account, so that fine-grained control can be applied to a specific user.
Description
Technical field
The present invention relates to the field of Internet security technology, and in particular to a hash-table-based host and URL keyword strategy matching method.
Background technology
The Internet is changing rapidly and people's daily lives have become inseparable from the network, but the network also harbors many risks and hidden dangers. Many phishing websites use forged URLs to commit fraud, so network information security is especially important, and security systems place ever-higher demands on the performance and practicality of URL and HOST keyword lookups. An analysis of the HOST and URL keyword libraries used for matching in information security systems shows that HOST lengths range from 4 to 48 bytes, with 90% of them within 20 bytes, while URL keyword lengths range from 4 to 256 bytes, with lengths of about 52 bytes accounting for roughly 80%. Because these lengths differ so much, and very long URLs must also be matched, storing and matching all entries in a single uniform format would impose a very large memory cost and in turn affect the security and reliability of the whole system.
At present, exact-match lookup of very long URLs (longer than 40 bytes and up to 200 bytes) that simply stores and matches them in a single hash table has the following defects. First, it wastes memory and lengthens collision chains, so insertion, deletion and lookup cannot meet performance requirements. Second, very long URLs produce very long keys, which many systems do not support, so such URLs cannot be matched at all. Third, when several URLs share the same leading portion, that portion is stored repeatedly. Fourth, this kind of HOST and URL keyword matching does not take the source IP, destination IP, source port, destination port and protocol number into account, so it cannot provide fine-grained control for a specific user.
Summary of the invention
To overcome these defects of the prior art, the present invention provides a hash-table-based host and URL keyword strategy matching method. The method reduces memory requirements, extracts very long URLs segment by segment to obtain segmented keys that fit system constraints, and takes the source IP, destination IP, source port, destination port and protocol number into account, so that fine-grained control can be applied to a specific user.
To achieve the above objective, the present invention adopts the following technical solution:
In the hash-table-based host and URL keyword strategy matching method of the present invention, the URL is extracted to form keys (Key), and the table entry corresponding to each Key is looked up in turn to obtain the matching result. The keys (Key) are extracted in the following steps:
Step S101: the source IP, destination IP, source port, destination port, protocol number and host form Key_1, which is stored in table 1. Key_1 is 56 bytes long.
Step S102: the matching action is stored in Result_1, the result produced by table 1. The matching action is one of continue search, forward, or drop. Continue search means that Key_1 must be associated with a subsequent URL: an index_1 is generated, the continue-search flag is set to 1 (it is 0 for forward and drop), and the method goes to step S103. Forward or drop means that Key_1 is not associated with a subsequent URL, so no memory space is needed to store Key_1; the forward flag is set to 1 when forwarding, and extraction ends.
Step S103: determine whether the URL length is less than or equal to 52 bytes; if so, go to step S104, otherwise go to step S105.
Step S104: if the URL is shorter than 52 bytes, pad it with zeros to 52 bytes. index_1 and the URL form Key_2, which is stored in table 2. Result_2, produced by table 2, has the same structure as Result_1. If the matching action is continue search, an index_2 is generated and the continue-search flag is set to 1 (0 for forward and drop); if the matching action is forward or drop, no memory space is needed to store Key_2, and the forward flag is set to 1 when forwarding. Extraction ends.
Step S105: index_1 and the first 52 bytes of the URL form Key_3, which is stored in table 3. Result_3, produced by table 3, has the same structure as Result_1. If the matching action is continue search, an index_3 is generated, the continue-search flag is set to 1 (0 for forward and drop), and the method goes to step S106. If the matching action is forward or drop, no memory space is needed to store Key_3, the forward flag is set to 1 when forwarding, and extraction ends.
Step S106: determine whether the URL length is less than or equal to 104 bytes; if so, go to step S107, otherwise go to step S108.
Step S107: if the URL is shorter than 104 bytes, pad it with zeros to 104 bytes. index_3 and the last 52 bytes of the padded URL (bytes 53 to 104) form Key_4, which is stored in table 4. Result_4, produced by table 4, has the same structure as Result_1. If the matching action is continue search, an index_4 is generated and the continue-search flag is set to 1 (0 for forward and drop); if the matching action is forward or drop, no memory space is needed to store Key_4, and the forward flag is set to 1 when forwarding. Extraction ends.
Step S108: index_3 and bytes 53 to 104 of the URL form Key_5, which is stored in table 5. Result_5, produced by table 5, has the same structure as Result_1. If the matching action is continue search, an index_5 is generated, the continue-search flag is set to 1 (0 for forward and drop), and the method goes to the next step. If the matching action is forward or drop, no memory space is needed to store Key_5, the forward flag is set to 1 when forwarding, and extraction ends.
The remaining URL bytes are processed in the same way, 52 bytes at a time; if the final extracted segment is shorter than 52 bytes, it is padded with zeros to 52 bytes.
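The URL segmentation running through steps S103 to S108 can be sketched in Python as follows. This is a minimal illustration, assuming the URL is handled as raw bytes; `split_url` and `table_for_segment` are hypothetical names, and numbering tables beyond table 5 by continuing the 2/3, 4/5 pattern is an extrapolation of "processed in the same way", not something the method states.

```python
from typing import List

SEGMENT_BYTES = 52  # segment width used by the method


def split_url(url: bytes) -> List[bytes]:
    """Split a URL into 52-byte segments, zero-padding the final one."""
    count = max(1, -(-len(url) // SEGMENT_BYTES))  # ceiling division, at least 1
    padded = url.ljust(count * SEGMENT_BYTES, b"\x00")
    return [padded[i * SEGMENT_BYTES:(i + 1) * SEGMENT_BYTES] for i in range(count)]


def table_for_segment(position: int, is_last: bool) -> int:
    """Table that stores segment `position` (0-based).

    The final segment goes to an even-numbered table (2, 4, ...), after which
    extraction ends; an earlier segment goes to an odd-numbered table (3, 5, ...)
    that continues the chain. Tables above 5 are extrapolated (assumption).
    """
    return 2 + 2 * position if is_last else 3 + 2 * position


# Example: a 60-byte URL yields two segments, stored in tables 3 and 4.
segments = split_url(b"/" + b"x" * 59)
tables = [table_for_segment(i, i == len(segments) - 1) for i in range(len(segments))]
```

A URL of at most 52 bytes always produces exactly one segment, which is why it lands in table 2 alone.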
In step S101, the 56 bytes of Key_1 are allocated as follows: the source IP occupies 4 bytes, the destination IP 4 bytes, the source port 2 bytes, the destination port 2 bytes, the protocol number 1 byte, and the host the remaining 43 bytes. If any of the source IP, destination IP, source port, destination port, protocol number or host is shorter than its allotted length, it is padded with zeros.
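The Key_1 layout can be illustrated with Python's `struct` module. This sketch rests on assumptions the method does not state: IPv4 addresses (implied by the 4-byte IP fields), network byte order for the port fields, and an ASCII host name. Since the method does not say how a host longer than 43 bytes is handled, the sketch simply rejects it; `build_key1` is a hypothetical name.

```python
import ipaddress
import struct

# Key_1 layout from the method: 4 + 4 + 2 + 2 + 1 + 43 = 56 bytes.
# "!" selects network byte order with no alignment padding (byte order assumed).
KEY1_FORMAT = "!4s4sHHB43s"
HOST_FIELD_BYTES = 43


def build_key1(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
               protocol: int, host: str) -> bytes:
    """Pack the six flow fields into a 56-byte Key_1 (hypothetical helper)."""
    host_bytes = host.encode("ascii")
    if len(host_bytes) > HOST_FIELD_BYTES:
        # The method does not cover hosts longer than 43 bytes; reject them here.
        raise ValueError("host does not fit in the 43-byte field")
    # struct pads a short "43s" field with zero bytes, as the method requires.
    return struct.pack(
        KEY1_FORMAT,
        ipaddress.IPv4Address(src_ip).packed,
        ipaddress.IPv4Address(dst_ip).packed,
        src_port,
        dst_port,
        protocol,
        host_bytes,
    )


key1 = build_key1("192.168.1.10", "203.0.113.5", 50000, 80, 6, "example.com")
```

Here the 11-byte host "example.com" is followed by 32 zero bytes to fill its 43-byte field.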
The beneficial effects of the present invention are:
First, Key_1 entries that are not associated with a subsequent URL are not stored, which reduces memory requirements;
Second, very long URLs are extracted segment by segment to obtain segmented keys, which fit system constraints;
Third, the source IP, destination IP, source port, destination port and protocol number are taken into account, so that fine-grained control can be applied to a specific user.
Detailed description of the invention
The present invention is described in further detail below with reference to a specific embodiment; this embodiment should not be construed as limiting the invention.
In the hash-table-based host and URL keyword strategy matching method provided by the invention, the URL is extracted to form keys (Key), and the table entry corresponding to each Key is looked up in turn to obtain the matching result. The keys (Key) are extracted in the following steps:
Step S101: the source IP, destination IP, source port, destination port, protocol number and host form Key_1, which is stored in table 1. Key_1 is 56 bytes long: the source IP occupies 4 bytes, the destination IP 4 bytes, the source port 2 bytes, the destination port 2 bytes, the protocol number 1 byte, and the host the remaining 43 bytes. If any of these fields is shorter than its allotted length, it is padded with zeros.
Step S102: the matching action is stored in Result_1, the result produced by table 1. The matching action is one of continue search, forward, or drop. Continue search means that Key_1 must be associated with a subsequent URL: an index_1 is generated, the continue-search flag is set to 1 (it is 0 for forward and drop), and the method goes to step S103. Forward or drop means that Key_1 is not associated with a subsequent URL, so no memory space is needed to store Key_1; the forward flag is set to 1 when forwarding, and extraction ends.
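The contents of a Result entry described in step S102 can be modeled as a small record. In this sketch the record holds the continue-search flag, the forward flag and, only for continue search, the generated index. The method does not state the forward flag's value for a continue-search result, so 0 is assumed; `MatchResult`, `make_result` and the counter-based index allocator are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count
from typing import Iterator, Optional


class Action(Enum):
    CONTINUE = "continue search"
    FORWARD = "forward"
    DROP = "drop"


@dataclass(frozen=True)
class MatchResult:
    continue_flag: int            # 1 = continue search, 0 = forward or drop
    forward_flag: int             # 1 = forward, 0 otherwise (0 for continue is assumed)
    index: Optional[int] = None   # index_n, allocated only for continue search


def make_result(action: Action, index_ids: Iterator[int]) -> MatchResult:
    """Build the Result_n record for a matched key (hypothetical helper)."""
    if action is Action.CONTINUE:
        # The key must be linked to the next URL segment, so allocate an index.
        return MatchResult(continue_flag=1, forward_flag=0, index=next(index_ids))
    # Forward or drop ends extraction; no index is consumed.
    return MatchResult(continue_flag=0,
                       forward_flag=1 if action is Action.FORWARD else 0)


index_ids = count(1)                              # hypothetical allocator
first = make_result(Action.CONTINUE, index_ids)   # gets index 1
dropped = make_result(Action.DROP, index_ids)     # consumes no index
second = make_result(Action.CONTINUE, index_ids)  # gets index 2
```

Allocating an index only on continue search mirrors the method's point that forward and drop entries need no link to a later table.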
Step S103: determine whether the URL length is less than or equal to 52 bytes; if so, go to step S104, otherwise go to step S105.
Step S104: if the URL is shorter than 52 bytes, pad it with zeros to 52 bytes. index_1 and the URL form Key_2, which is stored in table 2. Result_2, produced by table 2, has the same structure as Result_1. If the matching action is continue search, an index_2 is generated, the continue-search flag is set to 1 (0 for forward and drop), and extraction ends. If the matching action is forward or drop, no memory space is needed to store Key_2, the forward flag is set to 1 when forwarding, and extraction ends.
Step S105: index_1 and the first 52 bytes of the URL form Key_3, which is stored in table 3. Result_3, produced by table 3, has the same structure as Result_1. If the matching action is continue search, an index_3 is generated, the continue-search flag is set to 1 (0 for forward and drop), and the method goes to step S106. If the matching action is forward or drop, no memory space is needed to store Key_3, the forward flag is set to 1 when forwarding, and extraction ends.
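Steps S103 to S105 decide whether the first URL segment becomes Key_2 (table 2) or Key_3 (table 3). The sketch below assumes index_1 is encoded as a 4-byte big-endian integer placed in front of the segment; the method specifies neither the index width nor the byte order, and `first_url_key` is a hypothetical name. With this assumed width, each URL key is 56 bytes, the same size as Key_1.

```python
from typing import Tuple

SEGMENT_BYTES = 52
INDEX_BYTES = 4  # assumed width of index_1; the method does not specify it


def first_url_key(index_1: int, url: bytes) -> Tuple[int, bytes, bool]:
    """Return (table number, key, more segments follow) for the first segment."""
    prefix = index_1.to_bytes(INDEX_BYTES, "big")
    if len(url) <= SEGMENT_BYTES:
        # S104: the whole URL, zero-padded to 52 bytes, forms Key_2 in table 2.
        return 2, prefix + url.ljust(SEGMENT_BYTES, b"\x00"), False
    # S105: the first 52 bytes form Key_3 in table 3; step S106 follows.
    return 3, prefix + url[:SEGMENT_BYTES], True


short_key = first_url_key(7, b"/login")
long_key = first_url_key(7, b"/" + b"a" * 60)
```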
Step S106: determine whether the URL length is less than or equal to 104 bytes; if so, go to step S107, otherwise go to step S108.
Step S107: if the URL is shorter than 104 bytes, pad it with zeros to 104 bytes. index_3 and the last 52 bytes of the padded URL (bytes 53 to 104) form Key_4, which is stored in table 4. Result_4, produced by table 4, has the same structure as Result_1. If the matching action is continue search, an index_4 is generated, the continue-search flag is set to 1 (0 for forward and drop), and extraction ends. If the matching action is forward or drop, no memory space is needed to store Key_4, the forward flag is set to 1 when forwarding, and extraction ends.
Step S108: index_3 and bytes 53 to 104 of the URL form Key_5, which is stored in table 5. Result_5, produced by table 5, has the same structure as Result_1. If the matching action is continue search, an index_5 is generated, the continue-search flag is set to 1 (0 for forward and drop), and the method goes to the next step. If the matching action is forward or drop, no memory space is needed to store Key_5, the forward flag is set to 1 when forwarding, and extraction ends early.
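Steps S106 to S108 handle the second URL segment (bytes 53 to 104) in the same way, linked by index_3. This sketch reads the S106 test as "a URL of at most 104 bytes goes to S107", which matches S103 and the zero-padding in S107. It again assumes a 4-byte big-endian index, and `second_url_key` is a hypothetical name.

```python
from typing import Tuple

SEGMENT_BYTES = 52
INDEX_BYTES = 4  # assumed width of index_3; the method does not specify it


def second_url_key(index_3: int, url: bytes) -> Tuple[int, bytes, bool]:
    """Return (table number, key, more segments follow) for URL bytes 53-104."""
    if len(url) <= SEGMENT_BYTES:
        raise ValueError("S106 is only reached for URLs longer than 52 bytes")
    prefix = index_3.to_bytes(INDEX_BYTES, "big")
    segment = url[SEGMENT_BYTES:2 * SEGMENT_BYTES]  # 1-based bytes 53..104
    if len(url) <= 2 * SEGMENT_BYTES:
        # S107: zero-pad the tail to 52 bytes; Key_4 goes to table 4, extraction ends.
        return 4, prefix + segment.ljust(SEGMENT_BYTES, b"\x00"), False
    # S108: a full 52-byte segment forms Key_5 in table 5; later segments follow.
    return 5, prefix + segment, True


tail_key = second_url_key(3, b"a" * 52 + b"b" * 10)
middle_key = second_url_key(3, b"a" * 52 + b"b" * 60)
```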
The remaining URL bytes are processed in the same way, 52 bytes at a time; if the final extracted segment is shorter than 52 bytes, it is padded with zeros to 52 bytes.
Content not described in detail in this specification belongs to the prior art known to those skilled in the art.
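Putting the steps together, the sketch below uses Python dictionaries as stand-ins for the hash tables and chains Key_1 through the URL segment keys. It illustrates the idea under assumptions and is not the patented implementation: IPv4 flows, a 4-byte big-endian index, a single global index counter, tables above 5 numbered by extrapolating the 2/3, 4/5 pattern, and rules without a URL treated as forward/drop entries stored directly in table 1. `PolicyMatcher`, `add_rule` and `match` are hypothetical names.

```python
import ipaddress
import struct
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple

SEGMENT_BYTES = 52  # URL segment width from the method
INDEX_BYTES = 4     # assumed width of index_n

Flow = Tuple[str, str, int, int, int, str]  # src IP, dst IP, src port, dst port, protocol, host


class Action(Enum):
    CONTINUE = "continue search"
    FORWARD = "forward"
    DROP = "drop"


@dataclass
class Entry:
    action: Action
    index: Optional[int] = None  # index_n, set only on CONTINUE entries


def key1(flow: Flow) -> bytes:
    src_ip, dst_ip, src_port, dst_port, protocol, host = flow
    host_bytes = host.encode("ascii")
    if len(host_bytes) > 43:
        raise ValueError("host does not fit in the 43-byte field")
    return struct.pack("!4s4sHHB43s", ipaddress.IPv4Address(src_ip).packed,
                       ipaddress.IPv4Address(dst_ip).packed,
                       src_port, dst_port, protocol, host_bytes)


def split_url(url: str) -> List[bytes]:
    raw = url.encode()
    count = max(1, -(-len(raw) // SEGMENT_BYTES))  # ceiling division, at least 1
    padded = raw.ljust(count * SEGMENT_BYTES, b"\x00")
    return [padded[i * SEGMENT_BYTES:(i + 1) * SEGMENT_BYTES] for i in range(count)]


def table_for(position: int, is_last: bool) -> int:
    # Last segment: table 2, 4, ...; earlier segments: table 3, 5, ... (extrapolated)
    return 2 + 2 * position if is_last else 3 + 2 * position


class PolicyMatcher:
    def __init__(self) -> None:
        self.tables: Dict[int, Dict[bytes, Entry]] = {}
        self.next_index = 1

    def _table(self, number: int) -> Dict[bytes, Entry]:
        return self.tables.setdefault(number, {})

    def _continue_entry(self, number: int, key: bytes) -> int:
        table = self._table(number)
        entry = table.get(key)
        if entry is None:  # allocate index_n once, so shared prefixes are stored once
            entry = Entry(Action.CONTINUE, self.next_index)
            self.next_index += 1
            table[key] = entry
        elif entry.action is not Action.CONTINUE:
            raise ValueError("key is already bound to forward/drop")
        return entry.index

    def add_rule(self, flow: Flow, url: Optional[str], action: Action) -> None:
        if url is None:  # host-level rule: forward/drop stored directly in table 1
            self._table(1)[key1(flow)] = Entry(action)
            return
        index = self._continue_entry(1, key1(flow))
        segments = split_url(url)
        for pos, segment in enumerate(segments):
            key = index.to_bytes(INDEX_BYTES, "big") + segment
            if pos == len(segments) - 1:
                self._table(table_for(pos, True))[key] = Entry(action)
            else:
                index = self._continue_entry(table_for(pos, False), key)

    def match(self, flow: Flow, url: str) -> Optional[Action]:
        entry = self.tables.get(1, {}).get(key1(flow))
        segments = split_url(url)
        for pos, segment in enumerate(segments):
            if entry is None or entry.action is not Action.CONTINUE:
                break  # no rule, or a forward/drop decision already reached
            key = entry.index.to_bytes(INDEX_BYTES, "big") + segment
            entry = self.tables.get(table_for(pos, pos == len(segments) - 1), {}).get(key)
        if entry is None or entry.action is Action.CONTINUE:
            return None
        return entry.action
```

Because a continue-search entry is created only once per key, two URLs that share their first 52 bytes share a single table 3 entry in this sketch, which is one way to avoid the repeated storage described in the background.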
Claims (2)
1. A hash-table-based host and URL keyword strategy matching method, in which the URL is extracted to form keys (Key) and the table entry corresponding to each Key is looked up in turn to obtain a matching result, characterized in that the keys (Key) are extracted in the following steps:
Step S101: the source IP, destination IP, source port, destination port, protocol number and host form Key_1, which is stored in table 1. Key_1 is 56 bytes long;
Step S102: the matching action is stored in Result_1, the result produced by table 1. The matching action is one of continue search, forward, or drop. Continue search means that Key_1 must be associated with a subsequent URL: an index_1 is generated, the continue-search flag is set to 1 (it is 0 for forward and drop), and the method goes to step S103. Forward or drop means that Key_1 is not associated with a subsequent URL, so no memory space is needed to store Key_1; the forward flag is set to 1 when forwarding, and extraction ends;
Step S103: determine whether the URL length is less than or equal to 52 bytes; if so, go to step S104, otherwise go to step S105;
Step S104: if the URL is shorter than 52 bytes, pad it with zeros to 52 bytes. index_1 and the URL form Key_2, which is stored in table 2. Result_2, produced by table 2, has the same structure as Result_1. If the matching action is continue search, an index_2 is generated and the continue-search flag is set to 1 (0 for forward and drop); if the matching action is forward or drop, no memory space is needed to store Key_2, and the forward flag is set to 1 when forwarding. Extraction ends;
Step S105: index_1 and the first 52 bytes of the URL form Key_3, which is stored in table 3. Result_3, produced by table 3, has the same structure as Result_1. If the matching action is continue search, an index_3 is generated, the continue-search flag is set to 1 (0 for forward and drop), and the method goes to step S106. If the matching action is forward or drop, no memory space is needed to store Key_3, the forward flag is set to 1 when forwarding, and extraction ends;
Step S106: determine whether the URL length is less than or equal to 104 bytes; if so, go to step S107, otherwise go to step S108;
Step S107: if the URL is shorter than 104 bytes, pad it with zeros to 104 bytes. index_3 and the last 52 bytes of the padded URL (bytes 53 to 104) form Key_4, which is stored in table 4. Result_4, produced by table 4, has the same structure as Result_1. If the matching action is continue search, an index_4 is generated and the continue-search flag is set to 1 (0 for forward and drop); if the matching action is forward or drop, no memory space is needed to store Key_4, and the forward flag is set to 1 when forwarding. Extraction ends;
Step S108: index_3 and bytes 53 to 104 of the URL form Key_5, which is stored in table 5. Result_5, produced by table 5, has the same structure as Result_1. If the matching action is continue search, an index_5 is generated, the continue-search flag is set to 1 (0 for forward and drop), and the method goes to the next step. If the matching action is forward or drop, no memory space is needed to store Key_5, the forward flag is set to 1 when forwarding, and extraction ends;
The remaining URL bytes are processed in the same way, 52 bytes at a time; if the final extracted segment is shorter than 52 bytes, it is padded with zeros to 52 bytes.
2. The hash-table-based host and URL keyword strategy matching method according to claim 1, characterized in that, in step S101, the 56 bytes of Key_1 are allocated as follows: the source IP occupies 4 bytes, the destination IP 4 bytes, the source port 2 bytes, the destination port 2 bytes, the protocol number 1 byte, and the host the remaining 43 bytes; if any of the source IP, destination IP, source port, destination port, protocol number or host is shorter than its allotted length, it is padded with zeros.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610085491.3A CN105786981B (en) | 2016-02-15 | 2016-02-15 | Host and URL keyword strategy matching method based on Hash table |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610085491.3A CN105786981B (en) | 2016-02-15 | 2016-02-15 | Host and URL keyword strategy matching method based on Hash table |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105786981A | 2016-07-20 |
CN105786981B | 2019-05-17 |
Family ID: 56402236
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610085491.3A Active CN105786981B (en) | 2016-02-15 | 2016-02-15 | Host and URL keyword strategy matching method based on Hash table |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105786981B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1475930A (en) * | 2002-08-15 | 2004-02-18 | 联想(北京)有限公司 | Chain path layer location information filtering based on state detection |
CN102521348A (en) * | 2011-12-12 | 2012-06-27 | 上海西默通信技术有限公司 | Matching algorithm of mass Uniform Resource Locator (URL) |
CN102737119A (en) * | 2012-05-30 | 2012-10-17 | 华为技术有限公司 | Searching method, filtering method and related equipment and systems of uniform resource locator |
CN103077208A (en) * | 2012-12-28 | 2013-05-01 | 华为技术有限公司 | Uniform resource locator matching processing method and device |
CN103414603A (en) * | 2013-07-31 | 2013-11-27 | 清华大学 | Ipv6 deep packet inspection method based on Hash folding method |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106230863A (en) * | 2016-09-19 | 2016-12-14 | 成都知道创宇信息技术有限公司 | A kind of ReDoS attack detection method based on WAF |
CN106230863B (en) * | 2016-09-19 | 2019-05-07 | 成都知道创宇信息技术有限公司 | A kind of ReDoS attack detection method based on WAF |
CN106534081A (en) * | 2016-10-31 | 2017-03-22 | 浙江大学 | Method of complementing Host/Url characteristic set of App based on user real flow data |
CN106534081B (en) * | 2016-10-31 | 2019-09-10 | 浙江大学 | A method of the Host/Url feature set based on user's real traffic Supplementing Data App |
CN112261168A (en) * | 2020-09-30 | 2021-01-22 | 厦门市美亚柏科信息股份有限公司 | Multi-IP port user information searching method, terminal equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN105786981B (en) | 2019-05-17 |
Similar Documents
Publication | Title |
---|---|
WO2017133344A1 (en) | Ip address table storage and query method applicable in dns querying |
CN102122285B (en) | Data cache system and data inquiry method |
RU2009140391A (en) | IDENTIFICATION AND COMPARISON OF E-MAIL MESSAGES |
CN103226597B (en) | Keyword advertisement matching method based on natural semantics |
CN103873371A (en) | Name routing fast matching search method and device |
WO2008043645B1 (en) | Establishing document relevance by semantic network density |
CN105786981A (en) | Hash-table-based host and URL keyword strategy matching method |
JP2007524927A5 (en) | |
CN104572983B (en) | Construction method, String searching method and the related device of hash table based on internal memory |
CN109639579B (en) | Multicast message processing method and device, storage medium and processor |
CN101369283A (en) | Data synchronization method and system for internal memory database physical data base |
IL182820A (en) | Double-hash lookup mechanism for searching addresses in a network device |
CN104008199B (en) | A kind of data query method |
WO2018068524A1 (en) | Routing-table establishment and ip routing lookup method, device, and storage medium |
CN103428093A (en) | Route prefix storing, matching and updating method and device based on names |
WO2015127721A1 (en) | Data matching method and apparatus and computer storage medium |
CN103595637A (en) | Method for utilizing content-centric network nodes to process data based on tree and hash table |
CN103914570A (en) | Intelligent customer service searching method and system based on character string similarity algorithm |
CN104239570B (en) | The searching method and device of paper |
JP2001005874A5 (en) | |
CN106033438A (en) | Public sentiment data storage method and server |
US20120054198A1 (en) | Table creating and lookup method used by network processor |
CN107204891A (en) | A kind of method and device of the lower message identification of magnanimity rule |
CN102387403B (en) | A kind of service message transfer approach based on matched rule and system |
US20180240053A1 (en) | System and Method for Associating a Multi-segment Component Transaction |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |