CN105786981A - Hash-table-based host and URL keyword strategy matching method - Google Patents

Hash-table-based host and URL keyword strategy matching method

Info

Publication number
CN105786981A
Authority
CN
China
Prior art keywords
url
key
designated
bytes
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610085491.3A
Other languages
Chinese (zh)
Other versions
CN105786981B (en)
Inventor
吴有庆
王乾
马红兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Balance Network Technology Co Ltd
Original Assignee
Nanjing Balance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Balance Network Technology Co Ltd
Priority to CN201610085491.3A
Publication of CN105786981A
Application granted
Publication of CN105786981B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9014 Indexing; Data structures therefor; Storage structures hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1466 Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a hash-table-based host and URL keyword strategy matching method. The method handles seven fields: source IP, destination IP, source port, destination port, protocol number, host and URL. The source IP, destination IP, source port, destination port, protocol number and host are treated as one unit, and the URL as another. To make very long URLs convenient to store and look up, the URL keyword is further split into segments, and each stored segment is linked to the one stored before it, so that a single exact lookup and match is achieved. The method reduces memory requirements, extracts very long URL information in segments to obtain segmented keys that suit the system's requirements, and takes the source IP, destination IP, source port, destination port and protocol number into account, so that precise control can be applied per user.

Description

Host and URL keyword strategy matching method based on Hash table
Technical field
The present invention relates to the field of Internet security technology, and specifically to a hash-table-based host and URL keyword strategy matching method.
Background
The Internet industry changes daily and people's lives have become inseparable from the network, yet the network carries many risks and hidden dangers. Many phishing websites use false URL information to commit fraud, so network information security is particularly important, and security systems place ever higher demands on the performance and practicality of URL and HOST keyword lookup. In the HOST and URL keyword libraries used for matching in information security systems, HOST length ranges from 4 to 48 bytes, with 90% of entries within 20 bytes; URL keyword length ranges from 4 to 256 bytes, with lengths of about 52 bytes accounting for 80%. Because lengths vary this widely and very long URLs must also be supported, unified storage and matching would impose a large memory overhead, which in turn affects the security and reliability of the whole system.
At present, for exact-match lookup of very long URLs (longer than 40 bytes, and up to 200 bytes), storing and matching with a single hash table has the following defects. First, memory is wasted and collision chains grow, so insertion, deletion and lookup cannot meet performance requirements. Second, a very long URL produces a very long key, and many systems do not support keys of that length, so the URL cannot be matched. Third, URLs that share the same leading portion are stored repeatedly. Fourth, such HOST and URL keyword matching does not take the source IP, destination IP, source port, destination port and protocol number into account, so precise control per user is not possible.
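The memory argument above can be made concrete with a little arithmetic. The following is a rough sketch only: it assumes the unified scheme stores every URL in one fixed 256-byte key (the upper bound quoted in the background), and it ignores index fields and hash-table overhead. The function names are hypothetical.

```python
import math

SEGMENT_BYTES = 52    # segment width used by the method described below
MAX_URL_BYTES = 256   # upper bound on URL keyword length quoted above


def padding_unified(url_len: int) -> int:
    """Zero-padding bytes if every URL sits in one fixed 256-byte key (assumed layout)."""
    return MAX_URL_BYTES - url_len


def padding_segmented(url_len: int) -> int:
    """Zero-padding bytes if the URL is cut into 52-byte segments."""
    segments = math.ceil(url_len / SEGMENT_BYTES)
    return segments * SEGMENT_BYTES - url_len


# Compare the two layouts for a few URL lengths.
for n in (40, 52, 120, 256):
    print(n, padding_unified(n), padding_segmented(n))
```

Under these assumptions a 40-byte URL wastes 216 bytes of padding in the unified layout but only 12 bytes in the segmented one.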
Summary of the invention
To overcome the defects of the prior art, the present invention provides a hash-table-based host and URL keyword strategy matching method. The method reduces memory requirements, extracts very long URL information in segments to obtain segmented keys that suit the system's requirements, and takes the source IP, destination IP, source port, destination port and protocol number into account, so that precise control can be applied per user.
To achieve the above object, the present invention adopts the following technical solution:
In the hash-table-based host and URL keyword strategy matching method of the present invention, keywords (Key) are extracted from the URL, and the table structure corresponding to each keyword (Key) is searched in turn to obtain the matching result. The keyword (Key) extraction method comprises the following steps:
Step S101: the source IP, destination IP, source port, destination port, protocol number and host form Key_1, which is stored in table 1; Key_1 consists of 56 bytes.
Step S102: the match action is stored in Result_1 produced by table 1; the match action is continue search, forward or discard. Continue search means that Key_1 must be associated with a subsequent URL: an index_1 is generated, the continue-search flag is set to 1, the forward and discard flags are both set to 0, and the method goes to step S103. Forward or discard means that Key_1 is not associated with a subsequent URL and no memory space is needed to store Key_1; the forward flag is set to 1 and extraction ends.
Step S103: determine whether the URL length is less than or equal to 52 bytes; if so, go to step S104, otherwise go to step S105.
Step S104: if the URL is shorter than 52 bytes, pad it with 0; index_1 and the URL form Key_2, which is stored in table 2. Result_2 produced by table 2 has the same structure as Result_1 produced by table 1. If the match action is continue search, an index_2 is generated, the continue-search flag is set to 1, and the forward and discard flags are both set to 0. If the match action is forward or discard, no memory space is needed to store Key_2 and the forward flag is set to 1. Extraction ends.
Step S105: index_1 and the first 52 bytes of the URL form Key_3, which is stored in table 3. Result_3 produced by table 3 has the same structure as Result_1 produced by table 1. If the match action is continue search, an index_3 is generated, the continue-search flag is set to 1, the forward and discard flags are both set to 0, and the method goes to step S106. If the match action is forward or discard, no memory space is needed to store Key_3, the forward flag is set to 1, and extraction ends.
Step S106: determine whether the URL length is greater than or equal to 104 bytes; if so, go to step S107, otherwise go to step S108.
Step S107: if the URL is shorter than 104 bytes, pad it with 0; index_3 and the last 52 bytes of the URL form Key_4, which is stored in table 4. Result_4 produced by table 4 has the same structure as Result_1 produced by table 1. If the match action is continue search, an index_4 is generated, the continue-search flag is set to 1, and the forward and discard flags are both set to 0. If the match action is forward or discard, no memory space is needed to store Key_4 and the forward flag is set to 1. Extraction ends.
Step S108: index_3 and bytes 53 to 104 of the URL form Key_5, which is stored in table 5. Result_5 produced by table 5 has the same structure as Result_1 produced by table 1. If the match action is continue search, an index_5 is generated, the continue-search flag is set to 1, the forward and discard flags are both set to 0, and the method goes to the next step. If the match action is forward or discard, no memory space is needed to store Key_5, the forward flag is set to 1, and extraction ends.
And so on. When the end of the URL is reached, if the extracted URL byte string is shorter than 52 bytes, it is padded with 0 to 52 bytes.
In step S101, of the 56 bytes of Key_1, the source IP occupies 4 bytes, the destination IP 4 bytes, the source port 2 bytes, the destination port 2 bytes and the protocol number 1 byte; the remaining 43 bytes are occupied by the host. If any of the source IP, destination IP, source port, destination port, protocol number or host is shorter than its field, it is padded with 0.
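The 56-byte Key_1 layout can be sketched with Python's `struct` module. This is an illustration under stated assumptions, not the patented implementation: the function name `build_key_1`, the field order (the order in which the fields are listed above), IPv4 addresses, and network byte order are all assumptions that the text does not specify.

```python
import socket
import struct

HOST_BYTES = 43   # bytes reserved for the host field
KEY1_BYTES = 56   # 4 + 4 + 2 + 2 + 1 + 43


def build_key_1(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                proto: int, host: str) -> bytes:
    """Pack the six fields into a fixed 56-byte key (hypothetical helper)."""
    host_raw = host.encode("ascii")
    if len(host_raw) > HOST_BYTES:
        raise ValueError("host longer than 43 bytes")
    # "!" selects network byte order with no alignment padding (an assumption);
    # the "43s" field zero-pads a host that is shorter than 43 bytes.
    return struct.pack(
        "!4s4sHHB43s",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        src_port,
        dst_port,
        proto,
        host_raw,
    )


key_1 = build_key_1("10.0.0.1", "93.184.216.34", 51000, 80, 6, "example.com")
```

Because every field has a fixed width, two users who request the same host still produce different keys, which is what allows control per user.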
The beneficial effects of the present invention are:
First, a Key_1 that is not associated with a subsequent URL is not stored, which reduces memory requirements.
Second, the present invention extracts very long URL information in segments to obtain segmented keys that suit the system's requirements.
Third, the present invention takes the source IP, destination IP, source port, destination port and protocol number into account, so that precise control can be applied per user.
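The segmented extraction named in the second effect can be sketched as a small helper that cuts a URL into 52-byte pieces and zero-pads the final piece. The name `split_url` is hypothetical, and the sketch covers only the splitting, not the table lookups.

```python
SEGMENT_BYTES = 52  # URL bytes carried by each segmented key


def split_url(url: bytes) -> list[bytes]:
    """Cut a URL into 52-byte segments, zero-padding the last one."""
    if not url:
        raise ValueError("empty URL")
    segments = []
    for start in range(0, len(url), SEGMENT_BYTES):
        chunk = url[start:start + SEGMENT_BYTES]
        # ljust pads on the right with zero bytes up to the segment width.
        segments.append(chunk.ljust(SEGMENT_BYTES, b"\x00"))
    return segments
```

Every segment has the same width, so each table can use a fixed-size key no matter how long the original URL is.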
Detailed description of the invention
The present invention is described in further detail below with reference to a specific embodiment, but this embodiment should not be construed as limiting the invention.
In the hash-table-based host and URL keyword strategy matching method provided by the invention, keywords (Key) are extracted from the URL, and the table structure corresponding to each keyword (Key) is searched in turn to obtain the matching result. The keyword (Key) extraction method comprises the following steps:
Step S101: the source IP, destination IP, source port, destination port, protocol number and host form Key_1, which is stored in table 1. Key_1 consists of 56 bytes, of which the source IP occupies 4 bytes, the destination IP 4 bytes, the source port 2 bytes, the destination port 2 bytes and the protocol number 1 byte; the remaining 43 bytes are occupied by the host. If any of the source IP, destination IP, source port, destination port, protocol number or host is shorter than its field, it is padded with 0.
Step S102: the match action is stored in Result_1 produced by table 1; the match action is continue search, forward or discard. Continue search means that Key_1 must be associated with a subsequent URL: an index_1 is generated, the continue-search flag is set to 1, the forward and discard flags are both set to 0, and the method goes to step S103. Forward or discard means that Key_1 is not associated with a subsequent URL and no memory space is needed to store Key_1; the forward flag is set to 1 and extraction ends.
Step S103: determine whether the URL length is less than or equal to 52 bytes; if so, go to step S104, otherwise go to step S105.
Step S104: if the URL is shorter than 52 bytes, pad it with 0; index_1 and the URL form Key_2, which is stored in table 2. Result_2 produced by table 2 has the same structure as Result_1 produced by table 1. If the match action is continue search, an index_2 is generated, the continue-search flag is set to 1, the forward and discard flags are both set to 0, and extraction ends. If the match action is forward or discard, no memory space is needed to store Key_2, the forward flag is set to 1, and extraction ends.
Step S105: index_1 and the first 52 bytes of the URL form Key_3, which is stored in table 3. Result_3 produced by table 3 has the same structure as Result_1 produced by table 1. If the match action is continue search, an index_3 is generated, the continue-search flag is set to 1, the forward and discard flags are both set to 0, and the method goes to step S106. If the match action is forward or discard, no memory space is needed to store Key_3, the forward flag is set to 1, and extraction ends.
Step S106: determine whether the URL length is greater than or equal to 104 bytes; if so, go to step S107, otherwise go to step S108.
Step S107: if the URL is shorter than 104 bytes, pad it with 0; index_3 and the last 52 bytes of the URL form Key_4, which is stored in table 4. Result_4 produced by table 4 has the same structure as Result_1 produced by table 1. If the match action is continue search, an index_4 is generated, the continue-search flag is set to 1, the forward and discard flags are both set to 0, and extraction ends. If the match action is forward or discard, no memory space is needed to store Key_4, the forward flag is set to 1, and extraction ends.
Step S108: index_3 and bytes 53 to 104 of the URL form Key_5, which is stored in table 5. Result_5 produced by table 5 has the same structure as Result_1 produced by table 1. If the match action is continue search, an index_5 is generated, the continue-search flag is set to 1, the forward and discard flags are both set to 0, and the method goes to the next step. If the match action is forward or discard, no memory space is needed to store Key_5, the forward flag is set to 1, and extraction ends.
And so on. When the end of the URL is reached, if the extracted URL byte string is shorter than 52 bytes, it is padded with 0 to 52 bytes.
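The chained lookup described in steps S101 to S108 can be sketched as follows. This is a minimal illustration under several assumptions, not the patented implementation: Python dictionaries stand in for the hash tables, index_N is assumed to be 4 bytes wide (the text does not give its width), Key_1 is passed in as an opaque 56-byte value, and a segment is treated as terminal when it is the last segment of the URL, which is the reading under which S106/S107 mirror S103/S104. All class and method names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from itertools import count
from typing import Optional

SEGMENT_BYTES = 52   # URL bytes per key, from the steps above
INDEX_BYTES = 4      # assumed width of index_N; not stated in the text


class Action(Enum):
    CONTINUE = "continue"
    FORWARD = "forward"
    DISCARD = "discard"


@dataclass
class Result:
    action: Action
    index: Optional[int] = None   # set only when action is CONTINUE


class PolicyTables:
    def __init__(self):
        self.tables = defaultdict(dict)   # table number -> {key bytes: Result}
        self._next_index = count(1)

    @staticmethod
    def _segments(url: bytes):
        return [url[i:i + SEGMENT_BYTES].ljust(SEGMENT_BYTES, b"\x00")
                for i in range(0, len(url), SEGMENT_BYTES)]

    def _continue_entry(self, table_no: int, key: bytes) -> int:
        """Return the index of a CONTINUE entry, creating it if needed."""
        entry = self.tables[table_no].get(key)
        if entry is None or entry.action is not Action.CONTINUE:
            entry = Result(Action.CONTINUE, next(self._next_index))
            self.tables[table_no][key] = entry
        return entry.index

    def add_policy(self, key_1: bytes, url: bytes, action: Action) -> None:
        if not url:                       # host-only rule: decided in table 1
            self.tables[1][key_1] = Result(action)
            return
        index = self._continue_entry(1, key_1)
        segments = self._segments(url)
        for level, segment in enumerate(segments, start=1):
            key = index.to_bytes(INDEX_BYTES, "big") + segment
            if level == len(segments):    # last segment: tables 2, 4, ...
                self.tables[2 * level][key] = Result(action)
            else:                         # more to come: tables 3, 5, ...
                index = self._continue_entry(2 * level + 1, key)

    def match(self, key_1: bytes, url: bytes) -> Optional[Action]:
        result = self.tables[1].get(key_1)
        if result is None or result.action is not Action.CONTINUE:
            return None if result is None else result.action
        segments = self._segments(url)
        for level, segment in enumerate(segments, start=1):
            key = result.index.to_bytes(INDEX_BYTES, "big") + segment
            last = level == len(segments)
            result = self.tables[2 * level if last else 2 * level + 1].get(key)
            if result is None:
                return None               # no rule for this URL
            if last:
                return result.action
        return None
```

A short usage example: two long URLs that share their first 52 bytes reuse one entry in table 3, so the common prefix is stored only once.

```python
tables = PolicyTables()
user_key = b"\x01" * 56                      # stand-in for a packed Key_1
long_a = b"/a" * 40                          # 80 bytes, two segments
long_b = b"/a" * 26 + b"/b" * 10             # same first 52 bytes as long_a
tables.add_policy(user_key, long_a, Action.DISCARD)
tables.add_policy(user_key, long_b, Action.FORWARD)
print(tables.match(user_key, long_a), len(tables.tables[3]))
```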
Content not described in detail in this specification belongs to the prior art known to those skilled in the art.

Claims (2)

1. A hash-table-based host and URL keyword strategy matching method, in which keywords (Key) are extracted from the URL and the table structure corresponding to each keyword (Key) is searched in turn to obtain the matching result, characterized in that the keyword (Key) extraction method comprises the following steps:
Step S101: the source IP, destination IP, source port, destination port, protocol number and host form Key_1, which is stored in table 1; Key_1 consists of 56 bytes;
Step S102: the match action is stored in Result_1 produced by table 1; the match action is continue search, forward or discard; continue search means that Key_1 must be associated with a subsequent URL: an index_1 is generated, the continue-search flag is set to 1, the forward and discard flags are both set to 0, and the method goes to step S103; forward or discard means that Key_1 is not associated with a subsequent URL and no memory space is needed to store Key_1; the forward flag is set to 1 and extraction ends;
Step S103: determine whether the URL length is less than or equal to 52 bytes; if so, go to step S104, otherwise go to step S105;
Step S104: if the URL is shorter than 52 bytes, pad it with 0; index_1 and the URL form Key_2, which is stored in table 2; Result_2 produced by table 2 has the same structure as Result_1 produced by table 1; if the match action is continue search, an index_2 is generated, the continue-search flag is set to 1, and the forward and discard flags are both set to 0; if the match action is forward or discard, no memory space is needed to store Key_2 and the forward flag is set to 1; extraction ends;
Step S105: index_1 and the first 52 bytes of the URL form Key_3, which is stored in table 3; Result_3 produced by table 3 has the same structure as Result_1 produced by table 1; if the match action is continue search, an index_3 is generated, the continue-search flag is set to 1, the forward and discard flags are both set to 0, and the method goes to step S106; if the match action is forward or discard, no memory space is needed to store Key_3, the forward flag is set to 1, and extraction ends;
Step S106: determine whether the URL length is greater than or equal to 104 bytes; if so, go to step S107, otherwise go to step S108;
Step S107: if the URL is shorter than 104 bytes, pad it with 0; index_3 and the last 52 bytes of the URL form Key_4, which is stored in table 4; Result_4 produced by table 4 has the same structure as Result_1 produced by table 1; if the match action is continue search, an index_4 is generated, the continue-search flag is set to 1, and the forward and discard flags are both set to 0; if the match action is forward or discard, no memory space is needed to store Key_4 and the forward flag is set to 1; extraction ends;
Step S108: index_3 and bytes 53 to 104 of the URL form Key_5, which is stored in table 5; Result_5 produced by table 5 has the same structure as Result_1 produced by table 1; if the match action is continue search, an index_5 is generated, the continue-search flag is set to 1, the forward and discard flags are both set to 0, and the method goes to the next step; if the match action is forward or discard, no memory space is needed to store Key_5, the forward flag is set to 1, and extraction ends;
And so on; when the end of the URL is reached, if the extracted URL byte string is shorter than 52 bytes, it is padded with 0 to 52 bytes.
2. The hash-table-based host and URL keyword strategy matching method according to claim 1, characterized in that, in step S101, of the 56 bytes of Key_1, the source IP occupies 4 bytes, the destination IP 4 bytes, the source port 2 bytes, the destination port 2 bytes and the protocol number 1 byte, and the remaining 43 bytes are occupied by the host; if any of the source IP, destination IP, source port, destination port, protocol number or host is shorter than its field, it is padded with 0.
CN201610085491.3A 2016-02-15 2016-02-15 Host and URL keyword strategy matching method based on Hash table Active CN105786981B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610085491.3A CN105786981B (en) 2016-02-15 2016-02-15 Host and URL keyword strategy matching method based on Hash table

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610085491.3A CN105786981B (en) 2016-02-15 2016-02-15 Host and URL keyword strategy matching method based on Hash table

Publications (2)

Publication Number Publication Date
CN105786981A true CN105786981A (en) 2016-07-20
CN105786981B CN105786981B (en) 2019-05-17

Family

ID=56402236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610085491.3A Active CN105786981B (en) 2016-02-15 2016-02-15 Host and URL keyword strategy matching method based on Hash table

Country Status (1)

Country Link
CN (1) CN105786981B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106230863A (en) * 2016-09-19 2016-12-14 成都知道创宇信息技术有限公司 A kind of ReDoS attack detection method based on WAF
CN106534081A (en) * 2016-10-31 2017-03-22 浙江大学 Method of complementing Host/Url characteristic set of App based on user real flow data
CN112261168A (en) * 2020-09-30 2021-01-22 厦门市美亚柏科信息股份有限公司 Multi-IP port user information searching method, terminal equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1475930A (en) * 2002-08-15 2004-02-18 联想(北京)有限公司 Chain path layer location information filtering based on state detection
CN102521348A (en) * 2011-12-12 2012-06-27 上海西默通信技术有限公司 Matching algorithm of mass Uniform Resource Locator (URL)
CN102737119A (en) * 2012-05-30 2012-10-17 华为技术有限公司 Searching method, filtering method and related equipment and systems of uniform resource locator
CN103077208A (en) * 2012-12-28 2013-05-01 华为技术有限公司 Uniform resource locator matching processing method and device
CN103414603A (en) * 2013-07-31 2013-11-27 清华大学 Ipv6 deep packet inspection method based on Hash folding method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106230863A (en) * 2016-09-19 2016-12-14 成都知道创宇信息技术有限公司 A kind of ReDoS attack detection method based on WAF
CN106230863B (en) * 2016-09-19 2019-05-07 成都知道创宇信息技术有限公司 A kind of ReDoS attack detection method based on WAF
CN106534081A (en) * 2016-10-31 2017-03-22 浙江大学 Method of complementing Host/Url characteristic set of App based on user real flow data
CN106534081B (en) * 2016-10-31 2019-09-10 浙江大学 A method of the Host/Url feature set based on user's real traffic Supplementing Data App
CN112261168A (en) * 2020-09-30 2021-01-22 厦门市美亚柏科信息股份有限公司 Multi-IP port user information searching method, terminal equipment and storage medium

Also Published As

Publication number Publication date
CN105786981B (en) 2019-05-17

Similar Documents

Publication Publication Date Title
WO2017133344A1 (en) Ip address table storage and query method applicable in dns querying
CN102122285B (en) Data cache system and data inquiry method
RU2009140391A (en) IDENTIFICATION AND COMPARISON OF E-MAIL MESSAGES
CN103226597B (en) Keyword advertisement matching method based on natural semantics
CN103873371A (en) Name routing fast matching search method and device
WO2008043645B1 (en) Establishing document relevance by semantic network density
CN105786981A (en) Hash-table-based host and URL keyword strategy matching method
JP2007524927A5 (en)
CN104572983B (en) Construction method, String searching method and the related device of hash table based on internal memory
CN109639579B (en) Multicast message processing method and device, storage medium and processor
CN101369283A (en) Data synchronization method and system for internal memory database physical data base
IL182820A (en) Double-hash lookup mechanism for searching addresses in a network device
CN104008199B (en) A kind of data query method
WO2018068524A1 (en) Routing-table establishment and ip routing lookup method, device, and storage medium
CN103428093A (en) Route prefix storing, matching and updating method and device based on names
WO2015127721A1 (en) Data matching method and apparatus and computer storage medium
CN103595637A (en) Method for utilizing content-centric network nodes to process data based on tree and hash table
CN103914570A (en) Intelligent customer service searching method and system based on character string similarity algorithm
CN104239570B (en) The searching method and device of paper
JP2001005874A5 (en)
CN106033438A (en) Public sentiment data storage method and server
US20120054198A1 (en) Table creating and lookup method used by network processor
CN107204891A (en) A kind of method and device of the lower message identification of magnanimity rule
CN102387403B (en) A kind of service message transfer approach based on matched rule and system
US20180240053A1 (en) System and Method for Associating a Multi-segment Component Transaction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant