CN103401850A - Message filtering method and device - Google Patents

Publication number
CN103401850A
Authority
CN
China
Prior art keywords
information
field
url
current data
data stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013103058404A
Other languages
Chinese (zh)
Inventor
李霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Star Net Ruijie Networks Co Ltd
Original Assignee
Beijing Star Net Ruijie Networks Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Star Net Ruijie Networks Co Ltd
Priority to CN2013103058404A
Publication of CN103401850A

Abstract

The invention discloses a message filtering method and device. For each data flow, the method comprises the following steps: A, extracting the information of the URL (Uniform Resource Locator)-related fields from an acquired HTTP (Hypertext Transfer Protocol) request message of the current data flow, and storing that information in the buffer corresponding to the current data flow; B, determining whether the information stored in the buffer can be spliced into a complete URL; if yes, performing D, and if not, performing C; C, acquiring the next HTTP request message of the current data flow, and performing A; D, after determining that the complete URL is a search engine URL, extracting the information of the search keyword field from it; and E, determining whether the extracted information is present in a preset search keyword set; if yes, discarding the HTTP request messages, and if not, releasing them. The scheme improves the efficiency of URL extraction.

Description

Message filtering method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a message filtering method and device.
Background
The rapid development of the Internet has carried it into many corners of social life. It has become an indispensable part of how people study, live, and work, and it gives enterprises a basic platform for efficient operation. While the Internet brings many conveniences, it also provides a breeding ground for various undesirable behaviors: online pranks, slander, defamation, and the spreading of illegal or reactionary information threaten national stability, social harmony, and enterprise efficiency.
To address the many threats the Internet brings, the concept of Internet behavior management emerged. Internet behavior management refers to controlling and managing how clients use the Internet. It includes web access filtering, network application control, bandwidth and traffic management, auditing of sent and received information, user behavior analysis, and so on, so that Internet access behavior is managed comprehensively. Search engines, as an important channel for obtaining information, are particularly important in Internet behavior management. It can be said that managing search behavior on search engines has become one of the key functions of Internet behavior management.
Analysis of the mainstream search engines shows that the search keyword is usually recorded in the Uniform Resource Locator (URL). If the search keyword is "testkeyword", the URLs of several mainstream search engines are as follows:
Baidu
http://www.baidu.com/s?wd=testkeyword
Note: in this URL, the value of the wd field is the search keyword.
Google
http://www.google.com.hk/search?hl=zh-CN&source=hp&q=testkeyword&meta=&aq=f&aqi=&aql=&oq=&gs_rfai=
Note: in this URL, the value of the q field is the search keyword.
Yahoo
http://search.cn.yahoo.com/s?p=testkeyword&v=web&pid=ysearch
Note: in this URL, the value of the p field is the search keyword.
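The three URL patterns above can be captured in a small lookup table. The following is a minimal sketch, not the patented implementation: the table SEARCH_ENGINES and the function extract_keyword are hypothetical names, and the table contains only the three hosts listed above. Note that parse_qs decodes percent-escapes, whereas the matching described later in this document works on the raw encoded form.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical table: host -> (path, name of the keyword parameter),
# filled in from the three example URLs only.
SEARCH_ENGINES = {
    "www.baidu.com": ("/s", "wd"),
    "www.google.com.hk": ("/search", "q"),
    "search.cn.yahoo.com": ("/s", "p"),
}


def extract_keyword(url):
    """Return the search keyword in url, or None if url is not a known search URL."""
    parts = urlsplit(url)
    rule = SEARCH_ENGINES.get(parts.hostname or "")
    if rule is None or parts.path != rule[0]:
        return None  # not one of the known search engine URL formats
    values = parse_qs(parts.query, keep_blank_values=True).get(rule[1])
    return values[0] if values else None
```

A URL whose host or path does not match the table yields None, which corresponds to releasing the message without further checks.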
Based on this feature, to forbid clients from searching for certain keywords through a search engine, it is enough to monitor and filter the relevant URLs. The usual procedure is as follows: the gateway or Internet behavior management device intercepts the Hypertext Transfer Protocol (HTTP) request message and extracts the URL from it; the URL is identified, and if it is a search engine URL the next step is taken, otherwise the message is released; the search keyword is extracted from the URL according to the preset field name of the search keyword, and it is determined whether this keyword is forbidden; if so, the message is discarded, otherwise it is released.
According to RFC 2616, the syntax of a URL is as follows:
http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
Here, http denotes the HTTP protocol; host[:port] is the value of the Host header field of the HTTP request message (the address of the resource's site, which may be a domain name or an IP address), and an empty port means port 80; abs_path["?" query] is the Uniform Resource Identifier (URI) of the resource. Thus, once the information of the HOST field and the information of the URI field are determined, the URL is determined.
For example, for the message:
GET /s?wd=%E6%B5%8B%E8%AF%95&rsv_spt=1&issp=1&rsv_bp=0&ie=utf-8&tn=baiduhome_pg HTTP/1.1
Accept: image/gif,image/x-xbitmap,image/jpeg,image/pjpeg,application/x-shockwave-flash,application/xaml+xml,application/x-ms-xbap,application/x-ms-application,application/vnd.ms-excel,application/vnd.ms-powerpoint,application/msword,*/*
Referer: http://www.baidu.com/
Accept-Language: zh-cn
Accept-Encoding: gzip,deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)
Host: www.baidu.com
Connection: Keep-Alive
According to RFC 2616, the extracted URL (scheme omitted) is:
www.baidu.com/s?wd=%E6%B5%8B%E8%AF%95&rsv_spt=1&issp=1&rsv_bp=0&ie=utf-8&tn=baiduhome_pg
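For an unfragmented request, the extraction just described amounts to reading the request URI from the request line and the host from the Host header. The sketch below illustrates this under simplifying assumptions: the whole request head is available as one string with CRLF line endings, and the function name url_from_request is hypothetical. Unlike the example above, it prefixes the scheme, following the http_URL syntax.

```python
def url_from_request(raw):
    """Build "http://" + Host + request URI from the text of one complete request head.

    Returns None when no Host header is present.
    """
    lines = raw.split("\r\n")
    # Request line: METHOD SP Request-URI SP HTTP-Version
    _method, uri, _version = lines[0].split(" ", 2)
    host = None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            host = value.strip()  # may include ":port"
            break
    if host is None:
        return None
    return "http://" + host + uri
```

This only works when the URI and the Host header arrive in the same buffer, which is exactly the assumption that breaks down once the request is fragmented.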
HTTP is carried over the Transmission Control Protocol (TCP). With TCP, if a packet is lost in transit, that is, the sender does not receive the peer's acknowledgement within the specified time, the sender retransmits the message. In practical tests, if an HTTP request message sent by a client is discarded by the gateway or Internet behavior management device, the client retransmits that HTTP request message repeatedly, and after several failed attempts may even send it in fragments. The HTTP request message above may then be split into the following two messages:
Fragmented messages:
Message 1:
GET /s?wd=%E6%B5%8B%E8%AF%95&rsv_spt=1&issp=1&rsv_bp=0&ie=utf-8&tn=baiduhome_pg HTTP/1.1
Accept: image/gif,image/x-xbitmap,image/jpeg,image/pjpeg,application/x-shockwave-flash,application/xaml+xml,application/x-ms-xbap,application/x-ms-application,application/vnd.ms-excel,application/vnd.ms-powerpoint,application/msword,*/*
Message 2:
Referer: http://www.baidu.com/
Accept-Language: zh-cn
Accept-Encoding: gzip,deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)
Host: www.baidu.com
Connection: Keep-Alive
As shown above, the information of the URI field and the information of the HOST field are scattered across message 1 and message 2, which makes the URL harder to extract. The usual prior-art approach is to cache the fragmented or out-of-order messages temporarily and, once all of them have arrived, extract the URL from them one by one. This approach consumes a large amount of memory, lowers URL extraction efficiency, and in turn lowers message filtering efficiency.
Summary of the invention
Embodiments of the present invention provide a message filtering method and device, to solve the problem that existing message filtering methods filter inefficiently when the URL is fragmented or arrives out of order.
A message filtering method comprises:
A. extracting the information of the Uniform Resource Locator (URL)-related fields from an acquired Hypertext Transfer Protocol (HTTP) request message of the current data flow, and storing it in the buffer corresponding to the current data flow;
B. determining whether the information stored in the buffer corresponding to the current data flow can be spliced into a complete URL; if so, performing D; if not, performing C;
C. acquiring the next HTTP request message of the current data flow, and performing A;
D. after determining that the complete URL is a search engine URL, extracting the information of the search keyword field from the complete URL;
E. determining whether the extracted information of the search keyword field is present in a pre-established search keyword set; if so, discarding the HTTP request messages that contain the URL-related field information used to splice the complete URL; if not, releasing those HTTP request messages.
Specifically, the URL-related fields comprise an unknown field, a Uniform Resource Identifier (URI) field, and a HOST field;
extracting the information of the URL-related fields from the acquired HTTP request message of the current data flow, and storing it in the buffer corresponding to the current data flow, specifically comprises:
if what is extracted is the complete information of the URI field and/or the HOST field, storing it directly in the buffer corresponding to the current data flow;
if what is extracted is the information of an unknown field, storing the information of the unknown field together with its starting Transmission Control Protocol (TCP) sequence number in the buffer corresponding to the current data flow;
if what is extracted is partial information of the URI field, storing the extracted partial information of the URI field together with the starting TCP sequence number of the missing information of the URI field in the buffer corresponding to the current data flow;
if what is extracted is partial information of the HOST field, storing the extracted partial information of the HOST field together with the starting TCP sequence number of the missing information of the HOST field in the buffer corresponding to the current data flow.
Specifically, splicing the information stored in the buffer corresponding to the current data flow comprises:
splicing the information of the URI field, the partial information of the HOST field, and the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number of the missing information of the HOST field; or
splicing the information of the HOST field, the partial information of the URI field, and the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number of the missing information of the URI field; or
splicing the partial information of the HOST field, the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number of the missing information of the HOST field, the partial information of the URI field, and the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number of the missing information of the URI field.
Further, after discarding the HTTP request messages that contain the URL-related field information used to splice the complete URL, the method also comprises:
sending discard information to the sender of the discarded HTTP messages.
Specifically, establishing the search keyword set comprises:
obtaining the forbidden search keywords;
encoding each forbidden Chinese search keyword with each of the known encoding formats;
forming the search keyword set from the encoded forbidden Chinese search keywords and the forbidden non-Chinese search keywords obtained.
A message filtering device comprises:
a storage unit, configured to extract the information of the Uniform Resource Locator (URL)-related fields from an acquired Hypertext Transfer Protocol (HTTP) request message of the current data flow, and to store it in the buffer corresponding to the current data flow;
a determining unit, configured to determine whether the information stored in the buffer corresponding to the current data flow can be spliced into a complete URL; if not, control passes to an acquiring unit; if so, control passes to an extraction unit;
the acquiring unit, configured to acquire the next HTTP request message of the current data flow and pass control to the storage unit;
the extraction unit, configured to extract the information of the search keyword field from the complete URL after determining that the complete URL is a search engine URL;
a processing unit, configured to determine whether the extracted information of the search keyword field is present in a pre-established search keyword set; if so, to discard the HTTP request messages that contain the URL-related field information used to splice the complete URL; if not, to release those HTTP request messages.
Specifically, the URL-related fields comprise an unknown field, a Uniform Resource Identifier (URI) field, and a HOST field;
the storage unit, in extracting the information of the URL-related fields from the acquired HTTP request message of the current data flow and storing it in the buffer corresponding to the current data flow, is specifically configured to:
if what is extracted is the complete information of the URI field and/or the HOST field, store it directly in the buffer corresponding to the current data flow;
if what is extracted is the information of an unknown field, store the information of the unknown field together with its starting Transmission Control Protocol (TCP) sequence number in the buffer corresponding to the current data flow;
if what is extracted is partial information of the URI field, store the extracted partial information of the URI field together with the starting TCP sequence number of the missing information of the URI field in the buffer corresponding to the current data flow;
if what is extracted is partial information of the HOST field, store the extracted partial information of the HOST field together with the starting TCP sequence number of the missing information of the HOST field in the buffer corresponding to the current data flow.
Specifically, the determining unit, in splicing the information stored in the buffer corresponding to the current data flow, is configured to:
splice the information of the URI field, the partial information of the HOST field, and the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number of the missing information of the HOST field; or
splice the information of the HOST field, the partial information of the URI field, and the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number of the missing information of the URI field; or
splice the partial information of the HOST field, the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number of the missing information of the HOST field, the partial information of the URI field, and the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number of the missing information of the URI field.
Further, the processing unit is also configured to:
send discard information to the sender of the discarded HTTP messages after discarding the HTTP request messages that contain the URL-related field information used to splice the complete URL.
Specifically, the processing unit, in establishing the search keyword set, is configured to:
obtain the forbidden search keywords;
encode each forbidden Chinese search keyword with each of the known encoding formats;
form the search keyword set from the encoded forbidden Chinese search keywords and the forbidden non-Chinese search keywords obtained.
The beneficial effects of the present invention are as follows:
With the message filtering method and device provided by the embodiments of the present invention, when, within one data flow, a search engine URL is fragmented across several HTTP request messages and those messages arrive out of order, the HTTP request message that arrives first is obtained, and the information of its URL-related fields is extracted and stored in the buffer of the current data flow. If the information stored in that buffer cannot yet be spliced into a complete URL, the next HTTP request message is obtained, and its URL-related field information is extracted and stored in the same buffer. Once a complete URL can be spliced and is determined to be a search engine URL, the search keyword is extracted and matched against the search keyword set to decide whether to discard the HTTP request messages. Because only the URL-related field information of each HTTP request message is extracted, and the HTTP request messages of the current data flow are no longer all cached, memory is saved, URL extraction becomes more efficient, and message filtering efficiency improves in turn.
Brief description of the drawings
Fig. 1 is a flow chart of the message filtering method in an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the message filtering device in an embodiment of the present invention.
Detailed description of the embodiments
To address the problem that existing message filtering methods filter inefficiently when the URL is fragmented or arrives out of order, an embodiment of the present invention provides a message filtering method whose flow is shown in Fig. 1. The messages of one data flow share the same five elements: source port, destination port, source Internet Protocol (IP) address, destination IP address, and protocol. These five elements can therefore be used to decide whether messages belong to the same data flow. The following steps are performed for each data flow:
S10: obtain an HTTP request message of the current data flow.
Because the URL is carried in HTTP request messages, only HTTP request messages need to be obtained.
S11: extract the information of the URL-related fields from the obtained HTTP request message of the current data flow, and store it in the buffer corresponding to the current data flow.
S12: determine whether the information stored in the buffer corresponding to the current data flow can be spliced into a complete URL; if not, perform S13; if so, perform S14.
S13: obtain the next HTTP request message of the current data flow, and perform S11.
S14: determine whether the complete URL is a search engine URL; if so, perform S15; otherwise, perform S18.
The formats of search engine URLs can be stored in advance. After the complete URL has been obtained, it is checked against the stored search engine URL formats to determine whether it is a search engine URL.
S15: extract the information of the search keyword field from the complete URL.
S16: determine whether the extracted information of the search keyword field is present in the pre-established search keyword set; if so, perform S17; otherwise, perform S18.
The search keyword set may be an Aho-Corasick (AC) state machine, a list, or some other form.
S17: discard the HTTP request messages that contain the URL-related field information used to splice the complete URL.
S18: release the HTTP request messages that contain the URL-related field information used to splice the complete URL.
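The control flow of S10 to S18 can be sketched as a loop over the messages of one data flow. The sketch below makes several simplifying assumptions that are not part of the source: every field arrives complete within one message (partial fields are ignored here), the search engine table KEYWORD_PARAM holds a single host, the keyword set BANNED is a plain Python set rather than an AC state machine, and all names are hypothetical.

```python
from urllib.parse import urlsplit

KEYWORD_PARAM = {"www.baidu.com": "wd"}   # assumed search engine table: host -> parameter
BANNED = {"%E6%B5%8B%E8%AF%95"}           # assumed pre-built search keyword set


def raw_param(url, name):
    """Return the raw (still percent-encoded) value of query parameter name, or None."""
    for pair in urlsplit(url).query.split("&"):
        key, _, value = pair.partition("=")
        if key == name:
            return value
    return None


def filter_flow(messages):
    """Run S10-S18 over the messages of one flow; return "drop", "release" or "pending"."""
    buffer = {}                                   # per-flow buffer: URI and Host only
    for msg in messages:                          # S10 / S13: next message of the flow
        for line in msg.split("\r\n"):            # S11: keep only URL-related fields
            if line.startswith("GET "):
                buffer["uri"] = line.split(" ")[1]
            elif line.lower().startswith("host:"):
                buffer["host"] = line[5:].strip()
        if "uri" not in buffer or "host" not in buffer:
            continue                              # S12: URL not complete yet
        url = "http://" + buffer["host"] + buffer["uri"]
        param = KEYWORD_PARAM.get(buffer["host"])  # S14: search engine URL?
        if param is None:
            return "release"                       # S18
        keyword = raw_param(url, param)            # S15
        return "drop" if keyword in BANNED else "release"  # S16, then S17 or S18
    return "pending"                               # flow ended before the URL was complete
```

The buffer holds only the two fields needed for the URL, never whole messages, and the verdict does not depend on the order in which the request line and the Host header arrive.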
With this scheme, when, within one data flow, a search engine URL is fragmented across several HTTP request messages and those messages arrive out of order, the HTTP request message that arrives first is obtained, and the information of its URL-related fields is extracted and stored in the buffer of the current data flow. If the information stored in that buffer cannot yet be spliced into a complete URL, the next HTTP request message is obtained, and its URL-related field information is extracted and stored in the same buffer. Once a complete URL can be spliced and is determined to be a search engine URL, the search keyword is extracted and matched against the search keyword set to decide whether to discard the HTTP request messages. Because only the URL-related field information of each HTTP request message is extracted, and the HTTP request messages of the current data flow are no longer all cached, memory is saved, URL extraction becomes more efficient, and message filtering efficiency improves in turn.
Specifically, the URL-related fields comprise an unknown field, a URI field, and a HOST field. Extracting the information of the URL-related fields from the obtained HTTP request message of the current data flow, and storing it in the buffer corresponding to the current data flow, specifically comprises:
if what is extracted is the complete information of the URI field and/or the HOST field, storing it directly in the buffer corresponding to the current data flow;
if what is extracted is the information of an unknown field, storing the information of the unknown field together with its starting TCP sequence number in the buffer corresponding to the current data flow;
if what is extracted is partial information of the URI field, storing the extracted partial information of the URI field together with the starting TCP sequence number of the missing information of the URI field in the buffer corresponding to the current data flow;
if what is extracted is partial information of the HOST field, storing the extracted partial information of the HOST field together with the starting TCP sequence number of the missing information of the HOST field in the buffer corresponding to the current data flow.
Here, the information of the URI field and the information of the HOST field are each followed by a corresponding end flag. Therefore, once the type of the URL-related field has been determined, whether the information of the URI field or of the HOST field is complete can be determined by whether its end flag is present.
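The four storage cases can be sketched as one function that classifies the payload of a TCP segment. The source does not say what the end flags are; this sketch assumes a space ends the URI (it precedes the HTTP version) and CRLF ends the Host value. The function name store_segment and the buffer keys are hypothetical, only GET requests are recognized, and sequence number wraparound is ignored.

```python
def store_segment(buf, seq, payload):
    """Record the URL-related information of one TCP segment in the flow buffer buf.

    seq is the TCP sequence number of the first payload byte; payload is bytes.
    """
    end_seq = seq + len(payload)   # sequence number of the first byte after this segment
    found = False
    if payload.startswith(b"GET "):
        found = True
        uri, sep, _ = payload[4:].partition(b" ")   # assumed URI end flag: a space
        buf["uri"] = uri
        # No end flag: the URI continues in the segment that starts at end_seq.
        buf["uri_missing_seq"] = None if sep else end_seq
    pos = payload.find(b"Host:")
    if pos != -1:
        found = True
        value, sep, _ = payload[pos + 5:].partition(b"\r\n")   # assumed end flag: CRLF
        buf["host"] = value.lstrip()
        buf["host_missing_seq"] = None if sep else end_seq
    if not found:
        # Unknown field: keep the bytes under their starting sequence number.
        buf.setdefault("unknown", {})[seq] = payload
```

A missing-sequence value of None marks a field as complete; any other value is the sequence number at which the rest of the field is expected to start.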
Specifically, splicing the information stored in the buffer corresponding to the current data flow in S12 above comprises:
splicing the information of the URI field, the partial information of the HOST field, and the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number of the missing information of the HOST field; or
splicing the information of the HOST field, the partial information of the URI field, and the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number of the missing information of the URI field; or
splicing the partial information of the HOST field, the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number of the missing information of the HOST field, the partial information of the URI field, and the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number of the missing information of the URI field.
If the complete information of the URI field is carried in three or more messages, the partial information of the URI field and the information of two or more unknown fields must be obtained. To splice the complete information of the URI field, first look up the unknown field whose starting TCP sequence number equals the starting TCP sequence number of the missing information of the URI field, splice the partial information of the URI field with the unknown-field information found, and update the starting TCP sequence number of the missing information of the URI field. Then repeat the lookup and splicing until the complete information of the URI field has been assembled. If the complete information of the HOST field is carried in three or more messages, the procedure is the same.
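The repeated lookup-and-splice procedure for a field spread over three or more messages can be sketched as a loop keyed by sequence number. This is an illustration only: the function name splice_field is hypothetical, the unknown fields are assumed to be held in a dict that maps starting sequence number to bytes, and the end flag is passed in by the caller (the source does not specify it).

```python
def splice_field(partial, missing_seq, unknown, end_flag):
    """Extend partial with unknown-field pieces until end_flag is found.

    unknown maps starting TCP sequence number -> bytes; used pieces are removed.
    Returns (value, None) when the field is complete, otherwise
    (partial_so_far, next_missing_seq).
    """
    while missing_seq in unknown:
        piece = unknown.pop(missing_seq)        # piece that starts where the gap starts
        head, sep, _ = piece.partition(end_flag)
        partial += head
        if sep:                                  # end flag seen: field is complete
            return partial, None
        # Update the start of the missing information (32-bit sequence space).
        missing_seq = (missing_seq + len(piece)) % 2**32
    return partial, missing_seq
```

Because the lookup is keyed by sequence number, the order in which the unknown pieces arrived does not matter; if a piece is still missing, the function returns the updated gap position so splicing can resume later.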
Specifically, in S17 above, after discarding the HTTP request messages that contain the URL-related field information used to splice the complete URL, the method also comprises: sending discard information to the sender of the discarded HTTP messages.
After the HTTP request messages have been discarded, discard information can be sent to the sender, for example by returning a "search forbidden" page to the client, telling the user that searching for this keyword is prohibited. This improves usability and also avoids the performance cost of the sender retransmitting the message.
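One possible form of such discard information is a small HTTP response. The source does not specify the status code or the page content, so the 403 status, the wording, and the function name build_block_page below are all assumptions made for illustration.

```python
import html


def build_block_page(keyword):
    """Return the bytes of a minimal HTTP response saying the search is forbidden."""
    body = (
        "<html><body><p>Searching for this keyword is forbidden: "
        + html.escape(keyword)      # escape the keyword before echoing it back
        + "</p></body></html>"
    ).encode("utf-8")
    head = (
        "HTTP/1.1 403 Forbidden\r\n"            # assumed status code
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body
```

How the response is injected into the client's TCP connection is outside the scope of this sketch.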
Specifically, establishing the search keyword set comprises: obtaining the forbidden search keywords; encoding each forbidden Chinese search keyword with each of the known encoding formats; and forming the search keyword set from the encoded forbidden Chinese search keywords and the forbidden non-Chinese search keywords obtained.
If the forbidden search keyword is "测试" (Chinese for "test"), encoding it with several known encoding formats gives the following results:
(1) GBK: 0xB2 0xE2 0xCA 0xD4 (note: this notation means that the string has 4 bytes in total; the first byte is 0xB2, the second 0xE2, the third 0xCA, and the fourth 0xD4).
(2) UTF-8: 0xE6 0xB5 0x8B 0xE8 0xAF 0x95.
(3) GBK + URL encoding: %B2%E2%CA%D4 (note: this denotes the character string "%B2%E2%CA%D4").
(4) UTF-8 + URL encoding: %E6%B5%8B%E8%AF%95.
All four strings above are added to the search keyword set.
As a further example, if the forbidden search keyword is "test", it contains no Chinese characters, so "test" is added to the search keyword set as is.
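The four encoded forms listed above can be generated mechanically. The sketch below is one way to do so; the function name build_keyword_set is hypothetical, and treating "non-Chinese" as "pure ASCII" is a simplifying assumption of this sketch.

```python
from urllib.parse import quote


def build_keyword_set(keywords, encodings=("gbk", "utf-8")):
    """Return the set of byte strings to match for the given forbidden keywords."""
    result = set()
    for word in keywords:
        if word.isascii():
            # Assumption: a pure-ASCII keyword counts as non-Chinese and is added as is.
            result.add(word.encode("ascii"))
            continue
        for enc in encodings:
            raw = word.encode(enc)
            result.add(raw)                                   # raw GBK / UTF-8 bytes
            result.add(quote(raw, safe="").encode("ascii"))   # percent-encoded form
    return result
```

Called with the two example keywords, this yields five entries: the four encoded forms of the Chinese keyword and the unchanged string "test".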
After the search keyword has been obtained from the URL, a single match is enough to know whether it hits a forbidden search keyword, and the message is filtered or released according to the match result. This greatly improves processing efficiency and suits high-traffic network environments better.
The message filtering method above is illustrated below with two specific examples.
Example one: suppose two HTTP request messages of the same data flow are received, and the relevant information in the messages is as follows:
Message 1:
GET /s?wd=mygod&rsv_spt=1&issp=1&rsv_bp=0&ie=utf-8&tn=baiduhome_pg HTTP/1.1
Message 2:
Host: www.baidu.com
Although the information of the URI field and the information of the HOST field are in two different messages, each is complete within its own message, so each is extracted and stored. The information of the URI field extracted from message 1 is:
/s?wd=mygod&rsv_spt=1&issp=1&rsv_bp=0&ie=utf-8&tn=baiduhome_pg
The information of the HOST field extracted from message 2 is:
www.baidu.com
Finally, splicing the stored information of the URI field and of the HOST field gives the complete URL:
http://www.baidu.com/s?wd=mygod&rsv_spt=1&issp=1&rsv_bp=0&ie=utf-8&tn=baiduhome_pg
Because this URL is a search engine URL, the search keyword "mygod" is extracted according to the field name of the search keyword. This keyword is scanned against the pre-established AC state machine, and no forbidden keyword is hit, so message 1 and message 2 are released.
Example two: suppose three HTTP request messages of the same data flow are received, and the relevant information in the messages is as follows:
Message 1:
GET /s?wd=%E6%B5%8B%E8%AF%95&rsv_spt=1&issp=1&rsv_bp=0&ie=utf-8&tn=baiduhome_pg HTTP/1.1
Message 2:
Host: www.bai
Message 3:
du.com
Message 1 contains the complete information of the URI field, which after extraction is:
/s?wd=%E6%B5%8B%E8%AF%95&rsv_spt=1&issp=1&rsv_bp=0&ie=utf-8&tn=baiduhome_pg
Message 2 contains partial information of the HOST field. After extraction, the partial information of the HOST field and the starting TCP sequence number of the missing information of the HOST field are stored; this sequence number is denoted SN1. The stored form is:
HOST: www.bai, sequence number of the missing part: SN1.
For message 3, "du.com" is the information of an unknown field. After extraction, the information of the unknown field and its starting TCP sequence number are stored; this sequence number is denoted SN2.
During splicing, SN1 is found to equal SN2, so the information of the unknown field is the missing information of the HOST field. Splicing the information extracted from the three messages gives the complete URL:
http://www.baidu.com/s?wd=%E6%B5%8B%E8%AF%95&rsv_spt=1&issp=1&rsv_bp=0&ie=utf-8&tn=baiduhome_pg
Because this URL is a search engine URL, the search keyword "%E6%B5%8B%E8%AF%95" is extracted and scanned against the pre-established AC state machine. It hits the UTF-8 + URL-encoded form of the search keyword "测试" ("test"), which is forbidden content. Messages 1, 2, and 3 are therefore discarded, and a "search forbidden" page is returned to the client, telling the user that searching for this keyword is prohibited.
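Example two can be replayed in a few lines. SN1 and SN2 are symbolic in the text above; the numeric default of 5000 used here is a placeholder, and the function name example_two and the one-entry keyword set are assumptions made for illustration.

```python
def example_two(sn1=5000, sn2=5000):
    """Replay Example two; return (url, verdict). The sequence numbers are placeholders."""
    banned = {"%E6%B5%8B%E8%AF%95"}   # UTF-8 + URL-encoded form of the forbidden keyword
    uri = "/s?wd=%E6%B5%8B%E8%AF%95&rsv_spt=1&issp=1&rsv_bp=0&ie=utf-8&tn=baiduhome_pg"
    host_part = "www.bai"             # message 2: partial HOST, missing part starts at SN1
    unknown = "du.com"                # message 3: unknown field that starts at SN2
    if sn1 != sn2:
        return None, "pending"        # the unknown field does not fill the HOST gap
    url = "http://" + host_part + unknown + uri
    keyword = uri.split("wd=", 1)[1].split("&", 1)[0]   # raw value of the wd field
    return url, ("drop" if keyword in banned else "release")
```

With equal sequence numbers the verdict is "drop", matching the outcome described above; with unequal ones the URL cannot be completed yet.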
Based on same inventive concept, the embodiment of the present invention provides a kind of packet filtering device, and this device can be arranged in the network equipment, and the network equipment can be gateway device etc.The structure of packet filtering device as shown in Figure 2, comprising:
Storage unit 20, the HTTP request message that flows for the current data from obtaining extracts the information of URL relevant field, and is kept in the buffer zone that current data stream is corresponding.
Whether determining unit 21, can be spliced into complete URL for determining buffer zone canned data corresponding to current data stream; If cannot, turn to acquiring unit 22; If of course, turn to extraction unit 23.
Acquiring unit 22, be used to obtaining the next HTTP request message of current data stream, turn to storage unit 20.
Extraction unit 23, after being the URL of search engine at definite complete URL, from complete URL, extracting the information of search key field.
Whether processing unit 24, be stored in the search key set of setting up in advance for the information of determining the search key field of extracting; If exist, abandon the HTTP request message of the information that comprises the URL relevant field that the complete URL of splicing uses; If do not exist, the HTTP request message of the information comprise the URL relevant field that the complete URL of splicing uses of letting pass.
Specifically, the URL-related fields comprise an unknown field, a URI field and a HOST field. When extracting the information of the URL-related fields from an acquired HTTP request message of the current data stream and saving it in the buffer corresponding to the current data stream, storage unit 20 is specifically configured to:
if the complete information of the URI field and/or the HOST field is extracted, save it directly in the buffer corresponding to the current data stream;
if the information of an unknown field is extracted, save the information of the unknown field and its corresponding starting TCP sequence number in the buffer corresponding to the current data stream;
if partial information of the URI field is extracted, save the extracted partial information of the URI field, together with the starting TCP sequence number corresponding to the missing information of the URI field, in the buffer corresponding to the current data stream;
if partial information of the HOST field is extracted, save the extracted partial information of the HOST field, together with the starting TCP sequence number corresponding to the missing information of the HOST field, in the buffer corresponding to the current data stream.
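One possible layout for the per-stream buffer implied by these four storage rules is sketched below. The class and method names (`FieldState`, `FlowBuffer`, `store_complete`, `store_partial`, `store_unknown`) and the sample sequence number are hypothetical; the text only specifies what is stored, not how.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FieldState:
    text: str = ""
    # Starting TCP sequence number of the missing remainder; None when complete.
    missing_sn: Optional[int] = None


@dataclass
class FlowBuffer:
    uri: Optional[FieldState] = None
    host: Optional[FieldState] = None
    # Unknown fragments keyed by their own starting TCP sequence number.
    unknown: Dict[int, str] = field(default_factory=dict)

    def store_complete(self, name: str, text: str) -> None:
        """Rule 1: a complete URI or HOST field is saved directly."""
        setattr(self, name, FieldState(text))

    def store_unknown(self, start_sn: int, text: str) -> None:
        """Rule 2: an unknown fragment is saved with its starting sequence number."""
        self.unknown[start_sn] = text

    def store_partial(self, name: str, text: str, missing_sn: int) -> None:
        """Rules 3 and 4: a partial field is saved with the sequence number
        at which its missing part is expected to start."""
        setattr(self, name, FieldState(text, missing_sn))


buf = FlowBuffer()
buf.store_complete("uri", "/s?wd=%E6%B5%8B%E8%AF%95")
buf.store_partial("host", "www.bai", missing_sn=1000)  # illustrative values
buf.store_unknown(1000, "du.com")
```

Keying unknown fragments by sequence number makes the later match against a field's `missing_sn` a single dictionary lookup.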
Specifically, when splicing the information stored in the buffer corresponding to the current data stream, determining unit 21 is specifically configured to:
splice the information of the URI field, the partial information of the HOST field, and the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number corresponding to the missing information of the HOST field; or
splice the information of the HOST field, the partial information of the URI field, and the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number corresponding to the missing information of the URI field; or
splice the partial information of the HOST field, the information of the unknown field whose starting TCP sequence number equals that corresponding to the missing information of the HOST field, the partial information of the URI field, and the information of the unknown field whose starting TCP sequence number equals that corresponding to the missing information of the URI field.
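The three splicing cases reduce to one rule applied to each field independently: a partial field is completed by the unknown fragment whose starting TCP sequence number equals the field's missing sequence number. A minimal sketch, assuming each field is a `(text, missing_sn)` pair, that one unknown fragment completes a field, and that the scheme is `http://`; the function names and sample values are hypothetical.

```python
def splice_field(text, missing_sn, unknown):
    """Return the completed field text, or None if its fragment has not arrived."""
    if missing_sn is None:          # field was extracted complete
        return text
    fragment = unknown.get(missing_sn)
    if fragment is None:            # no unknown fragment with a matching SN yet
        return None
    return text + fragment


def splice_url(host, uri, unknown):
    """Return the complete URL, or None if it cannot be spliced yet."""
    host_text = splice_field(*host, unknown)
    uri_text = splice_field(*uri, unknown)
    if host_text is None or uri_text is None:
        return None
    return "http://" + host_text + uri_text


# Case 1: complete URI, partial HOST completed by an unknown fragment.
url = splice_url(("www.bai", 1000), ("/s?wd=x", None), {1000: "du.com"})
# Case 3: both fields partial, each completed by its own fragment.
url_both = splice_url(("www.bai", 1000), ("/s?w", 2000),
                      {1000: "du.com", 2000: "d=x"})
```

Returning `None` corresponds to the "cannot be spliced" branch, in which the next HTTP request message of the stream is acquired.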
Specifically, processing unit 24 is further configured to:
after discarding the HTTP request messages containing the information of the URL-related fields used to splice the complete URL, send discard information to the sender of the discarded HTTP messages.
Specifically, when establishing the search keyword set, processing unit 24 is specifically configured to:
obtain the forbidden search keywords;
encode each forbidden Chinese search keyword using each of the known encoding formats;
form the search keyword set from the encoded forbidden Chinese search keywords and the obtained forbidden non-Chinese search keywords.
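Building the keyword set can be sketched as below. The list of encodings (`utf-8` and `gbk`), the test for Chinese characters (the CJK Unified Ideographs block only) and the names `ENCODINGS`, `contains_chinese` and `build_keyword_set` are assumptions for illustration; the text only says that "known" encoding formats are used.

```python
from urllib.parse import quote

# Assumed list of encodings a client might use before URL-encoding a keyword.
ENCODINGS = ("utf-8", "gbk")


def contains_chinese(word):
    """Rough check: any character in the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in word)


def build_keyword_set(banned_words):
    """Return the set of forms in which forbidden keywords may appear in a URL."""
    result = set()
    for word in banned_words:
        if contains_chinese(word):
            # One percent-encoded entry per known encoding.
            for enc in ENCODINGS:
                result.add(quote(word, encoding=enc))
        else:
            # Non-Chinese keywords are stored unchanged.
            result.add(word)
    return result


keyword_set = build_keyword_set(["测试", "test"])
```

Pre-encoding every keyword once, when the set is built, lets the per-message check compare the raw query value directly instead of decoding each request.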
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

1. A message filtering method, characterized in that, for each data stream, the following steps are performed:
A. extracting the information of the Uniform Resource Locator (URL)-related fields from an acquired Hypertext Transfer Protocol (HTTP) request message of the current data stream, and saving it in the buffer corresponding to the current data stream;
B. determining whether the information stored in the buffer corresponding to the current data stream can be spliced into a complete URL; if so, performing D; if not, performing C;
C. acquiring the next HTTP request message of the current data stream, and performing A;
D. after determining that the complete URL is a search engine URL, extracting the information of the search keyword field from the complete URL;
E. determining whether the extracted information of the search keyword field is present in a pre-established search keyword set; if so, discarding the HTTP request messages containing the information of the URL-related fields used to splice the complete URL; if not, releasing the HTTP request messages containing the information of the URL-related fields used to splice the complete URL.
2. The method of claim 1, characterized in that the URL-related fields comprise an unknown field, a Uniform Resource Identifier (URI) field and a HOST field;
extracting the information of the URL-related fields from the acquired HTTP request message of the current data stream and saving it in the buffer corresponding to the current data stream specifically comprises:
if the complete information of the URI field and/or the HOST field is extracted, saving it directly in the buffer corresponding to the current data stream;
if the information of an unknown field is extracted, saving the information of the unknown field and its corresponding starting Transmission Control Protocol (TCP) sequence number in the buffer corresponding to the current data stream;
if partial information of the URI field is extracted, saving the extracted partial information of the URI field, together with the starting TCP sequence number corresponding to the missing information of the URI field, in the buffer corresponding to the current data stream;
if partial information of the HOST field is extracted, saving the extracted partial information of the HOST field, together with the starting TCP sequence number corresponding to the missing information of the HOST field, in the buffer corresponding to the current data stream.
3. The method of claim 2, characterized in that splicing the information stored in the buffer corresponding to the current data stream specifically comprises:
splicing the information of the URI field, the partial information of the HOST field, and the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number corresponding to the missing information of the HOST field; or
splicing the information of the HOST field, the partial information of the URI field, and the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number corresponding to the missing information of the URI field; or
splicing the partial information of the HOST field, the information of the unknown field whose starting TCP sequence number equals that corresponding to the missing information of the HOST field, the partial information of the URI field, and the information of the unknown field whose starting TCP sequence number equals that corresponding to the missing information of the URI field.
4. The method of claim 1, characterized in that, after discarding the HTTP request messages containing the information of the URL-related fields used to splice the complete URL, the method further comprises:
sending discard information to the sender of the discarded HTTP messages.
5. The method of any one of claims 1-4, characterized in that establishing the search keyword set specifically comprises:
obtaining the forbidden search keywords;
encoding each forbidden Chinese search keyword using each of the known encoding formats;
forming the search keyword set from the encoded forbidden Chinese search keywords and the obtained forbidden non-Chinese search keywords.
6. A message filtering device, characterized by comprising:
a storage unit, configured to extract the information of the Uniform Resource Locator (URL)-related fields from an acquired Hypertext Transfer Protocol (HTTP) request message of the current data stream, and to save it in the buffer corresponding to the current data stream;
a determining unit, configured to determine whether the information stored in the buffer corresponding to the current data stream can be spliced into a complete URL; if not, control passes to an acquiring unit; if so, control passes to an extraction unit;
the acquiring unit, configured to acquire the next HTTP request message of the current data stream, after which control passes to the storage unit;
the extraction unit, configured to extract the information of the search keyword field from the complete URL after determining that the complete URL is a search engine URL;
a processing unit, configured to determine whether the extracted information of the search keyword field is present in a pre-established search keyword set; if so, to discard the HTTP request messages containing the information of the URL-related fields used to splice the complete URL; if not, to release the HTTP request messages containing the information of the URL-related fields used to splice the complete URL.
7. The device of claim 6, characterized in that the URL-related fields comprise an unknown field, a Uniform Resource Identifier (URI) field and a HOST field;
the storage unit, when extracting the information of the URL-related fields from the acquired HTTP request message of the current data stream and saving it in the buffer corresponding to the current data stream, is specifically configured to:
if the complete information of the URI field and/or the HOST field is extracted, save it directly in the buffer corresponding to the current data stream;
if the information of an unknown field is extracted, save the information of the unknown field and its corresponding starting Transmission Control Protocol (TCP) sequence number in the buffer corresponding to the current data stream;
if partial information of the URI field is extracted, save the extracted partial information of the URI field, together with the starting TCP sequence number corresponding to the missing information of the URI field, in the buffer corresponding to the current data stream;
if partial information of the HOST field is extracted, save the extracted partial information of the HOST field, together with the starting TCP sequence number corresponding to the missing information of the HOST field, in the buffer corresponding to the current data stream.
8. The device of claim 7, characterized in that the determining unit, when splicing the information stored in the buffer corresponding to the current data stream, is specifically configured to:
splice the information of the URI field, the partial information of the HOST field, and the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number corresponding to the missing information of the HOST field; or
splice the information of the HOST field, the partial information of the URI field, and the information of the unknown field whose starting TCP sequence number equals the starting TCP sequence number corresponding to the missing information of the URI field; or
splice the partial information of the HOST field, the information of the unknown field whose starting TCP sequence number equals that corresponding to the missing information of the HOST field, the partial information of the URI field, and the information of the unknown field whose starting TCP sequence number equals that corresponding to the missing information of the URI field.
9. The device of claim 6, characterized in that the processing unit is further configured to:
after discarding the HTTP request messages containing the information of the URL-related fields used to splice the complete URL, send discard information to the sender of the discarded HTTP messages.
10. The device of any one of claims 6-9, characterized in that the processing unit, when establishing the search keyword set, is specifically configured to:
obtain the forbidden search keywords;
encode each forbidden Chinese search keyword using each of the known encoding formats;
form the search keyword set from the encoded forbidden Chinese search keywords and the obtained forbidden non-Chinese search keywords.
CN2013103058404A 2013-07-19 2013-07-19 Message filtering method and device Pending CN103401850A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013103058404A CN103401850A (en) 2013-07-19 2013-07-19 Message filtering method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013103058404A CN103401850A (en) 2013-07-19 2013-07-19 Message filtering method and device

Publications (1)

Publication Number Publication Date
CN103401850A true CN103401850A (en) 2013-11-20

Family

ID=49565376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013103058404A Pending CN103401850A (en) 2013-07-19 2013-07-19 Message filtering method and device

Country Status (1)

Country Link
CN (1) CN103401850A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103646113A (en) * 2013-12-26 2014-03-19 北京西塔网络科技股份有限公司 Keyword restoration method and device
CN104954280A (en) * 2015-04-23 2015-09-30 华为技术有限公司 Data message processing method and device
CN105491027A (en) * 2015-11-25 2016-04-13 广西职业技术学院 Method and system for filtering hypertext transfer protocol (HTTP) connection request based on uniform resource locator (URL)
CN105938475A (en) * 2015-12-28 2016-09-14 杭州迪普科技有限公司 Keyword filtering method and device
CN105938472A (en) * 2015-08-26 2016-09-14 杭州迪普科技有限公司 Web access control method and device
CN106250497A (en) * 2016-08-02 2016-12-21 北京集奥聚合科技有限公司 A kind of analysis method of APP application shop search key
CN106961443A (en) * 2017-04-26 2017-07-18 杭州迪普科技股份有限公司 The filter method and device of a kind of message
CN107104993A (en) * 2016-02-19 2017-08-29 中国移动通信集团公司 A kind of transmission of Uniform Resource Identifier, preparation method and device
CN109076381A (en) * 2016-05-13 2018-12-21 华为技术有限公司 Business data flow sending method and device
CN112187935A (en) * 2020-09-30 2021-01-05 杭州迪普科技股份有限公司 Information identification method and read-only memory

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102780681A (en) * 2011-05-11 2012-11-14 中兴通讯股份有限公司 URL (Uniform Resource Locator) filtering system and URL filtering method
CN102868693A (en) * 2012-09-17 2013-01-09 苏州迈科网络安全技术股份有限公司 URL (Uniform Resource Locator) filtering method and URL (Uniform Resource Locator) filtering system aiming at HTTP (Hyper Text Transport Protocol) segment request

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103646113A (en) * 2013-12-26 2014-03-19 北京西塔网络科技股份有限公司 Keyword restoration method and device
CN104954280B (en) * 2015-04-23 2018-10-09 华为技术有限公司 A kind of data message processing method and device
CN104954280A (en) * 2015-04-23 2015-09-30 华为技术有限公司 Data message processing method and device
CN105938472A (en) * 2015-08-26 2016-09-14 杭州迪普科技有限公司 Web access control method and device
CN105491027A (en) * 2015-11-25 2016-04-13 广西职业技术学院 Method and system for filtering hypertext transfer protocol (HTTP) connection request based on uniform resource locator (URL)
CN105491027B (en) * 2015-11-25 2019-01-01 广西职业技术学院 The method and system that HTTP connection request is filtered based on URL
CN105938475A (en) * 2015-12-28 2016-09-14 杭州迪普科技有限公司 Keyword filtering method and device
CN107104993A (en) * 2016-02-19 2017-08-29 中国移动通信集团公司 A kind of transmission of Uniform Resource Identifier, preparation method and device
CN109076381A (en) * 2016-05-13 2018-12-21 华为技术有限公司 Business data flow sending method and device
CN109076381B (en) * 2016-05-13 2021-02-23 华为技术有限公司 Service data stream sending method and device
US11102671B2 (en) 2016-05-13 2021-08-24 Huawei Technologies Co., Ltd. Service data flow sending method and apparatus
CN106250497A (en) * 2016-08-02 2016-12-21 北京集奥聚合科技有限公司 A kind of analysis method of APP application shop search key
CN106961443A (en) * 2017-04-26 2017-07-18 杭州迪普科技股份有限公司 The filter method and device of a kind of message
CN112187935A (en) * 2020-09-30 2021-01-05 杭州迪普科技股份有限公司 Information identification method and read-only memory

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20131120