WO2022001577A1 - Whitelist-based content lock firewall method and system - Google Patents

Whitelist-based content lock firewall method and system

Info

Publication number
WO2022001577A1
WO2022001577A1 PCT/CN2021/098312 CN2021098312W WO2022001577A1 WO 2022001577 A1 WO2022001577 A1 WO 2022001577A1 CN 2021098312 W CN2021098312 W CN 2021098312W WO 2022001577 A1 WO2022001577 A1 WO 2022001577A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
data packet
parsed
pattern library
payload
Prior art date
Application number
PCT/CN2021/098312
Other languages
English (en)
French (fr)
Inventor
张文力
万文凯
陈明宇
Original Assignee
中国科学院计算技术研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院计算技术研究所
Publication of WO2022001577A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0263 Rule management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers

Definitions

  • The present invention relates to the field of network security, and in particular to firewalls.
  • Firewalls are the main means of protecting websites. They can be divided into packet filtering firewalls and content filtering firewalls. Packet filtering firewalls are widely used in simple networks because they check only the IP-layer and TCP-layer header information and are therefore fast. Their main problem is that they cannot inspect packet contents. Content filtering firewalls can filter on packet contents; the inspected content can be a URL, a signature code, and so on. A URL is the unique identifier of every web page and resource on the Internet. URL filtering extracts the URL field from the GET/POST request of the user's HTTP connection and passes or blocks the HTTP request packet depending on whether the URL is judged legitimate.
  • When choosing a firewall, a website needs to consider factors such as its own business volume, service objects, content sensitivity and maintenance costs.
  • The functions of most websites are relatively fixed and rarely need updating; examples are the official websites of universities, enterprises and public institutions, and the portal sites of some small companies. According to statistics, about 68.3% of websites go more than 6 months between updates.
  • Such websites are generally unwilling or unable to bear the high cost of professional maintenance and use only simple firewall protection, which leaves them exposed to attack.
  • The filtering rules of a firewall can be divided into whitelists and blacklists.
  • A whitelist means that all traffic matching a rule is allowed through.
  • A blacklist means that traffic matching a rule is blocked.
  • Packet filtering firewalls usually use both whitelists and blacklists, while content filtering firewalls mainly use blacklists.
  • The key problem of traditional firewalls is that a blacklist is never complete. Even if the learned rule base is updated continuously, attacks of unknown types remain possible, and maintaining the blacklist is costly.
  • The present invention proposes a whitelist-based content lock firewall method and system, including the following:
  • A whitelist-based content lock firewall method, including:
  • Step 200: semantically parse the payload of a data packet received by the website to obtain the parsed text of the received packet;
  • Step 300: match the parsed text of the received packet against a text pattern library to decide whether to forward or intercept the packet; the text pattern library contains the value domains and structural features of multiple keywords.
  • The text pattern library can also be generated by writing it by hand.
  • Step 200 further includes: semantically parsing the payload of the packet received by the website to obtain the value domain and structural features of the parsed text of the received packet; and
  • Step 300 further includes:
  • Step 110: preprocess the normal traffic;
  • Step 120: parse the payload by keyword to obtain parsed text;
  • Step 130: classify the large number of parsed texts according to specific rules, where the classification rules include the web page URL or payload keywords, and train on each class of parsed text to obtain its value domain and structural features;
  • Step 140: aggregate the value domains and structural features of all parsed texts into a text feature library, then convert the text features in that library into text patterns one by one to form the text pattern library.
  • The interception log is reviewed either when a counter triggers or at fixed intervals.
  • Step 400 further includes:
  • The payload of the intercepted packet is parsed to obtain the value domains and structural features of multiple keywords, and the resulting text features are appended to the text pattern library.
  • Triggering by counter includes:
  • A computing system, comprising:
  • The storage device stores one or more computer programs which, when executed by the processor, implement the whitelist-based content lock firewall method of the present invention.
  • The advantages of the embodiments of the present invention are as follows. A text pattern library trained on normal traffic can effectively defend against both known and new network attacks. The detection condition is whether the text content of the entire packet conforms to the text pattern library, rather than only a URL or signature code, which prevents attackers from exploiting blind spots in the detection conditions.
  • Exploitation of a site's own system vulnerabilities is usually not normal access behavior; it does not conform to the text pattern library and can be filtered out directly.
  • This solution does not require costly maintenance of a blacklist database and can defend against new network attack modes. Websites developed by third parties that are difficult to update can, by deploying this firewall, keep running with their vulnerabilities while normal functions are preserved, without costly upgrades.
  • Figure 2 shows a system block diagram according to one embodiment of the present invention.
  • The inventors propose a whitelist-based content lock packet filtering method that uses the text pattern library as a whitelist and allows only trusted text content to pass through the firewall.
  • The method exploits the regularity of a website's normal traffic content: it parses the text of normal traffic, trains on it, and learns a text pattern library.
  • During content inspection, only packets that are completely matched by the text pattern library are allowed through the firewall; unmatched packets are reviewed in the background, and the text pattern library is updated as needed to prevent false interception.
  • Figure 1 shows a flow chart of an embodiment of the method, comprising steps 100, 200, 300 and 400. The details of each step are as follows:
  • Step [100] includes the following steps:
  • Methods for collecting normal traffic include, but are not limited to, collection in a closed environment and collection in an open environment followed by labeling. The traffic is then parsed into individual packets, and packets without payload are filtered out according to the length field of the network-layer header. If a packet has a payload it proceeds to step [120] for further processing; otherwise processing moves on to the next packet of normal traffic and checks whether it has a payload;
  • A URL cannot contain non-ASCII characters.
  • Non-ASCII characters in a URL are converted to ASCII: each character is encoded under some character encoding (typically UTF-8), and the hexadecimal value of each resulting byte, prefixed with %, forms the encoded string that replaces the corresponding non-ASCII string in the URL.
  • URLs of this kind must be converted back before processing.
  • AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87
  • Keywords can also be obtained from other parts of the HTTP header, such as the Cookie field.
  • sid=K7wWaFvPa4s3dYLAACpk
  • The value domain comprises the range of character types and the range of lengths of the text; the structural feature is the structural form of the parsed text.
  • Texts containing the Cookie field in requests to the same URL are classified into one category;
  • texts containing the sid field are classified into one category;
  • texts containing the t field are classified into one category.
  • The value domain of the example Cookie in step [120] includes letters, digits and special characters, and its length is 20.
  • The example sid contains only letters and digits, the letters being both upper and lower case, and its length is 20. The example t contains letters and _, no digits, and its length is 7.
  • String 1 takes two values, application and text;
  • string 2 also takes two values, json and plain, and application and json occur as a pair. Only two examples are given here; with more examples the strings may take more values.
  • Step [140]: aggregate the value domains and structural features of all parsed texts obtained in step [130] into a text feature library, then convert the text features in that library into text patterns one by one to form the text pattern library.
  • The network card receives the packet; the packet header is parsed and packets without payload are filtered out according to the length field; packets with payload are semantically parsed and, if necessary, decompressed and format-converted.
  • Step [200] includes the following steps:
  • Step [300]: match against the content lock firewall and perform the pass or block operation.
  • Step [300] includes the following steps:
  • Step [310]: match the text pattern library obtained in step [100] against the parsed text obtained in step [200]; the matching covers the character types, length and structure of the text;
  • Step [400]: review the log and update the text pattern library as needed.
  • Step [400] includes the following steps:
  • Step [410]: the administrator reviews the interception log in the background, re-examines the intercepted packets and checks their payload content. If a packet within the normal range was intercepted, a text pattern is generated for it following steps [120] to [140], and the new text pattern is appended to the text pattern library to update it.
  • FIG. 2 shows a system embodiment of the whitelist-based content lock packet filtering method, comprising modules 10, 20, 30 and 40, introduced below:
  • Module 10: a text pattern library generation module. It preprocesses normal traffic, splits the traffic into individual packets, and filters out packets without payload according to the length field of the packet header. It then semantically parses each packet payload to obtain multiple parsed texts and classifies them in an agreed form. It trains on each class of text to learn its features, including value domain and structure, and finally merges all text features into the text pattern library. The text pattern library can also be written directly by the website's content designer. Module 10 further includes modules 11, 12, 13 and 14.
  • Module 11: parse the normal traffic into individual packets, filter out packets without payload according to the length field of the packet header, and send packets with payload to module 12 for processing;
  • Module 12: extract the payload of the packets obtained by module 11, then semantically parse the payload by key field to obtain parsed text. If necessary, decompress and format-convert the packet. For an HTTP packet, for example, the key fields are those defined by the HTTP protocol, such as HOST, GET, Cookie, User-Agent and Origin. The parsed text obtained from this initial parsing is then examined to see whether it can be parsed further, for example the parsed text POST /players/8/?
  • Module 13: classify the large number of parsed texts obtained by modules 11 and 12; the classification rules may be, but are not limited to, the web page URL or payload keywords. Train on each class of parsed text to obtain its value domain and structural features.
  • The value domain comprises the range of character types and the range of lengths of the text; the structural feature is the structural form of the parsed text;
  • Texts containing the keyword sid in requests to the same URL can be grouped into one category.
  • The text content of the keyword sid consists of digits and letters, and its length is fixed at 20.
  • Module 14: aggregate the value domains and structural features of all parsed texts obtained by module 13 into a text feature library, then convert the text features in that library into text patterns one by one to form the text pattern library.
  • Module 20: a packet parsing module that parses and filters packets. It processes the packets received by the network card: it first filters out packets without payload according to the packet header, then semantically parses the packets with payload, performing format conversion, decompression and similar operations if needed, to obtain text that the text pattern library can recognize. Module 20 further includes modules 21 and 22.
  • Module 21: parse the packet header and filter out packets without payload according to the length field of the header, using the same test as module 11, to obtain the packets with payload;
  • Module 22: extract the payload of the packets obtained by module 21, then semantically parse the payload by key field to obtain parsed text. If necessary, decompress and format-convert the packet;
  • Module 30: a packet filtering module. It matches the text pattern library obtained by module 10 against the parsed text obtained by module 20; the matching covers character types, text length, text structure and so on. It then checks the result: if everything matches correctly, the packet is forwarded through the network card; otherwise the packet is intercepted and written to the interception log. A counter accumulates the number of intercepted packets; when the count exceeds the configured threshold, a warning prompts the administrator to view the interception log, and the counter is reset. Module 30 further includes modules 31 and 32.
  • Module 31: match the text pattern library obtained by module 10 against all the parsed texts obtained by module 20; the matching covers character types, text length, text structure and so on;
  • Module 32: check the matching result of module 31. If all parsed texts match the text pattern library, forward the packet and return to module 21 to check the next packet. If the match is not complete, intercept the packet, record it in the interception log and count it with the counter. If the counter exceeds the configured threshold, a warning can be triggered to prompt the administrator to view the interception log, after which the counter is reset; then return to module 21 to check the next packet.
  • Module 40: a background review module for intercepted packets. It re-examines intercepted packets so that packets from normal website access are not permanently blocked by module 32, checking the field contents of the packet payload. If a normal packet was intercepted, the text features of the packet are appended to the text pattern library obtained in module 12. Module 40 further includes module 41.
  • Module 41: the administrator views the interception log and checks the payload content of the intercepted packets. If a normal packet was intercepted, the packet is sent to module 12 for processing and its text features are appended to the text pattern library. Viewing of the interception log can be triggered in two ways: by the counter in module 32, or at fixed intervals, where the interval can be adjusted according to the volume of false interceptions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

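The present invention provides a whitelist-based content lock firewall method, comprising: step 200: performing semantic parsing on the payload of a data packet received by a website to obtain the parsed text of the received packet; step 300: matching the parsed text of the packet received by the website against a text pattern library to decide whether to forward or intercept the received packet, the text pattern library containing the value domains and structural features of multiple keywords. According to embodiments of the invention, a website whose functions are relatively fixed can, by deploying this firewall, effectively defend against known and novel network attacks, keep running with its vulnerabilities while normal functions are preserved, and avoid costly upgrades.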

Description

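Whitelist-based content lock firewall method and system
Technical Field
The present invention relates to the field of network security, and in particular to firewalls.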
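Background Art
Firewalls are the main means of protecting websites. By function, firewalls can be divided into packet filtering firewalls and content filtering firewalls. Packet filtering firewalls are widely used in simple networks because they inspect only IP-layer and TCP-layer header information and are therefore fast. Their main problem is that they cannot inspect packet contents. Content filtering firewalls can filter on packet contents; the inspected content can be a URL, a signature code, and so on. A URL is the unique identifier of every web page and resource on the Internet. URL filtering extracts the URL field from the GET/POST request of a user's HTTP connection and passes or blocks the HTTP request packet depending on whether the URL is judged legitimate. Content filtering firewall vendors typically use big data techniques to analyze, train on and learn from malicious traffic, producing a library of malicious-traffic detection models that is then used to filter incoming traffic. The advantage of this approach is that new detection models can be appended, which effectively improves the accuracy of malicious-traffic detection. The drawback is the time lag between the appearance of a new type of network attack and the building and deployment of an updated model; attacks during that window cannot be blocked.
In addition, big data techniques cannot effectively learn attacks that exploit the vulnerabilities of a specific website. Patching system vulnerabilities is a problem web applications have always had to face. Besides upgrading system software regularly, a site can run vulnerability scanning software against itself, but such software almost always has to be paid for. Platform scanning is another approach that has emerged in recent years: the website is submitted to a platform for verification, after which the scan results are emailed to the user as a vulnerability list. These approaches only find vulnerabilities; professionals are still needed to fix them. Moreover, according to Imperva statistics, more than one third of web application vulnerabilities have no available fix.
When choosing a firewall, a website needs to consider factors such as its own business volume, service objects, content sensitivity and maintenance costs. The functions of most websites are relatively fixed and rarely need updating; examples are the official websites of universities, enterprises and public institutions, and the portal sites of some small companies. According to statistics, about 68.3% of websites go more than 6 months between updates. Such websites are generally unwilling or unable to bear the high cost of professional maintenance and use only simple firewall protection, which leaves them exposed to attack.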
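Summary of the Invention
Firewall filtering rules can be divided into whitelists and blacklists. A whitelist means that all traffic matching a rule is allowed through; a blacklist means that traffic matching a rule is blocked. Packet filtering firewalls usually use both whitelists and blacklists, while content filtering firewalls mainly use blacklists. The key problem of traditional firewalls is that a blacklist is never complete: even if the learned rule base is updated continuously, attacks of unknown types remain possible, and maintaining the blacklist is costly.
To solve this problem, the present invention proposes a whitelist-based content lock firewall method and system, as follows.
According to a first aspect of the invention, a whitelist-based content lock firewall method is provided, comprising:
Step 200: performing semantic parsing on the payload of a data packet received by a website to obtain the parsed text of the received packet;
Step 300: matching the parsed text of the packet received by the website against a text pattern library to decide whether to forward or intercept the received packet, the text pattern library containing the value domains and structural features of multiple keywords.
In one embodiment of the invention, the text pattern library is generated as follows:
semantic parsing is performed on the packet payloads of the website's existing normal traffic, and the value domains and structural features of multiple keywords are obtained by training on the parsed texts.
In one embodiment of the invention, the text pattern library may also be generated by writing it by hand.
In one embodiment of the invention, step 200 further comprises: performing semantic parsing on the payload of the packet received by the website to obtain the value domain and structural features of the parsed text of the received packet; and
step 300 further comprises:
matching the value domains and structural features of all parsed texts of the received packet against the text pattern library one by one; if all of them match correctly, the match succeeds, otherwise it fails;
if the match succeeds, forwarding the received packet;
if the match fails, intercepting the received packet and writing it to the interception log.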
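In one embodiment of the invention, generating the text pattern library comprises:
Step 110: preprocessing the normal traffic;
Step 120: parsing the payload by keyword to obtain parsed text;
Step 130: classifying the large number of parsed texts according to specific rules, the classification rules including the web page URL or payload keywords, and training on each class of parsed text to obtain its value domain and structural features;
Step 140: aggregating the value domains and structural features of all parsed texts into a text feature library, then converting the text features in that library into text patterns one by one to form the text pattern library.
In one embodiment of the invention, the method further comprises step 400:
reviewing the interception log;
wherein the review of the interception log is triggered by a counter or performed at fixed intervals.
In one embodiment of the invention, step 400 further comprises:
for a packet that belongs to normal access but was intercepted, parsing the payload of the intercepted packet to obtain the value domains and structural features of multiple keywords, and appending the resulting text features to the text pattern library.
In one embodiment of the invention, triggering by counter comprises:
using a counter to accumulate the number of intercepted packets; when the counter exceeds the configured threshold, triggering a warning that prompts a review of the interception log, and then resetting the counter.
According to a second aspect of the invention, a computer-readable storage medium is provided that stores one or more computer programs which, when executed, implement the whitelist-based content lock firewall method of the invention.
According to a third aspect of the invention, a computing system is provided, comprising:
a storage device and one or more processors;
wherein the storage device stores one or more computer programs which, when executed by the processors, implement the whitelist-based content lock firewall method of the invention.
Compared with the prior art, the embodiments of the invention have the following advantages. A text pattern library trained on normal traffic can effectively defend against both known and new network attacks. Using conformance of the entire packet's text content to the text pattern library as the detection condition, rather than only a URL or signature code, prevents attackers from exploiting blind spots in the detection conditions. Exploitation of a site's own system vulnerabilities is usually not normal access behavior; it does not conform to the text pattern library and can be filtered out directly. Compared with traditional firewalls, this solution does not require costly maintenance of a blacklist database and can defend against new network attack modes. Websites developed by third parties that are difficult to update can, by deploying this firewall, keep running with their vulnerabilities while normal functions are preserved, without costly upgrades.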
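Brief Description of the Drawings
The accompanying drawings are incorporated in and constitute a part of this specification; they illustrate embodiments consistent with the invention and, together with the description, serve to explain its principles. The drawings described below are merely some embodiments of the invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort. In the drawings:
Figure 1 shows a processing flow chart according to one embodiment of the invention.
Figure 2 shows a system block diagram according to one embodiment of the invention.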
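Detailed Description
As described in the Background Art, traditional firewall techniques respond slowly or are costly to maintain, so a more convenient and faster method is needed. To this end, the inventors propose a whitelist-based content lock packet filtering method that uses the text pattern library as a whitelist and allows only trusted text content to pass through the firewall. The method exploits the regularity of a website's normal traffic content: it parses the text of normal traffic and trains on it to learn a text pattern library. During content inspection, only packets that are completely matched by the text pattern library are allowed through the firewall; unmatched packets are reviewed in the background, and the text pattern library is updated as needed to prevent false interception. Figure 1 shows a flow chart of one embodiment of the method, comprising steps 100, 200, 300 and 400. The details of each step are as follows:
Step [100]: generate the text pattern library
Semantic parsing is performed on each packet payload in the normal traffic to obtain multiple parsed texts, which are classified in an agreed form. The features of each class of text, including value domain and structure, are learned by training, and the different text features are gathered into the text pattern library. The text pattern library may also be written directly by the website's content designer. Step [100] includes the following steps:
Step [110]: collect normal traffic and preprocess it. Methods for collecting normal traffic include, but are not limited to, collection in a closed environment and collection in an open environment followed by labeling. The traffic is then parsed into individual packets, and packets without payload are filtered out according to the length field of the network-layer header. If a packet has a payload it proceeds to step [120] for further processing; otherwise processing moves on to the next packet of normal traffic and checks whether it has a payload;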
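The payload check in step [110] can be sketched as follows. This is a minimal illustration that assumes raw IPv4 packets carrying TCP, with no link-layer framing; the function names and the `build_packet` demo helper are hypothetical and do not come from the patent.

```python
import struct


def tcp_payload_length(ip_packet: bytes) -> int:
    """Return the TCP payload length of a raw IPv4 packet (0 if there is none)."""
    # Low nibble of the first byte is the IP header length in 32-bit words.
    ip_header_len = (ip_packet[0] & 0x0F) * 4
    # Bytes 2-3 hold the total length of the IP packet (header + data).
    total_length = struct.unpack("!H", ip_packet[2:4])[0]
    # High nibble of TCP byte 12 is the TCP header length in 32-bit words.
    tcp_header_len = (ip_packet[ip_header_len + 12] >> 4) * 4
    return max(total_length - ip_header_len - tcp_header_len, 0)


def has_payload(ip_packet: bytes) -> bool:
    """Packets for which this is False would be skipped in step [110]."""
    return tcp_payload_length(ip_packet) > 0


def build_packet(payload: bytes) -> bytes:
    """Hypothetical demo helper: 20-byte IPv4 header + 20-byte TCP header + payload."""
    total = 20 + 20 + len(payload)
    ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, total, 0, 0, 64, 6, 0,
                            bytes(4), bytes(4))
    tcp_header = struct.pack("!HHIIBBHHH", 12345, 80, 0, 0, 5 << 4, 0x18,
                             65535, 0, 0)
    return ip_header + tcp_header + payload


ack_only = build_packet(b"")
request = build_packet(b"GET / HTTP/1.1\r\n\r\n")
```

Here `has_payload(ack_only)` is false, so a bare acknowledgement would be dropped from the training set, while `request` would move on to step [120].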
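Step [120]: extract the payload of the packet, then semantically parse the payload by key field to obtain parsed text; if necessary, decompress and format-convert the packet.
For example, a URL cannot contain non-ASCII characters. When content that ASCII cannot represent is needed, the non-ASCII characters in the URL are converted to ASCII by the following rule: each non-ASCII character is encoded under some character encoding (typically UTF-8), and the hexadecimal value of each resulting byte, prefixed with %, forms the encoded string that replaces the corresponding non-ASCII string in the URL. URLs of this kind must be converted back before processing.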
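The format conversion described above corresponds to percent-decoding. A minimal sketch using Python's standard `urllib.parse` module, assuming UTF-8 as the character encoding; the sample path is made up for illustration:

```python
from urllib.parse import quote, unquote

# Percent-encode a path containing non-ASCII characters (UTF-8 by default).
encoded_path = quote("/search/防火墙")

# Decode it again before semantic parsing, as step [120] requires.
decoded_path = unquote(encoded_path)

# A single character becomes one %XX group per UTF-8 byte.
encoded_char = quote("防")
```

Decoding first means the learned patterns describe the actual characters rather than their `%XX` escape sequences.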
For an HTTP packet, the key fields are those defined by the HTTP protocol, such as HOST, GET, Cookie, User-Agent and Origin. Each parsed text begins with a key field and ends with "\r\n".
Take the following HTTP message as an example:
GET /player_auth?data=d1ff06c5adce6daee3264fbedebc904dda19ef6035ffbf5ed29973445940f34de80e06cba65bbfaffedfb7af8d8d7c0ef39994e1c1fefaef16d37285ec9bea7dec5e44326a61069ea26c6d5cba7dc75f HTTP/1.1
Host:wf1.394225.com:8855
Cdn-Src-Ip:61.166.74.129
User-Agent:Mozilla/5.0(Windows NT 6.1;WOW64)AppleWebKit/537.36(KHTML,like Gecko)Chrome/55.0.2883.87 Safari/537.36
Origin:http://www.wf5558.com
Accept:*/*
Referer:http://www.wf5558.com/
Accept-Language:zh-CN,zh;q=0.8
Accept-Encoding:gzip
X-Via:1.1 erdianxin12:10(Cdn Cache Server V2.0)
Connection:keep-alive
Cookie:io=Mehilvc_LHaXUav4AAAS
It can be parsed into the following 12 parsed texts:
(1)GET /player_auth?data=d1ff06c5adce6daee3264fbedebc904dda19ef6035ffbf5ed29973445940f34de80e06cba65bbfaffedfb7af8d8d7c0ef39994e1c1fefaef16d37285ec9bea7dec5e44326a61069ea26c6d5cba7dc75f HTTP/1.1
(2)Host:wf1.394225.com:8855
(3)Cdn-Src-Ip:61.166.74.129
(4)User-Agent:Mozilla/5.0(Windows NT 6.1;WOW64)AppleWebKit/537.36(KHTML,like Gecko)Chrome/55.0.2883.87 Safari/537.36
(5)Origin:http://www.wf5558.com
(6)Accept:*/*
(7)Referer:http://www.wf5558.com/
(8)Accept-Language:zh-CN,zh;q=0.8
(9)Accept-Encoding:gzip
(10)X-Via:1.1 erdianxin12:10(Cdn Cache Server V2.0)
(11)Connection:keep-alive
(12)Cookie:io=Mehilvc_LHaXUav4AAAS
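Splitting an HTTP message into parsed texts on "\r\n" can be sketched as below. This assumes the message head is already available as a decoded string and that the request line uses the standard `METHOD SP target SP version` layout; the function names and the shortened sample message are illustrative only.

```python
from typing import List

REQUEST_METHODS = {"GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS", "PATCH"}


def split_parsed_texts(raw_message: str) -> List[str]:
    """Split the head of an HTTP message into one parsed text per line."""
    head = raw_message.split("\r\n\r\n", 1)[0]  # drop the body, if any
    return [line for line in head.split("\r\n") if line]


def key_field(parsed_text: str) -> str:
    """Return the key field a parsed text starts with (method or header name)."""
    first_token = parsed_text.split(" ", 1)[0]
    if first_token in REQUEST_METHODS:
        return first_token
    return parsed_text.split(":", 1)[0]


RAW = (
    "GET /player_auth?data=d1ff06c5 HTTP/1.1\r\n"
    "Host:wf1.394225.com:8855\r\n"
    "Accept:*/*\r\n"
    "Cookie:io=Mehilvc_LHaXUav4AAAS\r\n"
    "\r\n"
)

texts = split_parsed_texts(RAW)
keys = [key_field(text) for text in texts]
```

For the shortened message above this yields four parsed texts whose key fields are `GET`, `Host`, `Accept` and `Cookie`.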
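The parsed texts are examined to decide whether they should be decomposed further. For example, a URL has the format protocol://hostname[:port]/path/[;parameters][?query]#fragment, where the symbol "?" marks the string that follows as the query, and the query has the format field=value. Splitting the query string first on "&" and then on "=" yields the corresponding keywords. For the query "?EIO=3&transport=polling&t=MEHG-Fo&sid=BRBOevtVU7wsYymWAAAS" in a URL, for example, this gives the keywords {EIO, transport, t, sid} and their values. Besides the URL, keywords can also be obtained from other parts of the HTTP header, such as the Cookie field.
For example, the parsed text POST /players/8/?EIO=3&transport=polling&t=MEvFx_q&sid=K7wWaFvPa4s3dYLAACpk HTTP/1.1 can be decomposed further using the keywords transport, t, sid and so on, splitting the original text into the following 6 short texts:
(1)POST /players/8/;
(2)EIO=3;
(3)transport=polling;
(4)t=MEvFx_q;
(5)sid=K7wWaFvPa4s3dYLAACpk;
(6)HTTP/1.1.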
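The decomposition into six short texts can be reproduced with a few string operations. A minimal sketch, assuming a well-formed request line with single spaces between method, target and version; `decompose_request_line` is a hypothetical name:

```python
from typing import List


def decompose_request_line(line: str) -> List[str]:
    """Split a request line into method+path, one text per query field, and version."""
    method, target, version = line.split(" ", 2)
    path, _, query = target.partition("?")
    short_texts = [f"{method} {path}"]
    # Cut the query on "&"; each piece keeps its "field=value" form.
    short_texts += [pair for pair in query.split("&") if pair]
    short_texts.append(version)
    return short_texts


LINE = ("POST /players/8/?EIO=3&transport=polling&t=MEvFx_q"
        "&sid=K7wWaFvPa4s3dYLAACpk HTTP/1.1")
short_texts = decompose_request_line(LINE)

# Cutting each field on "=" then gives the keyword and its value.
keywords = [t.split("=", 1)[0] for t in short_texts if "=" in t]
```

The keywords recovered this way are the keys under which value domains are later learned.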
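Step [130]: classify the large number of parsed texts obtained in steps [110] and [120] according to specific rules; the classification rules may be, but are not limited to, the web page URL or payload keywords. Train on each class of parsed text to obtain its value domain and structural features. The value domain comprises the range of character types and the range of lengths of the text; the structural feature is the structural form of the parsed text.
For example, among requests to the same URL, texts containing the Cookie field are grouped into one class, texts containing the sid field into another, and texts containing the t field into a third.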
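One way to picture the classification rule is to group parsed texts by the pair (URL path, keyword). The sketch below assumes each record is a `(url_path, parsed_text)` tuple and that the keyword is whatever precedes the first ":" or "="; both choices are illustrative assumptions, not something the patent prescribes.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def keyword_of(parsed_text: str) -> str:
    """Take the text before the first ':' or '=' as the keyword."""
    return re.split(r"[:=]", parsed_text, maxsplit=1)[0]


def classify(records: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Group parsed texts into classes keyed by (URL path, keyword)."""
    classes = defaultdict(list)
    for url_path, parsed_text in records:
        classes[(url_path, keyword_of(parsed_text))].append(parsed_text)
    return dict(classes)


records = [
    ("/players/8/", "sid=K7wWaFvPa4s3dYLAACpk"),
    ("/players/8/", "sid=BRBOevtVU7wsYymWAAAS"),
    ("/players/8/", "t=MEvFx_q"),
    ("/players/8/", "Cookie:io=Mehilvc_LHaXUav4AAAS"),
]
classes = classify(records)
```

Each resulting class is then trained on separately, so the pattern learned for `sid` never has to accommodate values of `t`.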
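For example, the value domain of the example Cookie in step [120] includes letters, digits and special characters, and its length is 20. The example sid contains only letters and digits, the letters being both upper and lower case, and its length is 20. The example t contains letters and _, no digits, and its length is 7.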
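Learning a value domain from samples can be as simple as taking the union of observed character classes and the minimum and maximum observed lengths. The patent does not specify a training algorithm, so the sketch below is only one plausible reading; the class names and dictionary layout are assumptions.

```python
import string
from typing import Dict, Iterable, Set


def char_classes(value: str) -> Set[str]:
    """Return the character classes that occur in a value."""
    classes = set()
    for ch in value:
        if ch in string.ascii_lowercase:
            classes.add("lower")
        elif ch in string.ascii_uppercase:
            classes.add("upper")
        elif ch in string.digits:
            classes.add("digit")
        else:
            classes.add("special")
    return classes


def learn_value_domain(values: Iterable[str]) -> Dict[str, object]:
    """Union of character classes plus the observed length range."""
    values = list(values)
    classes: Set[str] = set()
    for value in values:
        classes |= char_classes(value)
    lengths = [len(value) for value in values]
    return {"classes": classes, "min_len": min(lengths), "max_len": max(lengths)}


sid_domain = learn_value_domain(["K7wWaFvPa4s3dYLAACpk", "BRBOevtVU7wsYymWAAAS"])
t_domain = learn_value_domain(["MEvFx_q", "MEHG-Fo"])
```

With the sample values from the text, `sid` comes out as letters and digits of length 20, and `t` as letters plus special characters of length 7, which agrees with the description above.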
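Take the Content-Type field: its text content may be Content-Type:application/json;charset=utf-8, or Content-Type:text/plain;charset=UTF-8. From these two examples, the structural feature of the parsed text of the Content-Type field can be expressed as "Content-Type:string1/string2;string3=string4". Here string 1 takes two values, application and text; string 2 also takes two values, json and plain; and application and json occur as a pair. Only two examples are given here; with more examples the strings may take more values.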
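The structural feature of the Content-Type field, and the values seen in each slot, can be collected with a single regular expression. A minimal sketch, assuming the four-slot structure given above and headers written without a space after the colon; the names `STRUCTURE` and `learn_structure` are hypothetical.

```python
import re
from typing import Iterable, List, Set, Tuple

# "Content-Type:string1/string2;string3=string4" with four capture groups.
STRUCTURE = re.compile(r"Content-Type:([^/;=]+)/([^/;=]+);([^/;=]+)=([^/;=]+)")


def learn_structure(samples: Iterable[str]) -> Tuple[List[Set[str]], Set[Tuple[str, str]]]:
    """Collect the values seen in each slot and the (string1, string2) pairs."""
    slots: List[Set[str]] = [set() for _ in range(4)]
    pairs: Set[Tuple[str, str]] = set()
    for sample in samples:
        match = STRUCTURE.fullmatch(sample)
        if match is None:
            continue  # sample does not follow the four-slot structure
        for index, value in enumerate(match.groups()):
            slots[index].add(value)
        pairs.add((match.group(1), match.group(2)))
    return slots, pairs


slots, pairs = learn_structure([
    "Content-Type:application/json;charset=utf-8",
    "Content-Type:text/plain;charset=UTF-8",
])
```

Recording the pairs separately captures the observation that application and json occur together, which the per-slot value sets alone would lose.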
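Step [140]: aggregate the value domains and structural features of all parsed texts obtained in step [130] into a text feature library, then convert the text features in that library into text patterns one by one to form the text pattern library.
A regular expression is one kind of text pattern. The text feature of the Cookie field above can be expressed as the regular expression Cookie:io=[-0-9a-zA-Z_]{20}, and the text feature of the t field as the regular expression t=[-a-zA-Z_]{7}.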
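Converting a text feature into a regular expression can be sketched as follows. The feature layout (prefix, character classes, permitted special characters, length range) is an assumption made for this example, and `feature_to_pattern` is a hypothetical helper; it expects at least one class or special character.

```python
import re
from typing import Pattern, Set

CLASS_FRAGMENTS = {"digit": "0-9", "lower": "a-z", "upper": "A-Z"}


def feature_to_pattern(prefix: str, classes: Set[str], specials: Set[str],
                       min_len: int, max_len: int) -> Pattern[str]:
    """Build a compiled regex of the form prefix[charset]{min,max}."""
    # Put "-" first so it is never read as a range inside the character class.
    ordered = sorted(specials, key=lambda ch: ch != "-")
    body = "".join(re.escape(ch) for ch in ordered)
    body += "".join(CLASS_FRAGMENTS[name]
                    for name in ("digit", "lower", "upper") if name in classes)
    if min_len == max_len:
        quantifier = "{%d}" % min_len
    else:
        quantifier = "{%d,%d}" % (min_len, max_len)
    return re.compile(re.escape(prefix) + "[" + body + "]" + quantifier)


cookie_pattern = feature_to_pattern("Cookie:io=", {"digit", "lower", "upper"},
                                    {"-", "_"}, 20, 20)
t_pattern = feature_to_pattern("t=", {"lower", "upper"}, {"-", "_"}, 7, 7)
```

Matching with `fullmatch` rather than `search` matters for a whitelist: the whole parsed text has to fit the pattern, so a legitimate-looking prefix followed by injected content is rejected.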
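Step [200]: semantic parsing of traffic content
The network card receives a packet; the packet header is parsed and packets without payload are filtered out according to the length field; packets with payload are semantically parsed and, if necessary, decompressed and format-converted. Step [200] includes the following steps:
Step [210]: after the network card receives a packet, the packet header is parsed and the header length field is used to determine whether the packet carries a payload. If it does, processing proceeds to step [220]; packets without payload are handled according to generally accepted rules, which are outside the scope of this patent.
Step [220]: extract the payload of the packet and semantically parse it; if necessary, perform format conversion, decompression and similar operations on the packet.
Step [300]: match against the content lock firewall and perform the pass or block operation
The content lock firewall checks the parsed text obtained in step [200] against the text pattern library obtained in step [100]. If the character types, length range, structure and so on of all texts match the text pattern library correctly, the packet is forwarded; otherwise the packet is intercepted and written to the interception log, and processing returns to step [200]. A counter can be used to count intercepted packets; when the counter exceeds a threshold, a warning prompts the administrator to view the interception log. Step [300] includes the following steps:
Step [310]: match the text pattern library obtained in step [100] against the parsed text obtained in step [200]; the matching covers the character types, length and structure of the text;
Step [320]: if all texts of a packet's payload are correctly matched by the text pattern library, the packet is considered legitimate and is forwarded, and processing returns to step [210]. If the match is not complete, the packet is intercepted and written to the log, the counter of intercepted packets is incremented, and processing returns to step [210]. When the counter exceeds the configured threshold, a warning is triggered to prompt the administrator to view the interception log, and the counter is then reset;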
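Steps [310] and [320] can be summarized in a small class. This is a sketch under stated assumptions: the text pattern library is a list of compiled regular expressions, a packet is represented by its list of parsed texts, and the class name, attribute names and return values are invented for illustration.

```python
import re
from typing import List, Pattern


class ContentLockFilter:
    """Forward a packet only if every parsed text matches some whitelist pattern."""

    def __init__(self, pattern_library: List[Pattern[str]], threshold: int) -> None:
        self.pattern_library = pattern_library
        self.threshold = threshold
        self.intercept_log: List[List[str]] = []
        self.counter = 0   # intercepted packets since the last warning
        self.warnings = 0  # stands in for notifying the administrator

    def matches(self, parsed_text: str) -> bool:
        return any(p.fullmatch(parsed_text) for p in self.pattern_library)

    def handle(self, parsed_texts: List[str]) -> str:
        if all(self.matches(text) for text in parsed_texts):
            return "forward"
        self.intercept_log.append(list(parsed_texts))
        self.counter += 1
        if self.counter > self.threshold:
            self.warnings += 1
            self.counter = 0  # reset after the warning, as in step [320]
        return "intercept"


firewall = ContentLockFilter(
    [re.compile(r"sid=[0-9a-zA-Z]{20}"), re.compile(r"t=[-a-zA-Z_]{7}")],
    threshold=1,
)
first = firewall.handle(["t=MEvFx_q", "sid=K7wWaFvPa4s3dYLAACpk"])
second = firewall.handle(["t=MEvFx_q", "sid=x' OR '1'='1"])
third = firewall.handle(["t=MEvFx_q", "cmd=rm"])
```

The first packet is forwarded; the other two are intercepted, and because the threshold is 1 the second interception raises a warning and resets the counter.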
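Step [400]: review the log and update the text pattern library as needed
The administrator views the interception log in the background and re-examines the packets intercepted in step [320]. If a packet belongs to normal access but was intercepted, its payload is trained on and the resulting text features are appended to the text pattern library. Step [400] includes the following steps:
Step [410]: the administrator reviews the interception log in the background, re-examines the intercepted packets and checks their payload content. If a packet within the normal range was intercepted, a text pattern is generated for it following steps [120] to [140], and the new text pattern is appended to the text pattern library to update it. Viewing of the interception log can be triggered in two ways: by the counter in step [320], or at fixed intervals, where the interval can be adjusted according to the volume of false interceptions.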
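The review loop of step [410] can be sketched as a function that takes the administrator's verdict and a pattern generator as callbacks. Both callbacks are placeholders: `is_normal` stands in for the human decision, and `generate_pattern` stands in for steps [120] to [140]; here it simply whitelists the exact text, which is far cruder than learning a value domain.

```python
import re
from typing import Callable, List, Pattern


def review_log(intercept_log: List[List[str]],
               is_normal: Callable[[List[str]], bool],
               generate_pattern: Callable[[str], Pattern[str]],
               pattern_library: List[Pattern[str]]) -> List[List[str]]:
    """Add patterns for wrongly intercepted packets; return entries still blocked."""
    still_blocked = []
    for parsed_texts in intercept_log:
        if not is_normal(parsed_texts):
            still_blocked.append(parsed_texts)
            continue
        for text in parsed_texts:
            # Only texts the library cannot match yet need a new pattern.
            if not any(p.fullmatch(text) for p in pattern_library):
                pattern_library.append(generate_pattern(text))
    return still_blocked


library = [re.compile(r"t=[-a-zA-Z_]{7}")]
log = [["t=MEvFx_q", "lang=en"], ["t=MEvFx_q", "id=1;DROP"]]

remaining = review_log(
    log,
    is_normal=lambda texts: "lang=en" in texts,               # placeholder verdict
    generate_pattern=lambda text: re.compile(re.escape(text)),  # placeholder learner
    pattern_library=library,
)
```

After the review, the library accepts `lang=en`, while the second log entry stays blocked.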
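Figure 2 shows a system embodiment of the whitelist-based content lock packet filtering method, comprising modules 10, 20, 30 and 40, introduced below:
Module 10: a text pattern library generation module. It preprocesses normal traffic, splits the traffic into individual packets, and filters out packets without payload according to the length field of the packet header. It then semantically parses each packet payload to obtain multiple parsed texts and classifies them in an agreed form. It trains on each class of text to learn its features, including value domain and structure, and finally merges all text features into the text pattern library. The text pattern library may also be written directly by the website's content designer. Module 10 further includes modules 11, 12, 13 and 14.
Module 11: parse the normal traffic into individual packets, filter out packets without payload according to the length field of the packet header, and send packets with payload to module 12 for processing;
Module 12: extract the payload of the packets obtained by module 11, then semantically parse the payload by key field to obtain parsed text. If necessary, decompress and format-convert the packet. For an HTTP packet, for example, the key fields are those defined by the HTTP protocol, such as HOST, GET, Cookie, User-Agent and Origin. The parsed text obtained from the initial parsing is examined to see whether it can be parsed further. For example, the parsed text POST /players/8/?EIO=3&transport=polling&t=MEvFx_q&sid=K7wWaFvPa4s3dYLAACpk HTTP/1.1 can be decomposed further using the keywords transport, t, sid and so on, splitting the original text into the following 6 short texts: (1)POST /players/8/; (2)EIO=3; (3)transport=polling; (4)t=MEvFx_q; (5)sid=K7wWaFvPa4s3dYLAACpk; (6)HTTP/1.1;
Module 13: classify the large number of parsed texts obtained by modules 11 and 12; the classification rules may be, but are not limited to, the web page URL or payload keywords. Train on each class of parsed text to obtain its value domain and structural features. The value domain comprises the range of character types and the range of lengths of the text; the structural feature is the structural form of the parsed text;
For example, texts containing the keyword sid in requests to the same URL can be grouped into one class; the text content of the keyword sid consists of digits and letters, and its length is fixed at 20.
Module 14: aggregate the value domains and structural features of all parsed texts obtained by module 13 into a text feature library, then convert the text features in that library into text patterns one by one to form the text pattern library.
For example, a regular expression is one kind of text pattern; the text feature of sid can be expressed as the regular expression sid=[-0-9a-zA-Z_]{20}.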
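Module 20: a packet parsing module that parses and filters packets. It processes the packets received by the network card: it first filters out packets without payload according to the packet header, then semantically parses the packets with payload, performing format conversion, decompression and similar operations if needed, to obtain text that the text pattern library can recognize. Module 20 further includes modules 21 and 22.
Module 21: parse the packet header and filter out packets without payload according to the length field of the header, using the same test as module 11, to obtain the packets with payload;
Module 22: extract the payload of the packets obtained by module 21, then semantically parse the payload by key field to obtain parsed text. If necessary, decompress and format-convert the packet;
Module 30: a packet filtering module. It matches the text pattern library obtained by module 10 against the parsed text obtained by module 20; the matching covers character types, text length, text structure and so on. It then checks the result: if everything matches correctly, the packet is forwarded through the network card; otherwise the packet is intercepted and written to the interception log. A counter accumulates the number of intercepted packets; when the count exceeds the configured threshold, a warning prompts the administrator to view the interception log, and the counter is reset. Module 30 further includes modules 31 and 32.
Module 31: match the text pattern library obtained by module 10 against all the parsed texts obtained by module 20; the matching covers character types, text length, text structure and so on;
Module 32: check the matching result of module 31. If all parsed texts match the text pattern library, forward the packet and return to module 21 to check the next packet. If the match is not complete, intercept the packet, record it in the interception log and count it with the counter. If the counter exceeds the configured threshold, a warning can be triggered to prompt the administrator to view the interception log, after which the counter is reset; then return to module 21 to check the next packet.
Module 40: a background review module for intercepted packets. It re-examines intercepted packets so that packets from normal website access are not permanently blocked by module 32, checking the field contents of the packet payload. If a normal packet was intercepted, the text features of the packet are appended to the text pattern library obtained in module 12. Module 40 further includes module 41.
Module 41: the administrator views the interception log and checks the payload content of the intercepted packets. If a normal packet was intercepted, the packet is sent to module 12 for processing and its text features are appended to the text pattern library. Viewing of the interception log can be triggered in two ways: by the counter in module 32, or at fixed intervals, where the interval can be adjusted according to the volume of false interceptions.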
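It should be noted that not every step described in the above embodiments is required; a person skilled in the art may omit, replace or modify steps as actual needs dictate.
Finally, the above embodiments serve only to illustrate the technical solution of the invention and not to limit it. Although the invention has been described in detail with reference to the embodiments, a person of ordinary skill in the art should understand that modifications or equivalent substitutions of its technical solution do not depart from the spirit and scope of that solution and shall all fall within the scope of the claims of the invention.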

Claims (10)

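  1. A whitelist-based content lock firewall method, comprising:
    Step 200: performing semantic parsing on the payload of a data packet received by a website to obtain the parsed text of the received packet;
    Step 300: matching the parsed text of the packet received by the website against a text pattern library to decide whether to forward or intercept the received packet, the text pattern library containing the value domains and structural features of multiple keywords.
  2. The method according to claim 1, wherein the text pattern library is generated as follows:
    performing semantic parsing on the packet payloads of the website's existing normal traffic, and obtaining the value domains and structural features of multiple keywords by training on the parsed texts.
  3. The method according to claim 1, wherein the text pattern library may also be generated by writing it by hand.
  4. The method according to claim 1, wherein step 200 further comprises: performing semantic parsing on the payload of the packet received by the website to obtain the value domain and structural features of the parsed text of the received packet; and
    step 300 further comprises:
    matching the value domains and structural features of all parsed texts of the received packet against the text pattern library one by one; if all of them match correctly, the match succeeds, otherwise it fails;
    if the match succeeds, forwarding the received packet;
    if the match fails, intercepting the received packet and writing it to the interception log.
  5. The method according to claim 2, wherein generating the text pattern library comprises:
    Step 110: preprocessing the normal traffic;
    Step 120: parsing the payload by keyword to obtain parsed text;
    Step 130: classifying the large number of parsed texts according to specific rules, the classification rules including the web page URL or payload keywords, and training on each class of parsed text to obtain its value domain and structural features;
    Step 140: aggregating the value domains and structural features of all parsed texts into a text feature library, then converting the text features in that library into text patterns one by one to form the text pattern library.
  6. The method according to claim 4, further comprising step 400:
    reviewing the interception log;
    wherein the review of the interception log is triggered by a counter or performed at fixed intervals.
  7. The method according to claim 6, wherein step 400 further comprises:
    for a packet that belongs to normal access but was intercepted, parsing the payload of the intercepted packet to obtain the value domains and structural features of multiple keywords, and appending the resulting text features to the text pattern library.
  8. The method according to claim 6, wherein triggering by counter comprises:
    using a counter to accumulate the number of intercepted packets; when the counter exceeds the configured threshold, triggering a warning that prompts a review of the interception log, and then resetting the counter.
  9. A computer-readable storage medium storing one or more computer programs which, when executed, implement the method according to any one of claims 1-8.
  10. A computing system, comprising:
    a storage device and one or more processors;
    wherein the storage device stores one or more computer programs which, when executed by the processors, implement the method according to any one of claims 1-8.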
PCT/CN2021/098312 2020-06-29 2021-06-04 Whitelist-based content lock firewall method and system WO2022001577A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010606191.1A CN111770097B (zh) 2020-06-29 2020-06-29 Whitelist-based content lock firewall method and system
CN202010606191.1 2020-06-29

Publications (1)

Publication Number Publication Date
WO2022001577A1 (zh) 2022-01-06

Family

ID=72722960

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/098312 WO2022001577A1 (zh) 2020-06-29 2021-06-04 Whitelist-based content lock firewall method and system

Country Status (2)

Country Link
CN (1) CN111770097B (zh)
WO (1) WO2022001577A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11621974B2 (en) * 2019-05-14 2023-04-04 Tenable, Inc. Managing supersedence of solutions for security issues among assets of an enterprise network
CN111770097B (zh) * 2020-06-29 2021-04-23 中国科学院计算技术研究所 Whitelist-based content lock firewall method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9973507B2 (en) * 2016-02-10 2018-05-15 Extreme Networks, Inc. Captive portal having dynamic context-based whitelisting
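CN109347846A (zh) * 2018-10-30 2019-02-15 郑州市景安网络科技股份有限公司 Website release method, apparatus, device and readable storage medium
CN110290148A (zh) * 2019-07-16 2019-09-27 深圳乐信软件技术有限公司 Defense method and apparatus for a web firewall, server and storage medium
CN110661680A (zh) * 2019-09-11 2020-01-07 深圳市永达电子信息股份有限公司 Method and system for data-stream whitelist detection based on regular expressions
CN111770097A (zh) * 2020-06-29 2020-10-13 中国科学院计算技术研究所 Whitelist-based content lock firewall method and system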

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
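CN103825900A (zh) * 2014-02-28 2014-05-28 广州云宏信息科技有限公司 Website access method and apparatus, and filter form download and update method and system
US10291583B2 (en) * 2016-04-13 2019-05-14 VisualThreat Inc. Vehicle communication system based on controller-area network bus firewall
CN106131090B (zh) * 2016-08-31 2021-11-09 北京力鼎创软科技有限公司 Method and system for user network access under web authentication
CN107622333B (zh) * 2017-11-02 2020-08-18 北京百分点信息科技有限公司 Event prediction method, apparatus and system
CN108898015B (zh) * 2018-06-26 2021-07-27 暨南大学 Artificial-intelligence-based application-layer dynamic intrusion detection system and detection method
CN110336782A (zh) * 2019-05-09 2019-10-15 苏州乐米信息科技股份有限公司 Data access security authentication method and system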

Also Published As

Publication number Publication date
CN111770097B (zh) 2021-04-23
CN111770097A (zh) 2020-10-13

Legal Events

Date Code Title Description
NENP Non-entry into the national phase; Ref country code: DE
122 EP: PCT application non-entry in European phase; Ref document number: 21832429; Country of ref document: EP; Kind code of ref document: A1