WO2013026362A1 - Method and system for monitoring network traffic - Google Patents

Method and system for monitoring network traffic

Publication number
WO2013026362A1
Authority
WO
WIPO (PCT)
Prior art keywords
url
hotspot
unit
protocol
requested
Prior art date
Application number
PCT/CN2012/080039
Other languages
French (fr)
Chinese (zh)
Inventor
陈旭
宋璇
尹咸阳
张仁卓
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2013026362A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters

Definitions

  • The present invention relates to the field of the Internet and, more particularly, to monitoring data traffic on the Internet.
  • BACKGROUND With the rapid development of the Internet, the Internet has become the main channel for disseminating information. However, the traditional Internet lacks supervision, and malicious, pornographic, and personal-attack content is rampant. There have even been cases of terrorist organizations using the Internet to train terrorists and organize terrorist attacks. To cope with this situation, using technical means to supervise the Internet has become a consensus among governments and operators.
  • Traffic monitoring systems came into being in this context. A traffic monitoring system collects traffic information, restores the original information, and performs intelligent analysis based on features in the original information, so as to discover vulnerabilities in the network in time and to prevent network attacks before they happen.
  • The traditional traffic monitoring method is generally divided into three steps: traffic diversion, protocol reorganization, and background content analysis. These three steps are performed by three corresponding functional units, shown in FIG. 1 as the traffic classification unit 102, the protocol reorganization unit 104, and the background content analysis unit 106.
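  • The traffic classification unit 102 distributes the received data packets to different protocol reorganization units 104 according to protocol type. The protocol reorganization unit 104 restores the application-layer information (for example, restoring email information from Simple Mail Transfer Protocol (SMTP) messages, or restoring the Hypertext Markup Language (HTML) of a web page from HyperText Transfer Protocol (HTTP) messages) and then sends the restored application-layer information, together with time labels, link information, and the like, to the background content analysis unit 106 for analysis.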
  • the background content analysis unit 106 is composed of a server cluster, and performs hot spot statistics, Internet information analysis, and the like on the restored application layer information, thereby taking certain measures to suppress network attacks.
  • In the prior art described above, the traffic classification unit diverts traffic passively, so high-volume data is fed directly into the protocol reorganization unit and the background content analysis unit.
  • As traffic grows, the processing costs of the server clusters in the protocol reorganization unit and the background content analysis unit rise sharply.
  • In addition, background content analysis is performed only after protocol reorganization, so large amounts of identical content are reorganized repeatedly, which places heavy performance demands on the protocol reorganization unit.
  • An aspect of the present invention provides a method for network traffic monitoring, the method comprising: counting the number of times a Uniform Resource Locator (URL) is requested within a predetermined time to determine a hotspot URL; actively crawling the resources corresponding to the hotspot URL; performing protocol reorganization on the actively crawled resources corresponding to the hotspot URL; and performing content analysis on the protocol-reorganized data.
  • An aspect of the present invention provides a system for network traffic monitoring, the system comprising: a traffic classification unit, configured to classify and divert data packets; a hotspot statistics unit, configured to count the number of times a Uniform Resource Locator (URL) is requested within a predetermined time to determine a hotspot URL; an active crawling unit, configured to actively crawl the resources corresponding to the hotspot URL; a protocol reorganization unit, configured to perform protocol reorganization on the actively crawled resources corresponding to the hotspot URL; and a background content analysis unit, configured to perform content analysis on the protocol-reorganized data.
  • FIG. 1 is a schematic diagram of a conventional network traffic monitoring system in the prior art.
  • FIG. 2 is a schematic diagram of an embodiment of a network traffic monitoring system according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of another embodiment of a network traffic monitoring system according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an embodiment of a network traffic monitoring method according to an embodiment of the present invention.
  • The system includes: a traffic classification unit 202, a hotspot statistics unit 204, an active crawling unit 206, a protocol reorganization unit 208, and a background content analysis unit 210.
  • the traffic classification unit 202 is configured to perform traffic classification on the data packet.
  • the hotspot statistics unit 204 is configured to perform statistics on the requested number of times of the uniform resource locator URL to determine the hotspot URL within a predetermined time.
  • The active crawling unit 206 is configured to actively crawl the resources corresponding to the hotspot URL; the protocol reorganization unit 208 is configured to perform protocol reorganization on the actively crawled resources corresponding to the hotspot URL; and the background content analysis unit 210 is configured to perform content analysis on the protocol-reorganized data.
  • In this network traffic monitoring system, the hotspot resources are obtained first, and protocol reorganization and background content analysis are performed afterwards, so that the same content is processed only once. This reduces the burden on the protocol reorganization unit and the background content analysis unit and improves the efficiency of the entire system.
  • In addition, for distributed P2P resources, P2P file fragments distributed across the network can be actively crawled to support monitoring of P2P traffic.
  • FIG. 3 depicts a network traffic monitoring system in accordance with another embodiment of the present invention.
  • the network traffic monitoring system includes:
  • the traffic classification unit 302 is configured to perform traffic classification on the data packet.
  • the hotspot statistics unit 304 is configured to calculate the requested number of times of the uniform resource locator URL within a predetermined time to determine the hotspot URL;
  • the active crawling unit 310 is configured to actively capture resources corresponding to the hotspot URL;
  • the protocol reorganization unit 312 is configured to perform protocol reorganization on resources corresponding to the hot spot URL that is actively captured;
  • the background content analysis unit 314 is configured to perform content analysis on the data after the protocol reorganization.
  • the hotspot statistics unit 304 further includes a hierarchical statistics unit 306 and a determining unit 308.
  • The hierarchical statistics unit 306 is configured to build a resource table that counts the requests for the URL level by level, to determine whether the URL at each level is a hotspot URL.
  • The resource table stores the number of times the URL at each level is requested within a predetermined time, together with a predetermined threshold.
  • The determining unit 308 is configured to determine that a URL is a hotspot URL when the number of requests for that URL exceeds the predetermined threshold within the predetermined time.
  • The resource corresponding to the hotspot URL may be a webpage or a P2P file fragment.
  • FIG. 4 shows a flow chart of a method for network traffic monitoring. This method reduces the burden on the protocol reorganization unit and the background content analysis unit, improves the efficiency of the whole system, and reduces cost. In addition, for distributed P2P resources, it can actively crawl the P2P file fragments distributed across the network, which supports monitoring of P2P traffic.
  • the method for monitoring network traffic shown in Figure 4 includes:
  • the traffic classification unit performs traffic classification on the data packet
  • the data packet is subjected to traffic classification according to the protocol type to which the captured data packet belongs. If the protocol type to which the packet belongs is HTTP, only the request header is sent to the hotspot statistics unit.
  • the request header in the HTTP request message includes a request line, and the request line includes a request method, and the request method may be GET or POST.
  • GET is generally used to retrieve or query resource information, whereas POST is generally used to update resource information.
  • When a client wants to read a document from the server, it uses the GET request method, which requires the server to return the resource at the URL's location to the client in the data part of the response message.
  • The GET request method is the one considered here. The GET request line also contains the URL of the requested link.
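The header inspection described above can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the function name `extract_request_url` is hypothetical, and it assumes the captured header is a well-formed HTTP/1.x request whose `Host` field, when present, supplies the host part of the URL.

```python
def extract_request_url(request_header: bytes):
    """Return (method, url) from a raw HTTP request header, or None.

    Hypothetical helper: the url is host + path, without the scheme.
    """
    text = request_header.decode("iso-8859-1")  # never fails on raw bytes
    lines = text.split("\r\n")

    # Request line: METHOD SP request-target SP HTTP-version
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None
    method, target, _version = parts

    # Absolute-form target already carries the host.
    if "://" in target:
        return method, target.split("://", 1)[1]

    # Otherwise take the host from the Host header field.
    host = ""
    for line in lines[1:]:
        if not line:  # blank line ends the header section
            break
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            host = value.strip()
            break
    return method, host + target
```

A classifier built this way would forward only the returned URL of GET requests to the hotspot statistics step and discard anything that does not parse as an HTTP request.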
  • the hotspot statistics unit performs statistics on the requested times of the uniform resource locator URL to determine a hotspot URL
  • The number of times each Uniform Resource Locator (URL) in the HTTP request headers is requested is counted within a predetermined time.
  • The predetermined time can typically be set to 10 days.
  • Within those 10 days, URLs are sorted by request count from high to low, and the URLs that rank near the bottom are periodically cleared.
  • When the number of requests for a URL exceeds a predetermined threshold within the predetermined time, the URL is determined to be a hotspot URL, and the active crawling unit is triggered to perform active crawling.
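One way to picture the counting step is a sliding-window counter. The sketch below is an assumption-laden illustration: the class name `HotspotCounter`, the use of timestamps in seconds, and the sliding (rather than fixed) window are all choices made here, not details given in the text.

```python
from collections import Counter, deque


class HotspotCounter:
    """Count URL requests inside a time window and flag hotspot URLs."""

    def __init__(self, window_seconds, threshold):
        self.window = window_seconds
        self.threshold = threshold
        self.events = deque()    # (timestamp, url) in arrival order
        self.counts = Counter()  # url -> requests inside the window

    def _expire(self, now):
        # Drop requests that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_url = self.events.popleft()
            self.counts[old_url] -= 1
            if self.counts[old_url] == 0:
                del self.counts[old_url]

    def record(self, url, now):
        """Record one request; return True if the URL is now a hotspot."""
        self._expire(now)
        self.events.append((now, url))
        self.counts[url] += 1
        return self.counts[url] > self.threshold

    def ranking(self):
        """URLs sorted by request count, highest first."""
        return self.counts.most_common()
```

With a 10-day window the counter would be created as `HotspotCounter(10 * 24 * 3600, threshold)`; a `True` result from `record` is the point at which an active crawl would be triggered.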
  • the active crawling unit actively captures resources corresponding to the hotspot URL.
  • the resource may be a webpage corresponding to the hotspot URL and other webpages to which the hotspot URL is linked; the resource corresponding to the hotspot URL may also be a file fragment distributed on different nodes in a peer-to-peer network (P2P).
  • the protocol reorganization unit performs protocol reorganization on resources corresponding to the hot spot URL that is actively captured.
  • the background content analysis unit performs content analysis on the data reorganized by the protocol.
  • Online public opinion refers to the hot topics that the public is most concerned about in real life. These high-profile issues are mainly spread through forums, blogs, microblogs, and similar channels. Because information spreads quickly online, a hot issue can get out of control within a short time of arising. By monitoring online public opinion, it is possible to respond to public emergencies on the network and to fully grasp social conditions and public sentiment.
  • The hotspot statistics unit determines the hotspot URL by counting the number of times the URL in HTTP/GET requests is requested within a predetermined time. Actively crawling the webpage corresponding to the hotspot URL, and the other webpages it links to, then achieves the purpose of public opinion monitoring.
  • the hotspot statistics unit makes a record every time a HTTP/GET message is received within a predetermined time.
  • the URL can be hierarchically counted in the form of a resource table.
  • The depth of the statistics is determined by the monitoring requirements. Those skilled in the art will understand that each slash (/) in the URL marks one level. For example, for the URL www.xxx.com/sport/football/fifa2012/index.html, the statistics depth can be set to 3. The first level is www.xxx.com; the second level is www.xxx.com/sport; the third level is www.xxx.com/sport/football.
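The level split can be illustrated with a short helper. This is a minimal sketch: the function name `url_levels` is hypothetical, and it assumes the URL has no query string that should count as a level.

```python
def url_levels(url: str, depth: int):
    """Return the URL prefixes for levels 1..depth, splitting on '/'."""
    # Drop a scheme such as "http://" if one is present.
    if "://" in url:
        url = url.split("://", 1)[1]
    parts = [p for p in url.split("/") if p]
    last = min(depth, len(parts))
    return ["/".join(parts[:i]) for i in range(1, last + 1)]


levels = url_levels("www.xxx.com/sport/football/fifa2012/index.html", 3)
# levels == ["www.xxx.com", "www.xxx.com/sport", "www.xxx.com/sport/football"]
```

Each prefix returned for a request would have its own counter incremented in the resource table.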
  • the statistics and predetermined thresholds are stored in the resource table.
  • The threshold is usually set with reference to an empirical value. If the threshold is set too low, a large amount of content will be cached locally; if it is set too high, some hotspot information will be missed.
  • The empirical value can be set reasonably according to how a monitored hotspot is defined and the storage capacity of the system.
  • The predetermined threshold can depend on the network in which the customer deploys the system. For example, on China's national backbone network the threshold can be set to tens of thousands; on provincial and municipal egress networks it can be set to several thousand. Table 1 shows a schematic resource table for hotspot URL statistics.
  • When the number of requests for the URL at a given level exceeds the predetermined threshold within the predetermined time, the URL is determined to be a hotspot URL.
  • the resource table may be stored on the data file in a hash table manner, and the index of the resource table is stored in the memory. The hash value is found according to the URL, and the index is found by the hash value, and the data file is directly located according to the index pointer.
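The storage layout described above (records in a data file, index in memory, lookup by hash) might look like the following sketch. Everything concrete here is an assumption for illustration: the class name `ResourceTable`, the 16-byte record of count plus threshold, and the use of SHA-256 as the hash. Hash collisions are ignored.

```python
import hashlib
import os
import struct

# Assumed record layout: (request count, threshold) as two unsigned 64-bit ints.
RECORD = struct.Struct("<QQ")


class ResourceTable:
    """Per-URL records in a data file, located through an in-memory index."""

    def __init__(self, data_file):
        self.data = data_file  # binary file object opened for read and write
        self.index = {}        # URL hash -> byte offset of the record

    @staticmethod
    def _key(url):
        return hashlib.sha256(url.encode("utf-8")).digest()

    def increment(self, url, threshold):
        """Add one request for url; return True once its count exceeds the threshold."""
        key = self._key(url)
        offset = self.index.get(key)
        if offset is None:
            # New URL: append a record at the end of the data file.
            self.data.seek(0, os.SEEK_END)
            offset = self.data.tell()
            self.index[key] = offset
            count = 0
        else:
            # Known URL: jump straight to its record; the stored threshold wins.
            self.data.seek(offset)
            count, threshold = RECORD.unpack(self.data.read(RECORD.size))
        count += 1
        self.data.seek(offset)
        self.data.write(RECORD.pack(count, threshold))
        return count > threshold
```

Any seekable binary file object works as `data_file`, for example one opened with `open(path, "w+b")` or an `io.BytesIO()` for experiments.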
  • the active crawling unit actively captures the webpage corresponding to the hotspot URL and other webpages to which the hotspot URL is linked.
  • For example, suppose webpage A is a hot webpage, webpage A contains a link to webpage B, and webpage B contains a link to webpage C. In that case webpages A, B, and C are all actively crawled and stored locally.
  • The crawl depth used in a real deployment is set manually. Under normal circumstances, a crawl depth of 5 levels is enough to complete the monitoring.
  • The active crawling unit sends an HTTP/GET request to www.xxx.com, which usually returns Index.html directly.
  • the Index webpage represents a homepage, and the content of the webpages at all levels is crawled step by step from the homepage. Deep crawling uses recursive crawling of all encountered hyperlinks until the recursion reaches the required crawl level.
  • Breadth crawling is to retrieve all the hyperlinks of a web page, send HTTP requests to fetch all the content, and then drill down to the required crawl level.
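The two crawl orders can be compared on a small in-memory link graph. This is a sketch under stated assumptions: `get_links` is a hypothetical callback standing in for "send an HTTP/GET request and extract the hyperlinks", and the start page is counted as level 1.

```python
from collections import deque


def crawl_breadth_first(start, get_links, max_depth):
    """Visit pages level by level, down to max_depth levels."""
    seen = {start}
    order = [start]
    queue = deque([(start, 1)])
    while queue:
        page, level = queue.popleft()
        if level >= max_depth:
            continue  # do not follow links below the deepest level
        for link in get_links(page):
            if link not in seen:
                seen.add(link)
                order.append(link)
                queue.append((link, level + 1))
    return order


def crawl_depth_first(page, get_links, max_depth, visited=None, level=1):
    """Follow each hyperlink recursively until max_depth is reached."""
    visited = [] if visited is None else visited
    visited.append(page)
    if level < max_depth:
        for link in get_links(page):
            if link not in visited:
                crawl_depth_first(link, get_links, max_depth, visited, level + 1)
    return visited


# Toy link graph: A links to B and D, B links to C.
graph = {"A": ["B", "D"], "B": ["C"], "C": [], "D": []}
links = lambda page: graph.get(page, [])
bfs_order = crawl_breadth_first("A", links, 3)  # ["A", "B", "D", "C"]
dfs_order = crawl_depth_first("A", links, 3)    # ["A", "B", "C", "D"]
```

Both functions reach the same pages on this graph; they differ only in visiting order. On graphs where a page is reachable by several paths, a depth-limited depth-first crawl can reach fewer pages than the breadth-first one.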
  • The crawled resources undergo protocol reorganization and are then analyzed in the background. This yields data such as independent IP (Internet Protocol) address traffic, website page traffic, independent user traffic, and new user traffic, thereby achieving public opinion monitoring.
  • P2P Peer-To-Peer
  • a P2P network can be simply defined as a direct exchange to enable resource sharing between different systems.
  • computers connected via the Internet are treated as equal participants, their status is equal to each other, and each node participating in the communication is referred to as a Peer.
  • In P2P mode, the boundary between server and client disappears. Because data storage, processing, and network bandwidth all operate in a fully decentralized, asynchronous manner, the various loads can be well balanced.
  • A characteristic of the P2P application mode is that the more people are downloading, the more bandwidth is provided and the more seeds are available, so downloads become faster and faster.
  • A P2P node downloads the required seed file from a website through a browser, then obtains the address of the Tracker server and connects to it. Once the connection succeeds, the Tracker server returns information about the other nodes (neighbor nodes) that are downloading the same resource file. After the requesting node obtains this information, it sends messages to these neighbor nodes to establish connections and download the resource, so that resources and services are shared between peer nodes in the network.
  • the seed file is the "index" of the downloaded file, and the index information and the Hash verification code of each block of the downloaded file are written into the seed file.
  • the Tracker server is the server that collects the downloaders and provides this information to other downloaders, allowing the downloaders to connect to each other to transfer data.
  • To download the file content, a downloader first obtains the corresponding seed file, then parses the seed file to get the address of the Tracker server and connects to the Tracker server.
  • The downloader obtains the IP addresses of other downloaders (neighbor nodes) from the Tracker server's response message and connects to those downloaders to share data and resources.
  • The file to be downloaded is divided into several file fragments, which are stored on different nodes, and the Tracker server knows the IP addresses of the nodes that store each file fragment.
  • the communication between the node and the Tracker server is based on the HTTP protocol. That is, the node connected to the Tracker server needs to first send an HTTP/GET request to the Tracker server.
  • the URL contained in the request is the address of the Tracker server recorded in the seed file.
  • The hotspot statistics unit counts, within a predetermined time, the number of times the URL is requested in the HTTP/GET requests that P2P nodes send to the Tracker server.
  • the URL is determined to be a hotspot URL when the number of requests for a URL within a predetermined time exceeds a predetermined threshold.
  • The active crawling unit asks the Tracker server corresponding to the hotspot URL for the IP addresses of the nodes that store each fragment of the file, obtains the different file fragments from those nodes, reassembles the fragments into the original content, and delivers it to the background content analysis unit for analysis. In this role the active crawling unit behaves much like a P2P node.
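The fragment reassembly step can be sketched as below. This is an illustration only: the function name `reassemble` and the `peers` mapping (which stands in for real network downloads) are hypothetical, and SHA-1 is assumed as the per-block verification hash, which the text does not specify.

```python
import hashlib


def reassemble(piece_hashes, peers):
    """Rebuild a file from fragments held by different nodes.

    piece_hashes: expected SHA-1 digests, one per fragment, in file order
                  (as would be read from the seed file).
    peers:        node address -> {fragment index: fragment bytes}.
    """
    pieces = []
    for index, expected in enumerate(piece_hashes):
        for _address, held in peers.items():
            data = held.get(index)
            # Accept the first copy whose hash matches the seed file entry.
            if data is not None and hashlib.sha1(data).digest() == expected:
                pieces.append(data)
                break
        else:
            raise LookupError(f"no node holds a valid copy of fragment {index}")
    return b"".join(pieces)
```

A corrupted fragment on one node is skipped in favor of a valid copy on another, and the joined result is what would be handed on for content analysis.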
  • The exemplary logic blocks, units, circuits, and/or components set forth in connection with the embodiments disclosed herein may be implemented by a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic component, discrete gate or transistor logic, a discrete hardware component, or any combination thereof designed to perform the functions described herein.
  • a general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
  • the processor may also be implemented as a combination of computing components, such as a combination of a DSP and a microprocessor, a plurality of microprocessors, a combination of one or more microprocessors and a DSP core, or any other such configuration.
  • the embodiment of the present invention counts the requested number of times of the Uniform Resource Locator URL to determine the hotspot URL within a predetermined time, and then actively crawls the resource corresponding to the hotspot URL for protocol reorganization and content analysis. Therefore, the burden of the protocol reorganization unit and the background content analysis unit can be reduced.
  • In addition, for distributed P2P resources, the technical solution of the embodiment of the present invention can actively crawl the P2P file fragments distributed across the network, to support monitoring of P2P traffic.

Abstract

The embodiments of the present invention relate to a method and system for monitoring network traffic. The method of the embodiments of the present invention includes: counting the number of times a uniform resource locator (URL) is requested within a pre-determined time period to determine a hotspot URL; actively capturing the resources corresponding to the hotspot URL; performing protocol regrouping on the actively captured resources corresponding to the hotspot URL; and analyzing the contents of the protocol regrouped data. Also provided is a system for monitoring network traffic. The embodiments of the present invention can effectively reduce the overhead of protocol regrouping and background content analysis, improve the efficiency of the entire system, and lower the system costs; additionally, for distributed P2P resources, intelligent regrouping can be performed, and P2P monitoring is also supported.

Description

METHOD AND SYSTEM FOR NETWORK TRAFFIC MONITORING
This application claims priority to Chinese Patent Application No. 201110241618.3, filed with the Chinese Patent Office on August 22, 2011 and entitled "Method and System for Network Traffic Monitoring", which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present invention relates to the field of the Internet and, more particularly, to monitoring data traffic on the Internet.
BACKGROUND
With the rapid development of the Internet, the Internet has become the main channel for disseminating information. However, the traditional Internet lacks supervision, and malicious, pornographic, and personal-attack content is rampant. There have even been cases of terrorist organizations using the Internet to train terrorists and organize terrorist attacks. To cope with this situation, using technical means to supervise the Internet has become a consensus among governments and operators. Traffic monitoring systems came into being in this context. A traffic monitoring system collects traffic information, restores the original information, and performs intelligent analysis based on features in the original information, so as to discover vulnerabilities in the network in time and to prevent network attacks before they happen.
The traditional traffic monitoring method is generally divided into three steps: traffic diversion, protocol reorganization, and background content analysis. These three steps are performed by three corresponding functional units, shown in FIG. 1 as the traffic classification unit 102, the protocol reorganization unit 104, and the background content analysis unit 106. The traffic classification unit 102 distributes the received data packets to different protocol reorganization units 104 according to protocol type. The protocol reorganization unit 104 restores the application-layer information (for example, restoring email information from Simple Mail Transfer Protocol (SMTP) messages, or restoring the Hypertext Markup Language (HTML) of a web page from HyperText Transfer Protocol (HTTP) messages) and then sends the restored application-layer information, together with time labels, link information, and the like, to the background content analysis unit 106 for analysis. The background content analysis unit 106 consists of a server cluster and performs hotspot statistics, Internet information analysis, and the like on the restored application-layer information, so that measures can be taken to suppress network attacks.
However, in the prior art described above, the traffic classification unit diverts traffic passively, so high-volume data is fed directly into the protocol reorganization unit and the background content analysis unit. As traffic grows, the processing costs of the server clusters in the protocol reorganization unit and the background content analysis unit rise sharply.
Second, background content analysis is performed only after protocol reorganization, so large amounts of identical content are reorganized repeatedly, which places heavy performance demands on the protocol reorganization unit.
In addition, the traditional traffic monitoring method described above can obtain only some of the Peer-To-Peer (P2P) file fragments and cannot intelligently link to the other P2P file fragments, so it cannot handle traffic monitoring for peer-to-peer networks.
SUMMARY OF THE INVENTION
In view of this, an aspect of the present invention provides a method for network traffic monitoring, the method comprising: counting the number of times a Uniform Resource Locator (URL) is requested within a predetermined time to determine a hotspot URL; actively crawling the resources corresponding to the hotspot URL; performing protocol reorganization on the actively crawled resources corresponding to the hotspot URL; and performing content analysis on the protocol-reorganized data.
An aspect of the present invention provides a system for network traffic monitoring, the system comprising: a traffic classification unit, configured to classify and divert data packets; a hotspot statistics unit, configured to count the number of times a Uniform Resource Locator (URL) is requested within a predetermined time to determine a hotspot URL; an active crawling unit, configured to actively crawl the resources corresponding to the hotspot URL; a protocol reorganization unit, configured to perform protocol reorganization on the actively crawled resources corresponding to the hotspot URL; and a background content analysis unit, configured to perform content analysis on the protocol-reorganized data.
The technical solution of the embodiments of the present invention counts the number of times a Uniform Resource Locator (URL) is requested within a predetermined time to determine a hotspot URL, and then actively crawls the resources corresponding to the hotspot URL for protocol reorganization and content analysis. This reduces the burden of protocol reorganization and background content analysis. In addition, for distributed P2P resources, the technical solution of the embodiments of the present invention can actively crawl the P2P file fragments distributed across the network, to support monitoring of P2P traffic.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of a conventional network traffic monitoring system in the prior art.
FIG. 2 is a schematic diagram of an embodiment of a network traffic monitoring system according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of another embodiment of a network traffic monitoring system according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of an embodiment of a network traffic monitoring method according to an embodiment of the present invention.
The above summary, as well as the following detailed description of certain embodiments of the present invention, will be better understood when read in conjunction with the drawings. For the purpose of illustrating the invention, certain embodiments are shown in the drawings. It should be understood, however, that the invention is not limited to the arrangements and instrumentalities shown in the drawings.
DETAILED DESCRIPTION
The detailed description set forth below in connection with the drawings is intended to illustrate various embodiments of the present invention and does not represent the only embodiments in which the invention may be practiced. The detailed description includes specific details in order to provide a thorough understanding of the invention. However, those skilled in the art will understand that the invention may be practiced without these specific details. In some instances, well-known structures and components are shown in block diagram form to avoid obscuring the description of the invention.
FIG. 2 depicts a network traffic monitoring system according to an embodiment of the present invention. The system includes: a traffic classification unit 202, a hotspot statistics unit 204, an active crawling unit 206, a protocol reorganization unit 208, and a background content analysis unit 210. The traffic classification unit 202 is configured to classify and divert data packets; the hotspot statistics unit 204 is configured to count the number of times a Uniform Resource Locator (URL) is requested within a predetermined time to determine a hotspot URL; the active crawling unit 206 is configured to actively crawl the resources corresponding to the hotspot URL; the protocol reorganization unit 208 is configured to perform protocol reorganization on the actively crawled resources corresponding to the hotspot URL; and the background content analysis unit 210 is configured to perform content analysis on the protocol-reorganized data. In this network traffic monitoring system, the hotspot resources are obtained first, and protocol reorganization and background content analysis are performed afterwards, so that the same content is processed only once. This reduces the burden on the protocol reorganization unit and the background content analysis unit and improves the efficiency of the entire system. In addition, for distributed P2P resources, the P2P file fragments distributed across the network can be actively crawled to support monitoring of P2P traffic.
FIG. 3 depicts a network traffic monitoring system according to another embodiment of the present invention. The network traffic monitoring system includes:
a traffic classification unit 302, configured to divert and classify data packets;
a hotspot statistics unit 304, configured to count, within a predetermined time, the number of times each uniform resource locator (URL) is requested, in order to determine hotspot URLs;
an active crawling unit 310, configured to actively crawl the resource corresponding to a hotspot URL;
a protocol reassembly unit 312, configured to perform protocol reassembly on the actively crawled resource corresponding to the hotspot URL; and
a back-end content analysis unit 314, configured to perform content analysis on the reassembled data.
The hotspot statistics unit 304 further includes a hierarchical statistics unit 306 and a determining unit 308. The hierarchical statistics unit 306 is configured to build a resource table and count requests for the URL level by level, in order to determine whether the URL at each level is a hotspot URL. The resource table stores, for the URL at each level, the number of times it is requested within the predetermined time and the predetermined threshold. The determining unit 308 is configured to determine that a URL is a hotspot URL when the number of times it is requested within the predetermined time exceeds the predetermined threshold. The resource corresponding to a hotspot URL may be a web page or a P2P file fragment.
FIG. 4 is a flowchart of a method for network traffic monitoring. The method reduces the load on the protocol reassembly unit and the back-end content analysis unit, improves the efficiency of the entire system, and lowers cost. In addition, for distributed P2P resources, it can actively crawl P2P file fragments scattered across the network, which supports monitoring of P2P traffic.
The network traffic monitoring method shown in FIG. 4 includes:
402: The traffic classification unit diverts and classifies data packets;
In this embodiment, captured data packets are diverted and classified according to the protocol type to which they belong. If a packet's protocol type is HTTP, only the request header is sent to the hotspot statistics unit. When an HTTP request is made, the request header in the HTTP request message contains a request line, and the request line contains the request method, which may be GET or POST. GET is generally used to retrieve or query resource information, whereas POST is generally used to update resource information. A client uses the GET request method when it wants to read a document from the server; GET asks the server to return the resource located by the URL in the data portion of the response message. The GET request method is the one used here. The GET request line also includes the URL of the requested link.
404: The hotspot statistics unit counts the number of times each uniform resource locator (URL) is requested to determine hotspot URLs;
Optionally, when the request method is GET, the number of times the URL in the HTTP request header is requested is counted within a predetermined time. The predetermined time can typically be set to 10 days. Within the 10 days, URLs are sorted by request count from highest to lowest, and the lowest-ranked URLs are purged periodically. When the number of times a URL is requested within the predetermined time exceeds the predetermined threshold, the URL is determined to be a hotspot URL and the active crawling unit is triggered to perform active crawling.
406: The active crawling unit actively crawls the resource corresponding to the hotspot URL;
After a hotspot URL is determined, the active crawling unit actively crawls the resource corresponding to it. The resource may be the web page corresponding to the hotspot URL together with the other web pages it links to; it may also be file fragments distributed across different nodes of a peer-to-peer (P2P) network.
408: The protocol reassembly unit performs protocol reassembly on the actively crawled resource corresponding to the hotspot URL;
410: The back-end content analysis unit performs content analysis on the reassembled data.
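Steps 402 and 404 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the names `parse_get_url` and `HotspotCounter`, the fixed (non-sliding) counting window, and the tiny threshold used in the demonstration are all assumptions made for illustration. The sketch extracts the URL from a GET request header, counts requests per URL within the window, and reports the moment a URL first exceeds the threshold, which is where the active crawling unit would be triggered.

```python
from collections import Counter


def parse_get_url(header: bytes):
    """Return host + path for a GET request header, or None for anything else."""
    lines = header.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or parts[0] != "GET" or not parts[2].startswith("HTTP/"):
        return None
    host = ""
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            host = value.strip()
            break
    # Simplification: assumes an origin-form request target such as "/a/b".
    return host + parts[1]


class HotspotCounter:
    """Counts URL requests in a fixed window (hypothetical helper class)."""

    def __init__(self, threshold: int, window_seconds: int):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.window_start = None
        self.counts = Counter()
        self.hot = set()

    def record(self, url: str, now: float) -> bool:
        """Count one request; return True when the URL first becomes a hotspot."""
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            # Start a new counting window (assumption: windows do not overlap).
            self.window_start = now
            self.counts.clear()
            self.hot.clear()
        self.counts[url] += 1
        if self.counts[url] > self.threshold and url not in self.hot:
            self.hot.add(url)
            return True
        return False

    def prune(self, keep: int) -> None:
        """Periodically drop all but the `keep` most requested URLs."""
        self.counts = Counter(dict(self.counts.most_common(keep)))


# Demonstration with a 10-day window and an artificially small threshold.
counter = HotspotCounter(threshold=2, window_seconds=10 * 24 * 3600)
request = b"GET /sport/index.html HTTP/1.1\r\nHost: www.xxx.com\r\n\r\n"
events = []
for now in (0, 60, 120, 180):
    url = parse_get_url(request)
    if url is not None:
        events.append(counter.record(url, now))
print(events)  # [False, False, True, False]
```

The third request is the first to push the count above the threshold, so only that call reports a new hotspot; later requests for the same URL do not trigger the crawler again within the window.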
For ease of understanding, two specific application scenarios are described below.
Online public opinion monitoring
Online public opinion refers to the views the public expresses on the Internet about the real-life issues that concern it most. These closely watched issues spread mainly through forums, blogs, microblogs, and similar channels. Because information spreads quickly online, a hot issue can get out of control within a very short time. Monitoring online public opinion makes it possible to respond promptly to sudden public incidents on the network and to gain a comprehensive picture of social conditions and public sentiment.
In this application scenario, the hotspot statistics unit determines hotspot URLs by counting, within a predetermined time, the number of times the URL in HTTP GET requests is requested. The active crawling unit then crawls the web page corresponding to the hotspot URL and the other web pages it links to, which achieves the purpose of public opinion monitoring.
In some embodiments, the hotspot statistics unit records one request each time it receives an HTTP GET message within the predetermined time. URLs can be counted hierarchically using a resource table. The depth of the statistics is determined by the monitoring requirements. Those skilled in the art will understand that each slash (/) in a URL delimits one level. For example, for the URL www.xxx.com/sport/football/fifa2012/index.html, the statistics depth can be set to 3: the first level is www.xxx.com, the second level is www.xxx.com/sport, and the third level is www.xxx.com/sport/football. Both the collected counts and the predetermined thresholds are stored in the resource table.
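The level-by-level counting described above can be sketched in a few lines of Python. The helper names `url_levels` and `count_request` are hypothetical, and an in-memory `Counter` stands in for the resource table; the second URL in the demonstration is an invented example.

```python
from collections import Counter


def url_levels(url: str, depth: int):
    """Return the URL prefixes for levels 1..depth, split on '/'."""
    without_scheme = url.split("://", 1)[-1]
    segments = [s for s in without_scheme.split("/") if s]
    limit = min(depth, len(segments))
    return ["/".join(segments[:i]) for i in range(1, limit + 1)]


def count_request(table: Counter, url: str, depth: int) -> None:
    """Record one request against every level of the URL up to `depth`."""
    for prefix in url_levels(url, depth):
        table[prefix] += 1


levels = url_levels("www.xxx.com/sport/football/fifa2012/index.html", 3)
print(levels)
# ['www.xxx.com', 'www.xxx.com/sport', 'www.xxx.com/sport/football']

table = Counter()
count_request(table, "www.xxx.com/sport/football/fifa2012/index.html", 3)
count_request(table, "www.xxx.com/sport/tennis/index.html", 3)
```

After these two requests the first and second levels each have a count of 2, while the two third-level prefixes have a count of 1 each, so a site or section can become a hotspot even when no single page does.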
Note that the threshold is usually set with reference to empirical values. Setting it too low causes a large amount of content to be cached locally, while setting it too high causes some hotspot information to be missed. A reasonable value can be chosen according to how a monitored hotspot is defined and the storage capacity of the system. The predetermined threshold may also depend on the network in which the customer deploys the system. For example, on China's national backbone network the threshold can be set to tens of thousands, whereas on a provincial or municipal egress network it can be set to several thousand. Table 1 below shows an illustrative resource table for hotspot URL statistics: Table 1
Figure imgf000007_0001
In this example, within the predetermined time, the request count for www.xxx.com (10000) exceeds the threshold (8000), so this URL is determined to be a hotspot URL. In some embodiments, the resource table can be stored in a data file as a hash table, with the index of the resource table kept in memory. A hash value is computed from the URL, the index entry is found from the hash value, and the index pointer leads directly to the record in the data file.
After the hotspot statistics unit determines a hotspot URL, the active crawling unit actively crawls the web page corresponding to the hotspot URL and the other web pages it links to. Suppose web page A is a hotspot page, page A contains a link to page B, and page B contains a link to page C. With a crawl depth of 3, pages A, B, and C are all actively crawled to local storage. In practice the crawl depth is set manually; a depth of 5 levels is usually sufficient for monitoring.
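The hash-indexed storage of the resource table mentioned above (URL to hash value, hash value to index entry, index entry to a position in the data file) might look like the following Python sketch. The record layout, the use of SHA-1 as the hash, the class name `ResourceTable`, and the second row's values are all assumptions; the source does not specify any of them. For brevity the in-memory index is not rebuilt from an existing file.

```python
import hashlib
import os
import struct
import tempfile

# Assumed fixed-size record: (request count, threshold), two unsigned 64-bit ints.
RECORD = struct.Struct("<QQ")


class ResourceTable:
    """Records live in a data file; the hash -> offset index lives in memory."""

    def __init__(self, path: str):
        self.path = path
        self.index = {}  # URL hash -> byte offset of the record in the data file
        open(path, "ab").close()  # make sure the data file exists

    @staticmethod
    def _key(url: str) -> bytes:
        return hashlib.sha1(url.encode("utf-8")).digest()

    def put(self, url: str, count: int, threshold: int) -> None:
        key = self._key(url)
        with open(self.path, "r+b") as f:
            if key in self.index:
                f.seek(self.index[key])  # overwrite the existing record
            else:
                f.seek(0, os.SEEK_END)   # append a new record
                self.index[key] = f.tell()
            f.write(RECORD.pack(count, threshold))

    def get(self, url: str):
        offset = self.index.get(self._key(url))
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            return RECORD.unpack(f.read(RECORD.size))


with tempfile.TemporaryDirectory() as tmp:
    table = ResourceTable(os.path.join(tmp, "resource_table.dat"))
    table.put("www.xxx.com", 10000, 8000)
    table.put("www.xxx.com/sport", 4000, 8000)  # hypothetical second row
    table.put("www.xxx.com", 10001, 8000)       # update in place
    top_level = table.get("www.xxx.com")
    second_level = table.get("www.xxx.com/sport")
    unknown = table.get("www.yyy.com")
```

Keeping only the small index in memory lets a lookup reach its record with a single seek, which is the property the description attributes to this layout.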
For example, if www.xxx.com is determined to be a hotspot URL, the active crawling unit sends an HTTP GET request to www.xxx.com, which usually returns Index.html directly. The links on Index.html are then parsed and crawled breadth-first or depth-first. The Index page usually represents a home page, and the content of the pages at each level is crawled level by level starting from the home page. Depth-first crawling recursively follows every hyperlink encountered until the recursion reaches the required crawl level. Breadth-first crawling retrieves all the hyperlinks on a page, sends an HTTP request for each to fetch its content, and then goes one level deeper at a time until the required crawl level is reached.
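The two crawl orders can be compared with a small Python sketch. To stay self-contained it walks an in-memory link graph instead of issuing real HTTP requests; the page names, the `links` mapping, and both function names are invented for illustration. The start page counts as level 1, matching the example in which a crawl depth of 3 fetches pages A, B, and C.

```python
from collections import deque


def crawl_breadth_first(start, links, max_depth):
    """Visit pages level by level, stopping at max_depth."""
    visited = [start]
    seen = {start}
    queue = deque([(start, 1)])
    while queue:
        page, level = queue.popleft()
        if level >= max_depth:
            continue  # do not follow links from pages at the deepest level
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                visited.append(target)
                queue.append((target, level + 1))
    return visited


def crawl_depth_first(start, links, max_depth, level=1, visited=None):
    """Recursively follow each link before moving on to the next one."""
    if visited is None:
        visited = []
    visited.append(start)
    if level < max_depth:
        for target in links.get(start, []):
            if target not in visited:
                crawl_depth_first(target, links, max_depth, level + 1, visited)
    return visited


# Hypothetical link graph: A links to B and D, B links to C, C links to E.
links = {"A": ["B", "D"], "B": ["C"], "C": ["E"], "D": []}
bfs_order = crawl_breadth_first("A", links, 3)
dfs_order = crawl_depth_first("A", links, 3)
print(bfs_order)  # ['A', 'B', 'D', 'C']
print(dfs_order)  # ['A', 'B', 'C', 'D']
```

Both orders fetch the same four pages and stop before page E at level 4; they differ only in whether sibling links (B, D) or nested links (B, C) are handled first.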
The crawled resources undergo protocol reassembly and are then analyzed by the back end, which can yield data such as traffic per unique IP (Internet Protocol) address, website page traffic, unique-user traffic, and new-user traffic, thereby enabling public opinion monitoring.
P2P, that is, Peer-to-Peer, is widely known as a synonym for peer networking. A P2P network can be defined simply as one in which different systems share resources through direct exchange. In a P2P network environment, computers connected through the Internet are regarded as equal participants whose status is mutually equivalent, and each node taking part in the communication is called a peer. In the P2P model, the boundary between server and client disappears. Because data storage, processing, and network bandwidth all operate in a fully decentralized, asynchronous manner, the various loads can be balanced well. A characteristic of the P2P application model is that the more people are downloading, the more bandwidth is available, the more seeds there are, and the faster the download becomes.
In a P2P application, a P2P node uses a browser to download the required seed file from a website, obtains the address of the Tracker server from the seed file, and connects to it. Once the connection succeeds, the Tracker server returns information about the other nodes (neighbor nodes) that are downloading the same resource file. After obtaining this information, the requesting node sends messages to those neighbor nodes to establish connections and download the resource, so that resources and services are shared among peer nodes in the network. The seed file is the "index" of the file to be downloaded: the index information and hash verification code of every block of the file are written into the seed file. The Tracker server collects information about downloaders and provides it to other downloaders so that the downloaders can connect to one another and transfer data.
It follows that, to download the content of a file, a downloader first needs to obtain the corresponding seed file, then parses the seed file to obtain the address of the Tracker server, and connects to the Tracker server. From the Tracker server's response message the downloader obtains the IP addresses of the other downloaders (neighbor nodes) and connects to them to share data and resources. In this process, the file to be downloaded is divided into several file fragments that are stored on different nodes, and the Tracker server knows the IP addresses of the nodes on which each file fragment is stored.
Communication between a node and the Tracker server is based on the HTTP protocol. In other words, to connect to the Tracker server a node must first send it an HTTP GET request, and the URL contained in that request is the Tracker server address recorded in the seed file.
In some embodiments, the hotspot statistics unit counts the number of times the URL in the HTTP GET requests that P2P nodes send to the Tracker server is requested within a predetermined time. When the number of requests for a URL within the predetermined time exceeds the predetermined threshold, that URL is determined to be a hotspot URL. The active crawling unit asks the Tracker corresponding to the hotspot URL for the IP addresses of the nodes that store each file fragment of the file to be downloaded, then obtains the different file fragments from the different nodes and reassembles them into the original content for analysis by the back-end content analysis unit. It can be understood that the active crawling unit here behaves like a P2P node.
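The final reassembly step can be sketched in Python as follows. The network side (contacting the Tracker and fetching fragments from peers) is simulated with an in-memory dictionary, and the use of SHA-1 digests for the per-fragment hash verification codes is an assumption; the description only states that hash verification codes are recorded in the seed file. The function names are hypothetical.

```python
import hashlib


def split_into_fragments(content: bytes, size: int):
    """Cut content into fixed-size fragments (the last one may be shorter)."""
    return [content[i:i + size] for i in range(0, len(content), size)]


def reassemble(fragments: dict, expected_hashes: list) -> bytes:
    """Verify each fragment against its expected hash and join them in order."""
    out = bytearray()
    for index, expected in enumerate(expected_hashes):
        data = fragments.get(index)
        if data is None:
            raise ValueError(f"fragment {index} is missing")
        if hashlib.sha1(data).digest() != expected:
            raise ValueError(f"fragment {index} failed hash verification")
        out += data
    return bytes(out)


original = b"example content shared over a P2P network"
parts = split_into_fragments(original, 8)
# Hash list as it would be read from the seed file (assumed to be SHA-1).
hashes = [hashlib.sha1(p).digest() for p in parts]
# Simulate fragments arriving from different nodes in arbitrary order.
fetched = {i: parts[i] for i in reversed(range(len(parts)))}
restored = reassemble(fetched, hashes)
```

Indexing fragments by position makes the arrival order irrelevant, and checking each fragment before joining prevents corrupted data from a single node from reaching the content analysis stage.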
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The various illustrative logic blocks, units, circuits, elements, and/or components described in connection with the embodiments disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic component, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor; alternatively, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing components, for example a combination of a DSP and a microprocessor, a combination of multiple microprocessors, a combination of one or more microprocessors with a DSP core, or any other such configuration.
Embodiments of the present invention count, within a predetermined time, the number of times each uniform resource locator (URL) is requested in order to determine hotspot URLs, and then actively crawl the resources corresponding to the hotspot URLs for protocol reassembly and content analysis. This reduces the load on the protocol reassembly unit and the back-end content analysis unit. In addition, for distributed P2P resources, the technical solution of the embodiments can actively crawl P2P file fragments scattered across the network, which supports monitoring of P2P traffic. The method and system for network traffic monitoring provided by the present invention have been described in detail above. Those of ordinary skill in the art may, following the ideas of the embodiments of the present invention, make changes to the specific implementations and the scope of application; therefore, the content of this specification should not be construed as limiting the present invention.

Claims

1. A method for network traffic monitoring, characterized in that the method comprises:
diverting and classifying data packets;
counting, within a predetermined time, the number of times a uniform resource locator (URL) is requested, to determine a hotspot URL;
actively crawling a resource corresponding to the hotspot URL;
performing protocol reassembly on the actively crawled resource corresponding to the hotspot URL; and
performing content analysis on the reassembled data.
2. The method according to claim 1, characterized in that
counting, within a predetermined time, the number of times a URL is requested to determine a hotspot URL comprises: counting the requests for the URL level by level to determine whether the URL at each level is a hotspot URL.
3. The method according to claim 1 or 2, characterized in that
the resource corresponding to the hotspot URL comprises: a web page or a peer-to-peer (P2P) file fragment.
4. The method according to any one of claims 1 to 3, characterized in that counting, within a predetermined time, the number of times a URL is requested to determine a hotspot URL comprises: when the number of times a URL is requested within the predetermined time exceeds a predetermined threshold, determining that URL to be the hotspot URL.
5. A system for network traffic monitoring, characterized in that the system comprises:
a traffic classification unit, configured to divert and classify data packets;
a hotspot statistics unit, configured to count, within a predetermined time, the number of times a uniform resource locator (URL) is requested, to determine a hotspot URL;
an active crawling unit, configured to actively crawl a resource corresponding to the hotspot URL;
a protocol reassembly unit, configured to perform protocol reassembly on the actively crawled resource corresponding to the hotspot URL; and
a back-end content analysis unit, configured to perform content analysis on the reassembled data.
6. The system according to claim 5, characterized in that the hotspot statistics unit further comprises a hierarchical statistics unit, the hierarchical statistics unit being configured to count the requests for the URL level by level to determine whether the URL at each level is a hotspot URL.
7. The system according to claim 5 or 6, characterized in that
the resource corresponding to the hotspot URL comprises: a web page or a peer-to-peer (P2P) file fragment.
8. The system according to any one of claims 5 to 7, characterized in that the hotspot statistics unit further comprises a determining unit, the determining unit being configured to determine that a URL is a hotspot URL when the number of times the URL is requested within the predetermined time exceeds a predetermined threshold.
PCT/CN2012/080039 2011-08-22 2012-08-13 Method and system for monitoring network traffic WO2013026362A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110241618.3 2011-08-22
CN201110241618.3A CN102957571B (en) 2011-08-22 2011-08-22 Method and system for monitoring network flows

Publications (1)

Publication Number Publication Date
WO2013026362A1 true WO2013026362A1 (en) 2013-02-28

Family

ID=47745932

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/080039 WO2013026362A1 (en) 2011-08-22 2012-08-13 Method and system for monitoring network traffic

Country Status (2)

Country Link
CN (1) CN102957571B (en)
WO (1) WO2013026362A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376797A (en) * 2018-11-20 2019-02-22 大连理工大学 A kind of net flow assorted method based on binary coder and more Hash tables
WO2020042979A1 (en) * 2018-08-30 2020-03-05 京东方科技集团股份有限公司 Closed system monitoring method and device and monitoring apparatus
CN113094621A (en) * 2021-04-23 2021-07-09 中南大学 Network public opinion cloud platform
CN113556260A (en) * 2020-04-24 2021-10-26 北京三快在线科技有限公司 Flow monitoring method and device, storage medium and electronic equipment

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103281367B (en) * 2013-05-22 2016-03-02 北京蓝汛通信技术有限责任公司 A kind of load-balancing method and device
CN103338249B (en) * 2013-06-26 2018-05-25 优视科技有限公司 Caching method and device
CN103593446A (en) * 2013-11-18 2014-02-19 北京国双科技有限公司 Flow quality analyzing method and device
CN104092620A (en) * 2014-07-04 2014-10-08 浪潮(北京)电子信息产业有限公司 Method and device for achieving adjustment of network bandwidth
CN105119764B (en) * 2015-09-29 2019-06-28 百度在线网络技术(北京)有限公司 Method and apparatus for traffic monitoring
CN106209796A (en) * 2016-06-27 2016-12-07 安徽科成信息科技有限公司 A kind of safe network monitoring apparatus
CN106209985A (en) * 2016-06-27 2016-12-07 安徽科成信息科技有限公司 A kind of safety monitoring device
CN106161433A (en) * 2016-06-27 2016-11-23 安徽科成信息科技有限公司 A kind of network monitoring apparatus ensureing Web vector graphic safety
CN109429262A (en) * 2017-09-04 2019-03-05 中国移动通信有限公司研究院 A kind of detection method of hot spot, the network equipment and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101437030A (en) * 2008-11-29 2009-05-20 成都市华为赛门铁克科技有限公司 Method for preventing server from being attacked, detection device and monitoring device
US7661136B1 (en) * 2005-12-13 2010-02-09 At&T Intellectual Property Ii, L.P. Detecting anomalous web proxy activity
CN101902365A (en) * 2009-05-26 2010-12-01 北京启明星辰信息技术股份有限公司 Method for monitoring P2P traffic of wide area network and system thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101753341A (en) * 2008-12-16 2010-06-23 上海冰峰计算机网络技术有限公司 Monitoring method of computer network

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020042979A1 (en) * 2018-08-30 2020-03-05 京东方科技集团股份有限公司 Closed system monitoring method and device and monitoring apparatus
US11567822B2 (en) 2018-08-30 2023-01-31 Boe Technology Group Co., Ltd. Method of monitoring closed system, apparatus thereof and monitoring device
CN109376797A (en) * 2018-11-20 2019-02-22 大连理工大学 A kind of net flow assorted method based on binary coder and more Hash tables
CN109376797B (en) * 2018-11-20 2023-05-16 大连理工大学 Network traffic classification method based on binary encoder and multi-hash table
CN113556260A (en) * 2020-04-24 2021-10-26 北京三快在线科技有限公司 Flow monitoring method and device, storage medium and electronic equipment
CN113556260B (en) * 2020-04-24 2022-12-09 北京三快在线科技有限公司 Flow monitoring method and device, storage medium and electronic equipment
CN113094621A (en) * 2021-04-23 2021-07-09 中南大学 Network public opinion cloud platform
CN113094621B (en) * 2021-04-23 2023-08-08 中南大学 Internet public opinion cloud platform

Also Published As

Publication number Publication date
CN102957571A (en) 2013-03-06
CN102957571B (en) 2015-04-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12825030

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12825030

Country of ref document: EP

Kind code of ref document: A1