WO2013162264A1 - Method and system for collecting objects by using packet mirroring - Google Patents

Method and system for collecting objects by using packet mirroring

Info

Publication number
WO2013162264A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
identification information
packet
mirroring
unit
Prior art date
Application number
PCT/KR2013/003477
Other languages
French (fr)
Korean (ko)
Inventor
송진영
Original Assignee
줌인터넷 주식회사
Priority date
Filing date
Publication date
Application filed by 줌인터넷 주식회사
Priority claimed from KR1020130045071A (KR101471515B1)
Publication of WO2013162264A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Definitions

  • the present invention relates to a method and a system for collecting objects, and more particularly, to a method and a system for collecting content included in an object to be searched.
  • the search technology refers to a technology for searching all the search target objects interspersed in the Internet network as needed to find information included in the search target object.
  • For example, web page search technology consists of link collection technology, which obtains the links (uniform resource locators, URLs) of the web pages to be searched; crawling technology, which obtains the web page content of the collected links; indexing technology, which processes the crawled web page content into a form suitable for search; and search engine technology, which provides web page results related to a search word entered by a user.
  • the conventional search technology secures a link of a web page to be searched, and then acquires the content of the web page based on the link.
  • To acquire the content of web pages, a crawler continuously collects and processes the content of the search target web pages, which not only requires substantial hardware resources but also generates a huge amount of traffic.
  • An object of the present invention is to provide an object collection method and system that can collect the content of the object corresponding to the search target without generating new traffic.
  • Another object of the present invention is to provide a method and system for collecting objects that can minimize unnecessary hardware resources.
  • Another object of the present invention is to provide an object collection method and system that can optimally maintain the freshness of an object to be searched.
  • Another object of the present invention is to provide an object collection method and system that can reduce the number of search targets to collect content by filtering spam in advance.
  • The object collection method using packet mirroring according to the first aspect of the present invention is a method by which a search system collects content included in an object to be searched, and includes: a mirroring step of mirroring packets that a user terminal transmits to and receives from a web server; an identification information extraction step of extracting identification information of an object to be searched from the packet mirrored in the mirroring step; a content collection step of collecting content of an object to be searched from the packet mirrored in the mirroring step; and a storage step of storing the content collected in the content collection step in association with the identification information that corresponds to the content and was extracted in the identification information extraction step.
  • An object collection system using packet mirroring includes: an identification information database for storing identification information of an object that is a target of a search service and content of the object corresponding to the identification information; a mirroring unit for mirroring packets that a user terminal transmits to and receives from a web server; a link processing unit which extracts identification information from the packet mirrored by the mirroring unit and adds the extracted identification information to the identification information database; and a content collecting unit which extracts, from the packet mirrored by the mirroring unit, content of the object corresponding to the identification information added to the identification information database by the link processing unit, and stores the extracted content in the identification information database.
  • the freshness of the object to be searched can be optimally maintained.
  • FIG. 1 is a network diagram including an object collection system using packet mirroring according to an embodiment of the present invention.
  • FIG. 2 is a functional block diagram illustrating an object collection system using packet mirroring according to an embodiment of the present invention.
  • FIG. 3 is a flowchart illustrating an object collection method using packet mirroring according to an embodiment of the present invention.
  • FIG. 1 is a block diagram of an object collection system using packet mirroring according to an embodiment of the present invention.
  • The network N may be implemented as any kind of wired or wireless network, such as a local area network (LAN), a wide area network (WAN), a value added network (VAN), a personal area network (PAN), a mobile radio communication network, or a satellite communication network.
  • LAN local area network
  • WAN wide area network
  • VAN value added network
  • PAN personal area network
  • the user terminal 100 is a communication terminal for connecting to a remote web server 300 through a network N, and may be implemented as a computer, a portable terminal, a television, etc. that can be connected to other terminals and servers.
  • The computer includes, for example, a notebook, desktop, or laptop equipped with a web browser.
  • the portable terminal is, for example, a wireless communication device that ensures portability and mobility.
  • The television may include an Internet Protocol Television (IPTV), an Internet television, a terrestrial TV, a cable TV, or the like.
  • IPTV Internet Protocol Television
  • The web server 300 is a computer system that can be connected to the remote user terminal 100 or another server through the network N, and mainly serves to provide a data service to its communication partner.
  • the user terminal 100 transmits a service request to the web server 300 through a web browser or other application program.
  • For example, a service request is made by transmitting an HTTP request packet over the Hypertext Transfer Protocol (HTTP) from the user terminal 100 to the web server 300, and the web server 300 provides the service by transmitting an HTTP response packet corresponding to the HTTP request packet to the user terminal 100.
  • HTTP Hyper Text Transfer Protocol
  • The object collection system 200 is also a computer system that can be connected to the remote user terminal 100 or another server through the network N, and may provide a service by transmitting, back to the user terminal 100, an HTTP response packet corresponding to an HTTP request packet received from the user terminal 100.
  • The object collection system 200 may include a mirroring unit 201 connected to the backbone of the network N, or may communicate with the mirroring unit 201 to receive data acquired by the mirroring unit 201.
  • In the following, the object collection system 200 is described as including the mirroring unit 201; however, the object collection system 200 need not include the mirroring unit 201 connected to the backbone of the network N, and may instead merely receive data acquired by the mirroring unit 201.
  • FIG. 2 is a functional block diagram illustrating an object collection system using packet mirroring according to an embodiment of the present invention.
  • the object collection system 200 includes a mirroring unit 201 connected to a wired or wireless communication line connecting the user terminal 100 and the network N.
  • the mirroring unit 201 may mirror or sniff packet data transmitted and received between the web server 300 connected to the network N and the user terminal 100.
  • The mirroring unit 201 of the object collection system 200 may be implemented as a tap (TAP) device or a switch mirror installed in the backbone of the network, or may be a hardware or software module installed in an access point that serves as the point of contact between the user terminal 100 and the network.
  • The packet mirrored or sniffed by the mirroring unit 201 may be an HTTP request packet transmitted from the user terminal 100 to the web server 300, or may be an HTTP response packet.
  • The mirroring unit 201 may mirror only the HTTP response packet, or may mirror both the HTTP request packet and the HTTP response packet.
  • the object collection system 200 may include a link processing unit 202.
  • the link processing unit 202 processes the packets mirrored by the mirroring unit 201 and adds search target identification information to the identification information database 208 to be described later.
  • the search object identification information obtained by the link processing unit 202 from the mirrored packet is information for uniquely identifying the object to be searched.
  • For example, the identification information may be the address (uniform resource locator, URL) of the search object.
  • the object to be searched may be any type of data existing on the Internet such as a web page, an image, a video, and a document.
  • The link processing unit 202 stores the identification information of the search target in the identification information database 208 in a structured form.
  • the identification information database 208 stores and manages search object identification information indicating each object to be searched, which is obtained from the mirrored packet.
  • The identification information database 208 also stores and manages additional information on each piece of search target identification information. The additional information may include information or content included in at least one of: the identification information itself, the packet including the identification information, another packet corresponding to that packet, or the object corresponding to the identification information.
  • The identification information database 208 may be configured as a hardware or software module including a conventional database system for storing data in a structured form.
  • The link processing unit 202 may extract the identification information included in the packet mirrored by the mirroring unit 201, and may use the HTTP request packet mirrored by the mirroring unit 201 to record the identification information in the identification information database 208.
  • The link processing unit 202 may store, together with the identification information included in the HTTP request packet, the time at which the packet was mirrored and information included in the headers of the packet's layered protocols (for example, the source IP address or the destination IP address).
  • the information included in the header of the layer protocol may be used by the pairing unit 205 to be described later.
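The link processing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `LinkRecord` and `extract_link` are hypothetical names, the input is assumed to be the already-reassembled TCP payload of one mirrored HTTP request, and only plain `GET` requests over unencrypted HTTP are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkRecord:
    """Hypothetical record kept per mirrored request (names are assumptions)."""
    url: str            # identification information of the search target
    src_ip: str         # source address taken from the IP header
    dst_ip: str         # destination address taken from the IP header
    mirrored_at: float  # time at which the packet was mirrored


def extract_link(payload: bytes, src_ip: str, dst_ip: str,
                 mirrored_at: float) -> Optional[LinkRecord]:
    """Build a LinkRecord from the payload of a mirrored HTTP request."""
    head = payload.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    # A request line looks like "GET /path HTTP/1.1"; anything else is skipped.
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None
    method, target, _version = parts
    if method != "GET":
        return None  # assumption: this sketch records only GET requests
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    host = headers.get("host")
    if host is None:
        return None
    # The URL is rebuilt from the Host header and the request target.
    url = target if target.startswith("http://") else f"http://{host}{target}"
    return LinkRecord(url, src_ip, dst_ip, mirrored_at)
```

Keeping the addresses and the mirroring time next to the URL is what later allows a request to be matched with its response.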
  • the object collection system 200 may include a spam processing unit 203.
  • The spam processing unit 203 may determine at least some of the identification information registered in the identification information database 208 by the link processing unit 202 to be spam. For example, when the status information (HTTP status code) included in the HTTP response packet corresponding to identification information registered in the identification information database 208 is "200", the spam processing unit 203 may treat the identification information as normal, and when it is "400", may classify it as spam.
  • the spam processing unit 203 may refer to the referrer included in the HTTP request packet including the identification information, and may classify the identification information as spam if the referrer is not a pre-registered host.
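The two spam checks described above (HTTP status code and referrer host) can be sketched as a single predicate. The function name `is_spam` and the `trusted_hosts` parameter are hypothetical. Two choices below are assumptions that go beyond the text: every status code of 400 or higher is treated like the "400" example, and a request without a referrer is not classified as spam by the referrer rule.

```python
from typing import Optional, Set
from urllib.parse import urlsplit


def is_spam(status_code: int, referrer: Optional[str],
            trusted_hosts: Set[str]) -> bool:
    """Return True if the identification information should be marked as spam."""
    # Status check: "200" is normal; error codes are treated as spam
    # (assumption: all codes >= 400 behave like the "400" example).
    if status_code >= 400:
        return True
    # Referrer check: a referrer whose host is not pre-registered marks spam.
    if referrer is not None:
        host = urlsplit(referrer).hostname
        if host not in trusted_hosts:
            return True
    # Assumption: a missing referrer gives no evidence either way.
    return False
```

Because the check needs only headers, it can run before any content is collected, which is how the number of collection targets is reduced in advance.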
  • the object collection system 200 may include a content collector 204.
  • the content collection unit 204 collects content included in each search target object classified by the identification information.
  • The content collecting unit 204 may collect content from the mirrored HTTP response packet that the web server 300 transmits to the user terminal 100 at the request of the user terminal 100. That is, it may collect the content included in the body of the HTTP response packet that the web server 300 transmits to the user terminal 100.
  • the content collecting unit 204 may not collect the content corresponding to the identification information determined by the spam processing unit 203 as spam.
  • To collect the content of a specific object that is divided across a plurality of packets, the content collecting unit 204 may mirror the plurality of packets corresponding to one piece of identification information and then collect the content included in the body of each packet.
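One way to join content that arrives in several mirrored packets is to order the segments by TCP sequence number before concatenating them. The text does not say how the packets are combined, so this is an assumed technique; `reassemble` is a hypothetical helper, and sequence-number wraparound is ignored for brevity.

```python
from typing import Iterable, Tuple


def reassemble(segments: Iterable[Tuple[int, bytes]]) -> bytes:
    """Concatenate (sequence_number, payload) segments in sequence order.

    Retransmitted or overlapping bytes are dropped; a gap raises ValueError
    because the object cannot be completed from the mirrored packets alone.
    """
    body = bytearray()
    next_seq = None  # sequence number of the next byte still needed
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        if next_seq is None:
            next_seq = seq
        if seq + len(payload) <= next_seq:
            continue  # nothing new in this segment (retransmission)
        if seq > next_seq:
            raise ValueError("missing segment; content is incomplete")
        body += payload[next_seq - seq:]  # skip bytes already collected
        next_seq = seq + len(payload)
    return bytes(body)
```

A mirror can miss packets, so an incomplete object is reported rather than stored as if it were whole.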
  • The content collecting unit 204 may refrain from collecting content again within a certain period of time if content has already been collected for a piece of identification information. Alternatively, if a predetermined time has elapsed since content was last collected for a piece of identification information and no packet containing the corresponding content has been mirrored, the content collecting unit 204 may crawl the content by directly sending an HTTP request packet for the object corresponding to the identification information, and may update the content stored in association with the identification information with the crawled content.
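The freshness rules above amount to a small decision table, sketched below. The `Action` enum, the `decide` function, and the two thresholds `min_interval` and `max_age` are hypothetical; the text speaks only of "a certain period" and "a predetermined time", so treating them as two separate parameters is an assumption.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    COLLECT_FROM_MIRROR = "collect_from_mirror"  # take content from the mirrored packet
    SKIP = "skip"                                # collected recently; ignore this packet
    CRAWL_DIRECTLY = "crawl_directly"            # send our own HTTP request
    WAIT = "wait"                                # nothing to do yet


def decide(now: float, last_collected: Optional[float], mirrored: bool,
           min_interval: float, max_age: float) -> Action:
    """Choose how to handle one piece of identification information."""
    age = None if last_collected is None else now - last_collected
    if mirrored:
        # Content is available for free; reuse it unless it was just collected.
        if age is None or age >= min_interval:
            return Action.COLLECT_FROM_MIRROR
        return Action.SKIP
    # No mirrored content: fall back to crawling only when the stored copy is stale.
    if age is not None and age >= max_age:
        return Action.CRAWL_DIRECTLY
    return Action.WAIT
```

Direct crawling is the only branch that generates new traffic, which is why it is reserved for objects whose stored content has gone stale without being mirrored again.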
  • the content collection unit 204 may collect only the content of the host allowed to access according to a pre-stored access permission policy (for example, robots.txt).
  • Even if the mirroring unit 201 mirrors the content of an object, the content collecting unit 204 discards the content of an object to which the host-specific access permission policy does not permit access, and collects only the content of objects to which the policy permits access.
  • the identification information corresponding to the content that is not accessible according to the access permission policy may be pre-registered as spam in the spam processing unit 203.
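The per-host access permission check can be sketched with Python's standard `urllib.robotparser`, assuming the pre-stored policy is the text of each host's robots.txt. The helper names, the `ObjectCollector` user agent, and the choice to discard content from hosts with no stored policy are assumptions made for illustration.

```python
from typing import Dict
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse pre-stored robots.txt text into a policy object (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_collect(policies: Dict[str, RobotFileParser], url: str,
                user_agent: str = "ObjectCollector") -> bool:
    """Return True only if the stored policy of the URL's host permits access."""
    host = urlsplit(url).hostname
    policy = policies.get(host)
    if policy is None:
        return False  # assumption: without a stored policy, discard the content
    return policy.can_fetch(user_agent, url)
```

Applying the check to mirrored content keeps the collector within the same limits a conventional crawler would observe, even though no request is sent to the host.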
  • the object collection system 200 may include a pairing unit 205.
  • the pairing unit 205 confirms a correspondence relationship between the extracted identification information and the collected content. That is, the specific identification information and the content of the corresponding object should be matched with each other.
  • In confirming the correspondence between the identification information and the content, the pairing unit 205 may rely on the correspondence between the packets from which they were obtained.
  • the pairing unit 205 may check whether the HTTP request packet from which the identification information is extracted and the HTTP response packet from which the content is collected correspond to each other. That is, it may be checked whether the HTTP response packet in which the content is collected is a response to the HTTP request packet from which identification information is extracted. At this time, the pairing unit 205 may check the correspondence between the two packets by comparing the protocol header included in the packet. In particular, the packet may be transmitted and received according to a plurality of protocols layered in two or more different layers, and thus may include headers of two or more layer protocols. Accordingly, the pairing unit 205 may compare the information included in the same layer protocol header included in the two packets with each other, and determine whether the two packets correspond to each other.
  • For example, the pairing unit 205 compares the source address information (source IP address) and the destination address information (destination IP address) in the TCP/IP protocol headers of the HTTP request packet and the HTTP response packet; if they correspond to each other, the two packets are determined to correspond, and so are the identification information and the content extracted from them. The pairing unit 205 may also take into account the times at which the two packets were transmitted and received when determining the correspondence.
  • the pairing unit 205 may be unnecessary when the identification information and the content are extracted in the same packet.
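The pairing rule above can be sketched as follows: a response corresponds to a request when its source and destination addresses are the request's addresses swapped and it arrives shortly after the request. The class names, the `pair` function, and the `max_delay` window are hypothetical. Only the IP addresses and timing named in the text are compared; a practical system would likely compare TCP ports as well.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MirroredRequest:
    url: str        # identification information extracted from the request
    src_ip: str     # user terminal
    dst_ip: str     # web server
    seen_at: float  # time the request was mirrored


@dataclass
class MirroredResponse:
    body: bytes     # content collected from the response
    src_ip: str     # web server
    dst_ip: str     # user terminal
    seen_at: float  # time the response was mirrored


def pair(response: MirroredResponse, pending: List[MirroredRequest],
         max_delay: float = 5.0) -> Optional[MirroredRequest]:
    """Remove and return the oldest pending request that matches the response."""
    for i, request in enumerate(pending):
        swapped = (request.src_ip == response.dst_ip
                   and request.dst_ip == response.src_ip)
        delay = response.seen_at - request.seen_at
        if swapped and 0.0 <= delay <= max_delay:
            return pending.pop(i)
    return None
```

Once a request is returned, its URL and the response body can be stored together as one identification-information and content pair.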
  • the object collection system 200 may include an indexing unit 206 for indexing each piece of identification information based on the contents included in the object collected by the content collecting unit 204.
  • The indexing unit 206 indexes each piece of identification information by referring to character strings, such as text or a content name, included in the collected content, thereby making the object corresponding to each piece of identification information searchable.
  • the object collection system 200 includes a search unit 207.
  • When the search unit 207 receives a search request including a search word from the user terminal 100, it searches for the identification information of objects for which a character string corresponding to the received search word is indexed.
  • The search unit 207 may sort the retrieved identification information and provide it to the user terminal 100.
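The indexing and search units can be illustrated with a small inverted index that maps each term found in collected content to the URLs that contain it. The `Index` class is a hypothetical stand-in for illustration. Splitting on word characters and returning results in alphabetical order are assumptions, since the text does not specify tokenization or ranking.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


class Index:
    """Toy inverted index: term -> set of URLs (identification information)."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        """Index a URL under every term that appears in its collected text."""
        for term in re.findall(r"\w+", text.lower()):
            self._postings[term].add(url)

    def search(self, query: str) -> List[str]:
        """Return the URLs whose content contains every term of the query."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)  # assumption: alphabetical order stands in for ranking
```

For example, after `add("http://example.com/a", "Packet mirroring basics")`, a search for `"mirroring"` returns that URL.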
  • The mirroring unit 201, the link processing unit 202, the spam processing unit 203, the content collecting unit 204, the pairing unit 205, and the indexing unit 206 included in the object collection system 200 may be configured as software modules whose functions are separated from one another within one physical system, or may be configured as a combination of at least one software module and at least one hardware module.
  • FIGS. 3 and 4 are flowcharts illustrating an object collection method using packet mirroring according to an embodiment of the present invention.
  • the object collection method begins with mirroring a packet transmitted and received by the user terminal 100 with the web server 300 (S11).
  • The mirrored packet may be an object request packet, by which the user terminal 100 requests the web server 300 to transmit a specific object, or an object response packet, which the web server 300 transmits to the user terminal 100.
  • When the object request packet is an HTTP request packet, the URL in the packet may serve as the identification information of the requested object.
  • When the object response packet is an HTTP response packet, the body of the packet may include the content of the requested object.
  • the object collecting method performs the step of extracting and collecting identification information included in the mirrored packet (S12).
  • the identification information may be a URL included in the header of the mirrored packet as described above.
  • the object collection method includes the step (S13) for collecting the content contained in the mirrored packet in the object collection system 200.
  • In step S13, for example, if the identification information and the content are included in one packet, the content is collected from the packet from which the identification information is extracted; if they are included in different packets, the packet from which the identification information is extracted and the packet from which the content is to be collected are first paired, and the content confirmed to correspond is then collected.
  • the object collecting method includes storing the content collected in step S13 in association with the identification information collected in step S12 (S14).
  • the content collected in step S13 may be stored in association with the identification information in the corresponding relationship through a process of confirming the correspondence with the identification information.
  • The content stored in association with the identification information may be used to index the identification information under index terms.
  • the step (S13) of collecting the content of the object corresponding to the identification information may be made a plurality of times for one identification information. Even if the content corresponding to the object corresponding to the same identification information has already been collected and the collected content is stored in association with each other, if a predetermined time elapses, the process of collecting the content corresponding to the identification information may be repeated. Of course, whenever specific identification information is extracted, the corresponding content may be collected together to maintain the freshness of the content. However, a time may be set in advance, and a process of collecting content for the same identification information may not be repeated within the time.
  • If a packet containing the content corresponding to the identification information is not mirrored even after the preset time has elapsed, the object collection system 200 may directly send an object request packet requesting the object for the identification information to the web server 300, and collect the content from the response. This is to provide more accurate search results by maintaining the freshness of the content corresponding to the identification information.
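Steps S11 to S14 can be tied together in a compact sketch. Each dictionary below stands in for one already-parsed mirrored packet; the field names (`kind`, `src`, `dst`, `url`, `body`) and the `collect` function are hypothetical, and pairing is reduced to a lookup on swapped addresses.

```python
from typing import Dict, Iterable, Tuple


def collect(packets: Iterable[dict]) -> Dict[str, bytes]:
    """Run the S11-S14 flow over a stream of mirrored, already-parsed packets."""
    store: Dict[str, bytes] = {}               # identification information -> content
    pending: Dict[Tuple[str, str], str] = {}   # (client, server) -> URL awaiting a response
    for packet in packets:                     # S11: each item is one mirrored packet
        if packet["kind"] == "request":
            # S12: extract the identification information (URL) from the request.
            pending[(packet["src"], packet["dst"])] = packet["url"]
        elif packet["kind"] == "response":
            # Pair the response with the request whose addresses are swapped.
            url = pending.pop((packet["dst"], packet["src"]), None)
            if url is not None:
                # S13 and S14: collect the body and store it under its URL.
                store[url] = packet["body"]
    return store
```

A response with no matching request is ignored, so content is never stored without the identification information it belongs to.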
  • the object collecting method using packet mirroring may also be implemented in the form of a recording medium including instructions executable by a computer, such as a program module executed by a computer.
  • Computer readable media can be any available media that can be accessed by a computer and includes both volatile and nonvolatile media, removable and non-removable media.
  • Computer readable media may include both computer storage media and communication media.
  • Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Communication media typically includes computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or other transmission mechanism, and includes any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention relates to a method and a system for collecting objects by using packet mirroring. The method, by which a search system collects content included in an object that is a search target, comprises: a mirroring step of mirroring packets transmitted and received between a user terminal and a web server; an identification information extraction step of extracting identification information from the packets mirrored in the mirroring step; a content collection step of collecting the content of the object that is the search target from the packets mirrored in the mirroring step; and a saving step of saving the content collected in the content collection step in association with the identification information that corresponds to the content and was extracted in the identification information extraction step. According to the present invention, content corresponding to the search target can be collected using minimal hardware resources.

Description

Object collection method and system using packet mirroring
The present invention relates to a method and a system for collecting objects, and more particularly, to a method and a system for collecting content included in an object to be searched.
Recently, Internet communication technology has advanced, and the way content is produced and consumed is changing accordingly, with Internet communication technology at its center. Compared with the production and consumption of content through conventional offline media, information distribution in the Internet age is incomparable in speed and reach. Search technology is the core, pivotal technology of information distribution in the Internet age.
Search technology refers to technology that searches, as needed, all the search target objects scattered across the Internet so that the information included in those objects can be found.
For example, web page search technology consists of link collection technology, which obtains the links (uniform resource locators, URLs) of the web pages to be searched; crawling technology, which obtains the web page content of the collected links; indexing technology, which processes the crawled web page content into a form suitable for search; and search engine technology, which provides web page results related to a search word entered by a user.
Accordingly, conventional search technology first secures the link of a web page to be searched and then acquires the content of the web page based on the link. To acquire the content, a crawler continuously collects and processes the content of the search target web pages, which not only requires substantial hardware resources but also generates a huge amount of traffic.
That is, with existing search technology, collecting the content of search targets not only requires heavy investment in equipment but also generates unnecessarily large amounts of traffic in the course of requesting the content and receiving responses. A method of obtaining the content of search targets without generating traffic has therefore been required.
Furthermore, conventional search technology must periodically repeat content collection to maintain the freshness of content already collected, and because spam is filtered out only after the content of a search target has been collected, resources are wasted unnecessarily.
본 발명의 목적은 새로운 트래픽의 발생 없이 검색대상에 대응하는 객체의 콘텐츠를 수집할 수 있는 객체 수집 방법 및 시스템을 제공하는 것이다. An object of the present invention is to provide an object collection method and system that can collect the content of the object corresponding to the search target without generating new traffic.
본 발명의 다른 목적은 불필요한 하드웨어 리소스를 최소화할 수 있는 객체 수집 방법 및 시스템을 제공하는 것이다.Another object of the present invention is to provide a method and system for collecting objects that can minimize unnecessary hardware resources.
본 발명의 다른 목적은 검색대상이 되는 객체의 신선도를 최적으로 유지할 수 있는 객체 수집 방법 및 시스템을 제공하는 것이다. Another object of the present invention is to provide an object collection method and system that can optimally maintain the freshness of an object to be searched.
나아가 본 발명의 또 다른 목적은 스팸을 미리 걸러내어 콘텐츠를 수집할 검색대상의 수를 줄일 수 있는 객체 수집 방법 및 시스템을 제공하는 것이다.Furthermore, another object of the present invention is to provide an object collection method and system that can reduce the number of search targets to collect content by filtering spam in advance.
상술한 기술적 과제를 달성하기 위한 기술적 수단으로서, 본 발명의 제1측면에 따른 패킷미러링을 이용한 객체 수집 방법은, 검색시스템이 검색대상이 되는 객체에 포함된 콘텐츠를 수집하는 방법에 있어서, 사용자단말이 웹서버와 송수신하는 패킷을 미러링하는 미러링단계; 상기 미러링단계에서 미러링된 패킷으로부터 검색대상이 되는 객체의 식별정보를 추출하는 식별정보추출단계; 상기 미러링단계에서 미러링된 패킷으로부터 검색대상이 되는 객체의 콘텐츠를 수집하는 콘텐츠수집단계; 및 상기 콘텐츠수집단계에서 수집된 콘텐츠를 상기 콘텐츠에 대응하고 상기 식별정보추출단계에서 추출된 식별정보에 연관하여 저장하는 저장단계를 포함한다. As a technical means for achieving the above-described technical problem, the object collection method using the packet mirroring according to the first aspect of the present invention, the user terminal in the method for collecting the content contained in the object to be searched by the search system, A mirroring step of mirroring packets transmitted and received with the web server; An identification information extraction step of extracting identification information of an object to be searched from the packet mirrored in the mirroring step; A content collection step of collecting content of an object to be searched from the packet mirrored in the mirroring step; And a storage step of storing the content collected in the content collection step in association with the content and in association with the identification information extracted in the identification information extraction step.
본 발명의 제2측면에 따른 패킷미러링을 이용한 객체 수집 시스템은, 검색서비스의 대상이 되는 객체의 식별정보 및 식별정보에 대응하는 객체의 콘텐츠를 저장하는 식별정보 데이터베이스; 사용자단말이 웹서버와 송수신하는 패킷을 미러링하는 미러링부; 상기 미러링부에 의해 미러링된 패킷으로부터 식별정보를 추출하고, 추출된 식별정보를 상기 식별정보 데이터베이스에 추가하는 링크처리부; 및 상기 미러링부에 의해 미러링된 패킷으로부터, 상기 링크처리부에서 상기 식별정보 데이터베이스에 추가한 식별정보에 대응하는 객체의 콘텐츠를 추출하고, 추출된 콘텐츠를 상기 식별정보 데이터베이스에 저장하는 콘텐츠수집부를 포함할 수 있다.An object collection system using packet mirroring according to a second aspect of the present invention includes an identification information database for storing identification information of an object that is a target of a search service and content of an object corresponding to the identification information; A mirroring unit for mirroring packets transmitted and received by a user terminal with a web server; A link processing unit which extracts identification information from the packet mirrored by the mirroring unit, and adds the extracted identification information to the identification information database; And a content collecting unit for extracting content of an object corresponding to identification information added to the identification information database by the link processing unit from the packet mirrored by the mirroring unit, and storing the extracted content in the identification information database. Can be.
According to an embodiment of the present invention having the above configuration, it is possible to provide an object collection method and system that can collect the content of objects corresponding to search targets without generating new traffic.
In addition, according to an embodiment of the present invention, the use of unnecessary hardware resources can be minimized.
Further, according to an embodiment of the present invention, the freshness of the objects to be searched can be optimally maintained.
Furthermore, according to an embodiment of the present invention, spam can be filtered out in advance, reducing the number of search targets for which content must be collected.
FIG. 1 is a network configuration diagram including an object collection system using packet mirroring according to an embodiment of the present invention.
FIG. 2 is a functional block diagram illustrating an object collection system using packet mirroring according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating an object collection method using packet mirroring according to an embodiment of the present invention.
Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice it. However, the present invention may be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted in order to describe the present invention clearly, and similar parts are given similar reference numerals throughout the specification.
Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. In addition, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them.
Hereinafter, the present invention is described in detail with reference to the accompanying drawings.
FIG. 1 is a configuration diagram of an object collection system using packet mirroring according to an embodiment of the present invention.
The network N may be implemented as any kind of wired or wireless network, such as a local area network (LAN), a wide area network (WAN), a value added network (VAN), a personal area network (PAN), a mobile radio communication network, or a satellite communication network.
The user terminal 100 is a communication terminal that accesses the remote web server 300 through the network N, and may be implemented as a computer, a portable terminal, a television, or the like that can connect to other terminals and servers. Here, the computer includes, for example, a notebook, desktop, or laptop equipped with a web browser. The portable terminal is, for example, a wireless communication device that guarantees portability and mobility, and may include any kind of handheld wireless communication device, such as a PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), or Wibro (Wireless Broadband Internet) terminal, or a smartphone. The television may include IPTV (Internet Protocol Television), Internet TV, terrestrial TV, cable TV, and the like.
The web server 300 is a computer system that can connect to the remote user terminal 100 or to other servers through the network N, and mainly serves to provide data services to the parties with which it communicates.
The user terminal 100 sends a service request to the web server 300 through a web browser or another application program. For example, a service is requested when an HTTP Request packet is sent over HTTP (Hyper Text Transfer Protocol) from the user terminal 100 to the web server 300, and the service is provided when the web server 300 sends an HTTP Response packet to the user terminal 100 in response to the HTTP Request packet.
The object collection system 200 according to the present invention is likewise a computer system that can connect to the remote user terminal 100 or to other servers through the network N, and it may provide a service by transmitting, back to the user terminal 100, an HTTP Response packet corresponding to an HTTP Request packet received from the user terminal 100. The object collection system 200 may include a mirroring unit 201 connected to the network backbone, or may communicate with the mirroring unit 201 to receive the data that the mirroring unit 201 acquires.
In the following, the present invention is described on the assumption that the object collection system 200 includes the mirroring unit 201. However, the object collection system 200 need not itself include a mirroring unit 201 connected to the backbone of the network N; it is sufficient that it receives the data acquired by the mirroring unit 201.
A more specific configuration of the object collection system 200 is described below. FIG. 2 is a functional block diagram illustrating an object collection system using packet mirroring according to an embodiment of the present invention.
As shown in FIG. 2, the object collection system 200 according to an embodiment of the present invention includes a mirroring unit 201 connected to the wired or wireless communication line that connects the user terminal 100 and the network N.
The mirroring unit 201 may mirror or sniff packet data exchanged between the web server 300 connected to the network N and the user terminal 100. In practice, the mirroring unit 201 of the object collection system 200 may be implemented as a TAP device or a switch mirror installed in the network backbone, or may be a hardware or software module installed in the base station (access point) that serves as the network contact point of the user terminal 100. The packets mirrored or sniffed by the mirroring unit 201 may be HTTP Request packets that the user terminal 100 sends to the web server 300, or HTTP Response packets. In particular, so that the content collection unit 204 described later can collect content, the mirroring unit 201 mirrors HTTP Response packets, and it may mirror both HTTP Request packets and HTTP Response packets.
The object collection system 200 according to an embodiment of the present invention may also include a link processing unit 202. The link processing unit 202 processes the packets mirrored by the mirroring unit 201 and adds search target identification information to the identification information database 208 described later.
Here, the search target identification information that the link processing unit 202 obtains from a mirrored packet is information that uniquely identifies an object to be searched; for example, it may be the address (URL; Uniform Resource Locator) of the object. The object to be searched may be any form of data that exists on the Internet, such as a web page, an image, a video, or a document.
The link processing unit 202 stores the identification information of search targets in the identification information database 208 in a structured form. The identification information database 208 stores and manages the search target identification information, obtained from mirrored packets, that designates each object to be searched. The identification information database 208 also stores and manages additional information for each piece of search target identification information. The additional information may include information or content contained in at least one of: the identification information itself; the packet that contained the identification information; another packet corresponding to that packet; or the object corresponding to the identification information. The identification information database 208 may be configured as a physical or software module, including a conventional database system for storing data in a structured manner.
In particular, the link processing unit 202 may use the HTTP Request packets mirrored by the mirroring unit 201 in order to extract the identification information contained in the mirrored packets and record it in the identification information database 208. In doing so, the link processing unit 202 may store, in association with the identification information, not only the identification information contained in the HTTP Request packet but also the time at which the packet was mirrored and information contained in the headers of the packet's layered protocols (for example, the source IP address or the destination IP address). The information contained in the layered protocol headers may later be used by the pairing unit 205 described below.
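As a non-limiting illustration, the extraction performed by the link processing unit 202 might be sketched in Python as follows. The names `LinkRecord` and `extract_link` are hypothetical and do not appear in the specification. The sketch assumes that the TCP payload of a mirrored HTTP Request packet and its IP addresses have already been obtained by some capture mechanism, and that the URL is reconstructed from the request line and the `Host` header with an `http://` scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkRecord:
    url: str            # identification information of the requested object
    src_ip: str         # source IP address taken from the IP header
    dst_ip: str         # destination IP address taken from the IP header
    mirrored_at: float  # time at which the packet was mirrored


def extract_link(payload: bytes, src_ip: str, dst_ip: str,
                 mirrored_at: float) -> Optional[LinkRecord]:
    """Build a LinkRecord from the payload of a mirrored HTTP Request packet."""
    head = payload.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None  # not an HTTP request line
    _method, target, _version = parts
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    host = headers.get("host")
    if target.startswith("http://") or target.startswith("https://"):
        url = target  # absolute-form request target
    elif host and target.startswith("/"):
        url = "http://" + host + target  # assumption: plain HTTP scheme
    else:
        return None
    return LinkRecord(url, src_ip, dst_ip, mirrored_at)
```

Storing the IP addresses and the mirroring time next to the URL keeps the information that the pairing unit 205 is described as using available without re-reading the packet.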
The object collection system 200 may also include a spam processing unit 203. The spam processing unit 203 may determine that at least some of the identification information registered in the identification information database 208 by the link processing unit 202 is spam. For example, the spam processing unit 203 may treat identification information as normal when the status information (HTTP status code) contained in the HTTP Response packet corresponding to that identification information is "200", and classify it as spam when the status code is "400". Alternatively, the spam processing unit 203 may refer to the Referer contained in the HTTP Request packet that carried the identification information and classify the identification information as spam if the Referer is not a pre-registered host.
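As a non-limiting illustration, the two example criteria above might be combined as in the following Python sketch. The function name `is_spam` and the set `registered_hosts` are hypothetical. The specification presents the status code check and the Referer check as alternatives; applying both, treating only status code 400 as spam, and treating a missing Referer as not spam are assumptions of this sketch.

```python
from typing import Optional, Set
from urllib.parse import urlsplit


def is_spam(status_code: int, referer: Optional[str],
            registered_hosts: Set[str]) -> bool:
    """Return True if the identification information should be marked as spam."""
    if status_code == 400:
        # Example from the description: status code 400 is classified as spam.
        return True
    if referer is not None:
        # Example from the description: a Referer whose host is not
        # pre-registered marks the identification information as spam.
        host = urlsplit(referer).hostname
        if host not in registered_hosts:
            return True
    return False  # assumption: everything else is processed normally
```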
The object collection system 200 may also include a content collection unit 204. The content collection unit 204 collects the content included in each search target object distinguished by the identification information. The content collection unit 204 may collect content from the mirrored HTTP Response packets that the web server 300 sends to the user terminal 100 at the request of the user terminal 100. That is, it may collect the content contained in the body of an HTTP Response packet sent from the web server 300 to the user terminal 100.
The content collection unit 204 may refrain from collecting content corresponding to identification information that the spam processing unit 203 has determined to be spam. In particular, to collect the content of an object that is transmitted in multiple packets, the content collection unit 204 may mirror the plurality of packets corresponding to a single piece of identification information and then assemble the content contained in the bodies of those packets.
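As a non-limiting illustration, assembling content that arrives in several mirrored packets might be sketched in Python as follows. The function name `assemble_body` is hypothetical. The sketch assumes that each mirrored TCP payload is keyed by its TCP sequence number, that the segments are contiguous with no sequence number wraparound, and that the response body is not chunked or compressed; the specification does not prescribe any of these details.

```python
from typing import Dict


def assemble_body(segments: Dict[int, bytes]) -> bytes:
    """Join mirrored TCP payloads in sequence order and return the HTTP body."""
    stream = b""
    expected = None
    for seq in sorted(segments):
        data = segments[seq]
        if expected is not None and seq != expected:
            raise ValueError("gap or overlap in mirrored segments")
        stream += data
        expected = seq + len(data)
    # The body starts after the blank line that ends the HTTP header.
    _head, sep, body = stream.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("HTTP header is incomplete")
    return body
```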
In addition, when content has already been collected for a piece of identification information, the content collection unit 204 may refrain from collecting content for it again within a certain period. Alternatively, if a certain time has elapsed since content was last collected for a piece of identification information but no packet containing the corresponding content has been mirrored, the content collection unit 204 may itself send an HTTP Request packet for the object corresponding to that identification information, crawl the content included in the object, and update the content stored in association with the identification information with the crawled content.
In particular, the content collection unit 204 may collect only the content of hosts to which access is permitted under a pre-stored per-host access permission policy (for example, robots.txt). For an object to which access is denied under the per-host access permission policy, the content collection unit 204 discards the content even if the mirroring unit 201 has mirrored it, and collects only the content of objects to which access is permitted. Of course, identification information corresponding to content to which access is denied under the access permission policy may also be registered in advance as spam by the spam processing unit 203.
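As a non-limiting illustration, a check against pre-stored robots.txt policies might be sketched with Python's standard `urllib.robotparser` module as follows. The function names `build_policies` and `may_collect` and the user agent string `ExampleCollector` are hypothetical. Refusing to collect from a host for which no policy has been stored is an assumption of this sketch, not something stated in the specification.

```python
from typing import Dict
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def build_policies(stored: Dict[str, str]) -> Dict[str, RobotFileParser]:
    """Parse pre-stored robots.txt text for each host."""
    policies = {}
    for host, text in stored.items():
        parser = RobotFileParser()
        parser.parse(text.splitlines())
        policies[host] = parser
    return policies


def may_collect(url: str, policies: Dict[str, RobotFileParser],
                user_agent: str = "ExampleCollector") -> bool:
    """Return True if mirrored content for this URL may be kept."""
    host = urlsplit(url).hostname
    parser = policies.get(host)
    if parser is None:
        return False  # assumption: no stored policy means the content is discarded
    return parser.can_fetch(user_agent, url)
```

Content for which `may_collect` returns False would be discarded even though it was mirrored, matching the behavior described above.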
Meanwhile, the object collection system 200 may include a pairing unit 205. When identification information is extracted from an HTTP Request packet and content is collected from an HTTP Response packet, the pairing unit 205 verifies the correspondence between the extracted identification information and the collected content. That is, a given piece of identification information and the content of the corresponding object must be matched to each other, and when the identification information and the content are collected from different packets, a procedure for verifying their correspondence may be needed. The pairing unit 205 therefore verifies the correspondence between identification information and content, and may do so by verifying the correspondence between the packets.
For example, the pairing unit 205 may check whether the HTTP Request packet from which the identification information was extracted and the HTTP Response packet from which the content was collected correspond to each other. That is, it may check whether the HTTP Response packet from which the content was collected is the response to the HTTP Request packet from which the identification information was extracted. The pairing unit 205 may verify the correspondence between the two packets by comparing the protocol headers contained in them. In particular, a packet may be transmitted and received according to a plurality of protocols layered in two or more different layers, and may therefore contain the headers of two or more layered protocols. Accordingly, the pairing unit 205 may compare the information contained in the headers of the same protocol layer in the two packets to determine whether the two packets correspond to each other.
The pairing unit 205 may compare the source address information (source IP address) and the destination address information (destination IP address) contained in the TCP/IP protocol headers of the HTTP Request packet and the HTTP Response packet. If the addresses correspond, it may determine that the two packets correspond to each other, and therefore that the identification information and the content extracted from the respective packets correspond. Of course, the pairing unit 205 may also refer to information about the times at which the two packets were transmitted and received when determining the correspondence.
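As a non-limiting illustration, the address and time comparison described above might be sketched in Python as follows. The names `MirroredPacket`, `is_pair`, and `find_request`, and the `max_delay` value of 5 seconds, are hypothetical. The sketch uses only the IP addresses and mirroring times named in the description; a practical system would probably need further header fields, such as TCP port numbers, to separate concurrent connections between the same two hosts, but that is an observation of this sketch rather than a statement of the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MirroredPacket:
    src_ip: str         # source IP address from the IP header
    dst_ip: str         # destination IP address from the IP header
    mirrored_at: float  # time at which the packet was mirrored


def is_pair(request: MirroredPacket, response: MirroredPacket,
            max_delay: float = 5.0) -> bool:
    """A response pairs with a request if the addresses are swapped and
    the response follows the request within max_delay seconds."""
    delay = response.mirrored_at - request.mirrored_at
    return (request.src_ip == response.dst_ip
            and request.dst_ip == response.src_ip
            and 0.0 <= delay <= max_delay)


def find_request(response: MirroredPacket,
                 pending: List[MirroredPacket]) -> Optional[MirroredPacket]:
    """Return the most recent pending request that pairs with the response."""
    matches = [r for r in pending if is_pair(r, response)]
    return max(matches, key=lambda r: r.mirrored_at) if matches else None
```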
Of course, in an embodiment of the present invention, the pairing unit 205 may be unnecessary when the identification information and the content are extracted from the same packet.
The object collection system 200 may also include an indexing unit 206 that indexes each piece of identification information based on the content of the object collected by the content collection unit 204. For example, the indexing unit 206 indexes each piece of identification information with reference to strings such as text contained in the collected content or a content title contained in the collected content, thereby incorporating the object corresponding to each piece of identification information into the search targets.
The object collection system 200 also includes a search unit 207. When the search unit 207 receives a search request containing a search term from the user terminal 100, it searches for the identification information of objects indexed with strings corresponding to the received search term. The search unit 207 may also sort the retrieved identification information and provide it to the user terminal 100.
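As a non-limiting illustration, the indexing unit 206 and the search unit 207 might be sketched together as a small inverted index in Python. The class name `InvertedIndex` is hypothetical. Tokenizing on word characters, requiring every query term to match, and ordering results alphabetically by URL are assumptions of this sketch; the specification does not state how terms are extracted or how results are sorted.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


class InvertedIndex:
    """Maps index terms to the URLs (identification information) they occur in."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        """Index a URL under every term found in its collected content."""
        for term in re.findall(r"\w+", text.lower()):
            self._postings[term].add(url)

    def search(self, query: str) -> List[str]:
        """Return the URLs indexed under all terms of the query."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)  # placeholder ordering, not a ranking
```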
The mirroring unit 201, the link processing unit 202, the spam processing unit 203, the content collection unit 204, the pairing unit 205, the indexing unit 206, and the search unit 207 included in the object collection system 200 may each be configured as functionally separate software modules within a single physical system, or may be configured as a combination of one or more software modules and one or more hardware modules.
An object collection method using the object collection system 200 is described below.
FIGS. 3 and 4 are flowcharts illustrating an object collection method using packet mirroring according to an embodiment of the present invention.
As shown in FIG. 3, the object collection method according to an embodiment of the present invention begins with a step of mirroring the packets that the user terminal 100 exchanges with the web server 300 (S11). The mirrored packets may be an object request packet, in which the user terminal 100 asks the web server 300 to transmit a specific object, and an object response packet, which the web server 300 transmits to the user terminal 100. In particular, when the object request packet is an HTTP Request packet, the header of the packet may contain a URL as the identification information of the requested object. When the object response packet is an HTTP Response packet, the body of the packet may contain the content of the requested object.
The object collection method according to an embodiment of the present invention then performs a step of extracting and collecting the identification information contained in the mirrored packets (S12). As described above, the identification information may be a URL contained in the header of a mirrored packet.
The object collection method according to an embodiment of the present invention also includes a step in which the object collection system 200 collects the content contained in the mirrored packets (S13). If, for example, the identification information and the content are contained in a single packet, step S13 is performed by collecting the content from the packet from which the identification information was extracted. If, for example, the identification information and the content are contained in different packets, the packet from which the identification information was extracted and the packet from which the content is to be collected may first be paired, and the content confirmed to correspond may then be collected.
Furthermore, the object collection method according to an embodiment of the present invention includes a step of storing the content collected in step S13 in association with the identification information collected in step S12 (S14).
As already described, the content collected in step S13 may go through a process of verifying its correspondence with identification information and then be stored in association with the corresponding identification information.
The content stored in association with the identification information in this way may be used to index the identification information with index terms.
Meanwhile, as described for the object collection system 200 according to an embodiment of the present invention, the step of collecting the content of the object corresponding to identification information (S13) may be performed multiple times for a single piece of identification information. Even if content has already been collected and stored for the object corresponding to a given piece of identification information, the process of collecting the content corresponding to that identification information may be repeated once a preset time has elapsed. Of course, the corresponding content may instead be collected every time a given piece of identification information is extracted, so that the freshness of the content is kept as high as possible. However, a time may be set in advance so that the process of collecting content for the same identification information is not repeated within that time.
In addition, if the preset time has elapsed since content was collected for a given piece of identification information, and a limit time that is set longer than the preset time has also elapsed, yet the identification information has not been mirrored, the object collection system 200 may itself send the web server 300 an object request packet requesting the object for that identification information and collect the content from the response. This keeps the content corresponding to the identification information fresh so that more accurate search results can be provided.
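As a non-limiting illustration, the decision between skipping, collecting from mirrored traffic, and crawling directly might be sketched in Python as follows. The names `Action` and `next_action`, and the parameters `min_interval` (the preset time) and `limit_time` (the limit time), are hypothetical labels for the quantities described above. Skipping an object that has never been collected and is not currently seen in mirrored traffic is an assumption of this sketch.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SKIP = "skip"
    COLLECT_FROM_MIRROR = "collect_from_mirror"
    CRAWL = "crawl"


def next_action(now: float, last_collected: Optional[float],
                seen_in_mirror: bool, min_interval: float,
                limit_time: float) -> Action:
    """Decide how to refresh the content of one piece of identification information."""
    if limit_time <= min_interval:
        raise ValueError("limit_time must be longer than min_interval")
    if last_collected is None:
        # Never collected: use mirrored traffic if it is available.
        return Action.COLLECT_FROM_MIRROR if seen_in_mirror else Action.SKIP
    age = now - last_collected
    if seen_in_mirror:
        # Re-collect from mirrored packets only after the preset time.
        return Action.COLLECT_FROM_MIRROR if age >= min_interval else Action.SKIP
    # Not mirrored: crawl directly only after the longer limit time.
    return Action.CRAWL if age >= limit_time else Action.SKIP
```

With this arrangement, direct requests to the web server are sent only for objects that users have stopped requesting, so the bulk of collection still generates no new traffic.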
The object collection method using packet mirroring according to the embodiment described with reference to FIG. 3 may also be implemented in the form of a recording medium containing computer-executable instructions, such as a program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media and removable and non-removable media. A computer-readable medium may also include both computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or another transport mechanism, and include any information delivery media.
The foregoing description of the present invention is provided for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing the technical idea or essential features of the invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.
The scope of the present invention is indicated by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims (16)

  1. An object collection method using packet mirroring, by which a search system collects content included in an object to be searched, the method comprising:
    a mirroring step of mirroring packets exchanged between a user terminal and a web server;
    an identification information extraction step of extracting identification information of an object to be searched from the packets mirrored in the mirroring step;
    a content collection step of collecting content of the object to be searched from the packets mirrored in the mirroring step; and
    a storing step of storing the content collected in the content collection step in association with the identification information that corresponds to the content and was extracted in the identification information extraction step.
  2. The method of claim 1,
    wherein the packets mirrored in the mirroring step are at least one of an object request packet transmitted from the user terminal to the web server and an object response packet transmitted from the web server to the user terminal in response to the object request packet.
  3. The method of claim 1,
    wherein the identification information extraction step comprises
    extracting the identification information from an object request packet that is mirrored in the mirroring step and is transmitted from the user terminal to the web server, and
    wherein the content collection step comprises
    collecting the content from an object response packet that is mirrored in the mirroring step and is transmitted from the web server to the user terminal in response to the object request packet.
  4. The method of claim 3,
    wherein the object collection method
    further comprises a pairing step of confirming the correspondence between the identification information extracted in the identification information extraction step and the content collected in the content collection step.
  5. The method of claim 4,
    wherein the pairing step
    comprises checking the correspondence between the source address information and the destination address information included in each of the object request packet and the object response packet.
  6. The method of claim 4,
    wherein the pairing step
    is performed by comparing information contained in one or more protocol headers of the same layer included in each of the object request packet and the object response packet.
  7. The method of claim 1,
    wherein the object collection method
    further comprises an update step of re-collecting the content for the identification information and the content stored in the storing step, and updating the content associated with the identification information.
  8. The method of claim 7,
    wherein the update step
    comprises, when a packet mirrored in the mirroring step includes content for identification information whose content has already been collected, replacing the content stored in association with the identification information with the content included in the packet mirrored in the mirroring step.
  9. The method of claim 3,
    wherein the content collection step
    comprises determining whether to collect the content included in the object response packet based on object status information included in the object response packet.
  10. The method of claim 1,
    wherein the content collection step
    comprises determining whether to collect the content included in the packet mirrored by the mirroring unit, with reference to an access permission policy registered in advance for each host.
  11. The method of claim 1,
    wherein the object collection method
    further comprises an indexing step of indexing character strings to the identification information, based on the content stored in association with the identification information in the storing step.
  12. The method of claim 11,
    wherein the object collection method
    further comprises a search step of, when a search request including a search term is received from a user terminal, searching for identification information to be provided to the user terminal by comparing the search term with the character strings indexed to the identification information in the indexing step.
  13. An object collection system using packet mirroring, comprising:
    an identification information database that stores identification information of objects targeted by a search service and content of the objects corresponding to the identification information;
    a mirroring unit that mirrors packets exchanged between a user terminal and a web server;
    a link processing unit that extracts identification information from the packets mirrored by the mirroring unit and adds the extracted identification information to the identification information database; and
    a content collection unit that extracts, from the packets mirrored by the mirroring unit, content of the object corresponding to the identification information added to the identification information database by the link processing unit, and stores the extracted content in the identification information database.
  14. The system of claim 13,
    wherein the link processing unit
    extracts the identification information from an object request packet that is mirrored by the mirroring unit and is transmitted from the user terminal to the web server, and
    wherein the content collection unit
    collects the content from an object response packet that is mirrored by the mirroring unit and is transmitted from the web server to the user terminal in response to the object request packet.
  15. The system of claim 14,
    wherein the object collection system
    further comprises a pairing unit that confirms the correspondence between the identification information extracted by the link processing unit and the content collected by the content collection unit, and
    wherein the content collection unit,
    based on the correspondence confirmed by the pairing unit, stores the collected content in association with the identification information corresponding to the collected content.
  16. The system of claim 13,
    wherein the object collection system
    further comprises a spam processing unit that determines some of the identification information extracted by the link processing unit to be spam, and
    wherein the content collection unit
    does not collect content corresponding to identification information determined to be spam by the spam processing unit.
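As a rough illustration of how the pairing of claims 4 and 5 and the status-based decision of claim 9 might fit together, the sketch below pairs a mirrored object request with its object response by swapping source and destination addresses. Everything named here is an assumption made for illustration: the `RequestPacket`, `ResponsePacket`, and `Collector` types are hypothetical, ports are included in the address information, the identification information is taken to be a URL, and "object status information" is interpreted as an HTTP status code with only 200 treated as collectable.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (client_ip, client_port, server_ip, server_port); hypothetical pairing key
FlowKey = Tuple[str, int, str, int]


@dataclass
class RequestPacket:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    url: str  # identification information extracted from the request


@dataclass
class ResponsePacket:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    status: int  # object status information (assumed to be an HTTP status code)
    body: str    # content of the object


class Collector:
    """Pairs mirrored requests with responses and stores content per identifier."""

    def __init__(self) -> None:
        self.pending: Dict[FlowKey, str] = {}  # requests still awaiting a response
        self.store: Dict[str, str] = {}        # identification information -> content

    def on_request(self, pkt: RequestPacket) -> None:
        key = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port)
        self.pending[key] = pkt.url

    def on_response(self, pkt: ResponsePacket) -> Optional[str]:
        # The response's destination is the request's source, and vice versa.
        key = (pkt.dst_ip, pkt.dst_port, pkt.src_ip, pkt.src_port)
        url = self.pending.pop(key, None)
        if url is None:
            return None  # no matching request was mirrored
        if pkt.status != 200:
            return None  # assumed policy: skip errors, redirects, and the like
        self.store[url] = pkt.body
        return url
```

A real deployment would also need stream reassembly and handling of several requests on one connection, which this sketch deliberately leaves out.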
PCT/KR2013/003477 2012-04-23 2013-04-23 Method and system for collecting objects by using packet mirroring WO2013162264A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR20120042326 2012-04-23
KR10-2012-0042326 2012-04-23
KR1020130045071A KR101471515B1 (en) 2012-04-23 2013-04-23 System and method for collecting searching objects using packet sniffing
KR10-2013-0045071 2013-04-23

Publications (1)

Publication Number Publication Date
WO2013162264A1 true WO2013162264A1 (en) 2013-10-31

Family

ID=49483486

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2013/003477 WO2013162264A1 (en) 2012-04-23 2013-04-23 Method and system for collecting objects by using packet mirroring

Country Status (1)

Country Link
WO (1) WO2013162264A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090106203A1 (en) * 2007-10-18 2009-04-23 Zhongmin Shi Method and apparatus for a web search engine generating summary-style search results
KR20090103252A (en) * 2008-03-28 2009-10-01 상원씨엔티 (주) Server System And Method For Forming Dynamic Interface, And Method For Proving Search Service With Dynamic Interface
KR20100037401A (en) * 2008-10-01 2010-04-09 엔에이치엔(주) Method and apparatus for managing search database
KR20100127456A (en) * 2009-05-26 2010-12-06 (주)필링크 A system for intercepting lewd contents in network and the driving method
US20110252478A1 (en) * 2006-07-10 2011-10-13 Websense, Inc. System and method of analyzing web content

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13781409

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13781409

Country of ref document: EP

Kind code of ref document: A1