WO2010070646A1 - System and method enabling agents to interact with P2P networks to perform required processing

Publication number
WO2010070646A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
file
structured
database
network
Prior art date
Application number
PCT/IL2009/001197
Other languages
English (en)
Inventor
Doron Frenkel
Raz Alon
Joseph Arie Levy
Original Assignee
Tipayo Ltd
Priority date
2008-12-18
Filing date
2009-12-17
Publication date
2010-06-24
Application filed by Tipayo Ltd
Publication of WO2010070646A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1087 Peer-to-peer [P2P] networks using cross-functional networking aspects
    • H04L67/1091 Interfacing with client-server systems or between P2P systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0681 Configuration of triggering conditions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/22 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks comprising specially adapted graphical user interfaces [GUI]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/535 Tracking the activity of the user

Definitions

  • the present invention relates generally to P2P networks, and more particularly to a system and method to enable agents to perform searches and extract data from P2P networks.
  • P2P - peer-to-peer
  • client/server architecture is a type of distributed network architecture. Often peer-to-peer architecture is implemented by giving each node both server and client capabilities. In recent usage, peer-to-peer has come to describe applications in which users can exchange files with each other over the Internet, either directly or through a mediating server.
  • Popular recent examples of programs for connecting to such file-sharing networks are FastTrack, Gnutella and ED2K.
  • P2P reduces the computing resources and connectivity requirements for the content owners and distributors. Moreover, the traffic model becomes symmetric.
  • NSP - Network Service Provider
  • A method is disclosed to provide a system for a set of agents residing on computers connected to the Internet in multiple sites to interact with at least one P2P network to perform searches and extract data from the at least one P2P network.
  • the method includes collecting raw data from the at least one P2P network based on system criteria.
  • The method also includes extracting structured data from the raw data and collecting, filtering and analyzing P2P traffic in the at least one P2P network and extracting relevant behavior patterns and relevant statistics, such that the set of agents serves as an interface to the P2P networks and interacts with the P2P networks using P2P protocols to perform extraction of data and insertion of decoys.
  • RAW information database - database containing downloaded information and information metadata.
  • Pattern database - database of known or discovered character patterns and content providing a set of rules whose purpose is to match data (e.g. Visa credit card template).
  • Behavior database - database of additional information and parameters pertaining to P2P user behavior that has been observed over time and is used to tweak the internal system parameters for the various processes.
  • Structured database - database containing information that has been processed and categorized/decoded.
  • P2P network - any known or future P2P network, with various technologies and architectures.
  • P2P protocol - any known or future P2P protocol used in a P2P network (each network potentially having a different protocol).
  • Fig. 1 is a schematic system block diagram of the method for collecting relevant raw data from P2P networks based on system criteria, according to the principles of the present invention
  • Fig. 2 is a schematic system block diagram of the method for extracting structured data from the raw information, according to the principles of the present invention
  • Fig. 3 is a schematic system block diagram of the method for collecting, filtering and analyzing P2P traffic and extracting relevant behavior patterns and relevant statistics, according to the principles of the present invention
  • Fig. 4 is a schematic system block diagram of the method for processing and indexing the raw data in the raw data database, according to the principles of the present invention
  • Fig. 5 is a schematic system block diagram of the method for searching for data in the system databases, according to the principles of the present invention
  • Fig. 6 is a schematic system block diagram of the method for the triggering mechanism, according to the principles of the present invention
  • Fig. 7 is a schematic system block diagram of the method for decoying, according to the principles of the present invention.
  • Fig. 1 is a schematic system block diagram of the method for collecting relevant raw data from P2P networks based on system criteria, according to the principles of the present invention.
  • System criteria to query P2P Networks for relevant raw data may be based on patterns for file names and additional data of the file or the client on which it resides: IP, Location, Keywords, etc.
  • Other search input includes selected data from the P2P network.
  • Output includes relevant raw file data and metadata stored in the raw information database.
  • the system extracts search parameters from the pattern database. Search parameters are processed to determine their relevant use in the search query.
  • the P2P search query is rendered by combining search parameters and utilizing the appropriate P2P protocol.
  • the P2P query is sent to a P2P network using P2P query protocol based on the pattern database.
  • P2P search results entries are read, including any metadata returned for the search match entries.
  • the result set is stored for processing and relevancy analysis. Processing is either automatic or manual. Automatic processing is done by matching criteria satisfaction based on the pattern database, or any digital data source and/or matching process. Manual processing is done by storing results in a temporary database, to be later processed by a human or system, and later retrieving a list of relevant results based on metadata.
  • a final list of files is produced that have been selected via a matching process to be downloaded from the P2P network.
  • a request is sent to download each file using P2P protocol. Downloaded file chunks or other data structures are received from the P2P network and assembled in memory.
  • the downloaded files and additional file metadata (such as file extension, file identifiers and information on the client(s) on which the file was found) are stored in a raw file database.
  • The system and its agents/sentinels perform a query on any P2P network, searching for information.
  • the query is based on the pattern database or other criteria 101.
  • the pattern database includes a set of parameters such as IP, GEO Location, Customer information and a set of structures such as data types, data structures, file types and file structures.
  • The query results are obtained from the P2P network. The query results are then analyzed based on scoring derived from the relevancy of the file (IP, GEO location, customer information, pattern matching, data types, data structure, file types and file structure) 102.
  • a download request is sent to the P2P network for each query entry containing the required information based on the analysis step 103.
  • The hash files are stored 104, and the files, together with all of their metadata, are stored in the raw file database 105.
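The scoring and selection steps 102 and 103 can be sketched roughly as follows. This is only an illustration under stated assumptions: the `SearchResult` and `Criteria` types, the one-point-per-match scoring and the `threshold` value are hypothetical and do not come from the patent text, which does not specify a scoring formula.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    """One entry returned by a P2P search (hypothetical shape)."""
    file_name: str
    file_hash: str
    peer_ip: str
    country: str


@dataclass
class Criteria:
    """System criteria drawn from the pattern database (hypothetical shape)."""
    keywords: set = field(default_factory=set)
    countries: set = field(default_factory=set)
    file_types: set = field(default_factory=set)


def relevance_score(result, criteria):
    """Assumed scoring: one point per matching keyword, country and file type."""
    name = result.file_name.lower()
    score = sum(1 for kw in criteria.keywords if kw in name)
    if result.country in criteria.countries:
        score += 1
    if name.rsplit(".", 1)[-1] in criteria.file_types:
        score += 1
    return score


def select_for_download(results, criteria, threshold=2):
    """Return the results worth downloading, best first, one per file hash."""
    seen_hashes = set()
    selected = []
    ranked = sorted(results, key=lambda r: relevance_score(r, criteria), reverse=True)
    for result in ranked:
        if relevance_score(result, criteria) < threshold:
            continue
        if result.file_hash in seen_hashes:
            continue  # same file offered by another peer
        seen_hashes.add(result.file_hash)
        selected.append(result)
    return selected
```

Deduplicating on the file hash reflects the fact that the same file may be offered by several clients; which peer is kept is an arbitrary choice in this sketch.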
  • Fig. 2 is a schematic system block diagram of the method for extracting structured data from the raw information gathered and processed by the system, according to the principles of the present invention.
  • Structured data found in the raw (non-structured) data in the raw information database is detected, extracted, processed and stored by using patterns and/or system criteria. Scoring based on the relevant structured data and metadata is stored in the structured database. New, unprocessed, or previously processed but still relevant raw files are extracted from the raw files database. Relevant structured data patterns are extracted from the patterns database and/or the behavior database based on raw file metadata and/or other system criteria.
  • Raw files are processed using relevant patterns to detect the existence of structured data in the raw files. Detected structured data is extracted from the raw files.
  • Classification and categorization information is added to the extracted data based on classification algorithms (utilizing data and concepts such as IP, GEO, Customer, Keywords, Patterns, etc). Data is rated for relevancy and other internal system measures, which may later affect the system criteria. Detected structured data and its metadata are stored in the structured database.
  • The system imports raw information from the raw information database 201. It then correlates the criteria and patterns that reside in the patterns database with the raw information 202. Next, it extracts the data and stores the correlated data in the structured data database 203, and adds classification and categorization to the extracted data based on a classification algorithm (utilizing data and concepts such as IP, GEO, Customer, keywords, patterns) 204. In parallel with process 204, the correlation process searches for new possible patterns 205. Finally, it stores the new patterns in the patterns database and rates the existing patterns 206.
  • raw files are extracted from the raw files database and relevant system criteria are extracted from the patterns database based on raw file metadata and on recurrence of patterns found in past files processed and/or on other such algorithms for defining criteria for potential patterns.
  • Raw files are processed using relevant system criteria to detect potential pattern templates of structured data; the potential pattern templates that are found are extracted from the raw files and their generic "template" representation is constructed.
  • Extracted pattern templates are rated using existing pattern templates and system criteria to assist in determining pattern quality. New pattern templates are stored in the patterns database. Note: the pattern will typically not be used by the system until manually approved by a system admin/data manager.
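As an illustration of the pattern-driven extraction described for Fig. 2, the sketch below applies one pattern to raw text and emits structured records. The regular expression standing in for a "Visa credit card template", the record fields, and the use of the Luhn checksum as a rating signal are all assumptions made for this example; the patent does not specify how a pattern is represented.

```python
import re

# Hypothetical pattern-database entry: 16 digits starting with 4,
# optionally grouped in fours by spaces or hyphens.
VISA_TEMPLATE = re.compile(r"\b4\d{3}(?:[ -]?\d{4}){3}\b")


def luhn_valid(number):
    """Luhn checksum, used here only as an assumed relevancy signal."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def extract_structured(raw_text, source_file):
    """Detect pattern matches in raw text and return structured records."""
    records = []
    for match in VISA_TEMPLATE.finditer(raw_text):
        value = re.sub(r"[ -]", "", match.group())
        records.append({
            "pattern": "visa_card",
            "value": value,
            "source": source_file,
            "offset": match.start(),
            "checksum_ok": luhn_valid(value),
        })
    return records
```

Keeping the source file and offset with each record preserves the link back to the raw information database, so a record can later be re-rated or re-classified.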
  • Fig. 3 is a schematic system block diagram of the method for collecting, filtering and analyzing P2P traffic and extracting relevant behavior patterns and relevant statistics, according to the principles of the present invention. This procedure assists in tracking the origins of information, queries and traffic (trail of evidence). Inputs are the P2P queries received by system P2P agents and outputs are the relevant behavior patterns and behavior statistics stored in the behavior database.
  • the system connects into the P2P network using P2P protocol.
  • The system receives P2P search and download queries and requests from external P2P hosts (this is simply how P2P protocols work: each node receives search requests from other nodes).
  • the system analyzes received P2P queries based on system criteria (geographic location, file requested, known keyword and/or patterns and/or fragments, etc.).
  • relevant data and metadata is extracted from analyzed queries and stored in the behavior database.
  • Query statistics (such as risk level, usage level, interest level, and general statistics based on geo location, originator, keywords, etc.) are calculated and stored in the behavior database.
  • the system listens to the P2P queries (queries created by any P2P network users and that flow through the system's P2P agents) 301.
  • The system also analyzes the behavior based on the traffic, query metadata, P2P network resources and structure, and system criteria. The analysis is based on general statistics, query quantities, geographical spread, and queries on relevant data (based on keywords and patterns) 302.
  • The system stores the selected query information and statistics in the behavior database 303.
  • the behavior data is correlated with raw and/or structured data retrieved from the same client or group of clients (e.g., according to geography) 304. For example, the behavior scoring or numbering will be different for a country from which 100 credit cards were found compared to a country in which no credit cards were found.
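A minimal sketch of the statistics step 302 follows, assuming each observed query has already been reduced to a dictionary with `country` and `text` keys. Those keys, the `watch_keywords` set and the definition of `interest_level` as the share of relevant queries are illustrative assumptions; the patent names the statistics but does not define them.

```python
from collections import Counter, defaultdict


def summarize_queries(queries, watch_keywords):
    """Aggregate observed P2P queries per country.

    Returns a dict mapping country to the number of queries seen, the
    number that hit at least one watched keyword, and per-keyword counts.
    """
    stats = defaultdict(lambda: {"queries": 0, "relevant": 0, "keywords": Counter()})
    for query in queries:
        entry = stats[query["country"]]
        entry["queries"] += 1
        hits = watch_keywords & set(query["text"].lower().split())
        if hits:
            entry["relevant"] += 1
            entry["keywords"].update(hits)
    return dict(stats)


def interest_level(entry):
    """Assumed metric: fraction of a country's queries that were relevant."""
    return entry["relevant"] / entry["queries"] if entry["queries"] else 0.0
```

Per-country entries of this kind could then be correlated with the structured data found for the same geography, as in step 304.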
  • Fig. 4 is a schematic system block diagram of the method for processing and indexing the P2P raw data in the raw data database, according to the principles of the present invention.
  • Indexes of raw files and raw files metadata are stored in the unstructured data index database.
  • Raw files are extracted from the raw files database.
  • An index is created for unstructured data in the raw files based on system criteria and a system index algorithm comprising words, numbers, etc.
  • the raw data index is stored in the unstructured data index database.
  • the data of the raw file database is analyzed 401 and the data is indexed 402. Then the indices are stored in the structured data database 403.
  • The index is rated and scored based on additional criteria that are calculated and derived from the structured data that the system has extracted from the P2P network.
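The indexing steps 401 to 403 could look roughly like the inverted index below. This is a sketch under assumptions: the tokenizer (lowercase runs of letters and digits), the in-memory dictionary and the AND semantics of `lookup` are choices made for the example, since the patent only says the index algorithm comprises "words, numbers, etc."

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def build_index(raw_files):
    """Map each word or number to the set of raw file ids containing it."""
    index = defaultdict(set)
    for file_id, text in raw_files.items():
        for token in TOKEN.findall(text.lower()):
            index[token].add(file_id)
    return index


def lookup(index, query):
    """Return ids of files containing every token of the query."""
    tokens = TOKEN.findall(query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```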
  • Fig. 5 is a schematic system block diagram of the method for structured and non-structured P2P searching based on the system databases, according to the principles of the present invention.
  • An entity, such as a person or a machine, accesses one of the various system interfaces using an automatic or manual process.
  • the system interface may be a GUI or an interface used to integrate two computer programs, such as the computer program of the system and the computer program of the proprietary calling website or application.
  • The entity submits a query for structured and/or non-structured data through the various system interfaces.
  • the query is parsed to identify required structured and/or non-structured query words and their internal relation.
  • Entity permissions and system criteria are then extracted from the system database, and the system verifies that the entity has permission to submit the received query.
  • A query is rendered using the entity query parameters and system criteria, and the query is matched against the structured and unstructured data in the respective databases.
  • the query result list is rendered, along with optional metadata describing the reason the match was made.
  • the query results are sent or made available to the querying entity via the various system interfaces.
  • a person, organization or machine accesses the system manually or automatically 501.
  • The structured database is queried 502 and/or the non-structured database is queried 503. The answer is returned to the person or machine.
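The query flow of Fig. 5 (permission check, matching against both databases, results with a match reason) can be sketched as below. The function name `run_query`, the `"query"` permission label and the shapes of the two stores are hypothetical stand-ins for the system database, structured database and unstructured database.

```python
def run_query(entity, terms, permissions, structured, unstructured):
    """Verify the entity's permission, then match terms in both stores.

    permissions:  dict mapping entity name to a set of allowed actions
    structured:   list of records with "id" and "value" keys
    unstructured: dict mapping document id to raw text
    """
    if "query" not in permissions.get(entity, set()):
        raise PermissionError(f"{entity} may not submit queries")

    wanted = {t.lower() for t in terms}
    results = []
    for record in structured:
        if record["value"].lower() in wanted:
            results.append({
                "source": "structured",
                "id": record["id"],
                "reason": f"value matched {record['value']!r}",
            })
    for doc_id, text in unstructured.items():
        hits = wanted & set(text.lower().split())
        if hits:
            results.append({
                "source": "unstructured",
                "id": doc_id,
                "reason": "text contains " + ", ".join(sorted(hits)),
            })
    return results
```

The `reason` field corresponds to the optional metadata describing why a match was made.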
  • Fig. 6 is a schematic system block diagram of the method for the P2P data triggering mechanism, according to the principles of the present invention.
  • An entity, such as a computer program or human, accesses one of the various system interfaces using an automatic or manual process. The entity then requests to create an alert, which is a notification to be sent by the system when it encounters certain data or a certain pattern in the structured and/or non-structured data in the system databases. The data may already be in the system databases, or may have just been found by the extraction process described above. Note: this actually describes two processes: creating the alert trigger, and producing the alert when data is found or the system criteria change such that the alert condition is met. The latter occurs on a different timeline.
  • The alert trigger definitions are stored in the system database.
  • An alert is sent to an entity, such as a computer program via an interface used to integrate two computer programs, or to a human via a GUI, based on a trigger request and trigger matching criteria.
  • the alert creation process is basically identical to the query process except the query is not actually performed, but just stored in the system database.
  • the alert production might be as a result of a query on the various system databases or as a result of monitoring the incoming P2P data in real time.
  • The sending of an alert is integrated into the extraction process, assuring that when data that meets an alert trigger is found, an alert is produced and sent to the entity, i.e., to a computer program via an interface or to a human via a GUI.
  • an entity such as a person, organization or machine accesses a system interface, either automatically or manually, and submits a request to create an alert for data with specific criteria from P2P networks 601.
  • the system analyzes the request and builds alert/trigger parameters and the alert trigger definition is saved in the database 602.
  • the system seeks the trigger based on the alert parameters and based on all the system data.
  • the search is made in the structured database 603 or in the unstructured database 604.
  • the system finds matches between the alert parameters and the system data, then triggers an alert 605.
  • the alert is sent to the originator, i.e., the person, organization or machine of reference block 601. Alternatively, a customer can request the trigger manually, but the system can be configured to relay it to the customer's IT system.
  • the alert includes the alert request data and the alert match data.
  • the system also finds incoming trigger matches from the real-time data extraction/processing, as described in previous figures.
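The two processes described for Fig. 6, storing a trigger and later producing alerts when data matches, might be sketched as follows. `AlertRegistry`, keyword-set triggers and the alert dictionary layout are assumptions for illustration; a real trigger would presumably carry the full query parameters described for Fig. 5.

```python
from dataclasses import dataclass


@dataclass
class AlertTrigger:
    """A stored alert definition (hypothetical shape)."""
    owner: str
    keywords: frozenset


class AlertRegistry:
    """Holds trigger definitions and checks data against them."""

    def __init__(self):
        self.triggers = []

    def create_alert(self, owner, keywords):
        """Store the trigger without running it (steps 601 and 602)."""
        trigger = AlertTrigger(owner, frozenset(k.lower() for k in keywords))
        self.triggers.append(trigger)
        return trigger

    def check(self, record_text):
        """Return one alert per trigger matched by the given data (step 605)."""
        words = set(record_text.lower().split())
        alerts = []
        for trigger in self.triggers:
            matched = trigger.keywords & words
            if matched:
                alerts.append({
                    "to": trigger.owner,
                    "request": sorted(trigger.keywords),
                    "match": sorted(matched),
                })
        return alerts
```

The same `check` call could be run over stored data or invoked from the extraction process on each newly found record, which mirrors the two alert production paths in the text. Each alert carries both the request data and the match data.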
  • Fig. 7 is a schematic system block diagram of the method for decoying, according to the principles of the present invention.
  • A decoy is a data file that is created using the patterns database, structured data database and behavior database and is distributed on P2P networks, randomly or intentionally, in order to follow and monitor its distribution on the P2P network.
  • the file is created in response to a computer program or human request.
  • a person or machine accesses one of the various system interfaces using an automatic or manual process.
  • the system interface may be a GUI or an interface used to integrate two computer programs.
  • the person or machine submits a decoy file request through the various system interfaces.
  • the decoy request is parsed to identify the required patterns and/or behavioral data and/or structured and/or non-structured keywords and their internal relationship.
  • the decoy request is compared with the structured data database to eliminate any data conflicts or possible duplication of the decoy.
  • the relevant patterns are extracted from the patterns database and relevant behavior data is extracted from the behavior database.
  • the decoy file structure and content are determined.
  • the decoy file is rendered along with its metadata using selected patterns and content fragments.
  • the decoy file is distributed to system agents.
  • the decoy file search and download requests are responded to on the P2P network adhering to the P2P protocol and providing the decoy file as a normal file on the P2P Network.
  • The decoy data file is uploaded to the requester using the P2P protocol. Any search and/or download requests for the decoy file are logged in the behavior database using download request metadata and other relevant data, such as data regarding the downloading P2P client: GEO, IP, etc.
  • the system is also able to replace the decoy file with a better version of itself based on the behavior derived from the system.
  • A person or a machine accesses the system manually or automatically and submits a decoy request 701.
  • The desired decoy is built based on the request parameters and the pattern data stored in the patterns database 702.
  • The request is verified against the structured data in order to locate and prevent data duplications and collisions 703.
  • the decoy structure is adjusted based on the behavioral data, in order to increase the exposure of the decoy over the P2P network.
  • the parameters include the geographical area, the decoy file names, etc. 704.
  • a decoy request can actually produce more than just a single file.
  • the decoy is built based on the previous parameters 705.
  • the decoy file is distributed to the relevant system agents 706.
  • System agents receive search and download requests for the decoy file and upload it to the network as requested.
  • the decoy download requests and decoy download activities are stored 707.
  • the system later performs queries for the decoy file to monitor its distribution on the P2P network.
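A rough sketch of steps 703 to 707 is given below. The request layout (`base_name`, `regions`, `values`), the per-region file naming and the random marker embedded in each decoy are hypothetical; the patent states only that the request is checked against the structured data, that parameters such as geographical area and file names are adjusted, and that requests for the decoy are logged.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Decoy:
    """One decoy file plus the requests observed for it."""
    decoy_id: str
    file_name: str
    content: str
    requests: list = field(default_factory=list)


def build_decoys(request, structured_values):
    """Build one decoy per requested region.

    Values already present in the structured database are dropped, so a
    decoy never duplicates or collides with real extracted data.
    """
    clean_values = [v for v in request["values"] if v not in structured_values]
    decoys = []
    for region in request["regions"]:
        marker = uuid.uuid4().hex  # unique token for tracking this decoy
        body = "\n".join(clean_values + [f"ref:{marker}"])
        file_name = f"{request['base_name']}_{region}.txt"
        decoys.append(Decoy(marker, file_name, body))
    return decoys


def log_request(decoy, kind, peer_ip, country):
    """Record a search or download request for the decoy (step 707)."""
    decoy.requests.append({"kind": kind, "ip": peer_ip, "country": country})
```

Producing one file per region illustrates the remark that a single decoy request can yield more than one file.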

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for providing a system for a set of agents residing on computers connected to the Internet at multiple sites to interact with at least one P2P network in order to perform searches and extract data from the at least one P2P network. The method includes collecting raw data from the at least one P2P network based on system criteria. The method also includes extracting structured data from the raw data and collecting, filtering and analyzing peer-to-peer traffic in the at least one P2P network and extracting relevant behavior patterns and relevant statistics, such that the set of agents serves as an interface to the P2P networks and interacts with them.
PCT/IL2009/001197 2008-12-18 2009-12-17 System and method enabling agents to interact with P2P networks to perform required processing WO2010070646A1

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13865408P 2008-12-18 2008-12-18
US61/138,654 2008-12-18

Publications (1)

Publication Number Publication Date
WO2010070646A1 2010-06-24

Family

ID=42268379

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2009/001197 WO2010070646A1 2008-12-18 2009-12-17 System and method enabling agents to interact with P2P networks to perform required processing

Country Status (1)

Country Link
WO (1) WO2010070646A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI467969B (zh) * 2011-11-10 2015-01-01 Nat Univ Chung Hsing 具有自動檔案下載接管功能的頻寬管理系統及其方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020152262A1 (en) * 2001-04-17 2002-10-17 Jed Arkin Method and system for preventing the infringement of intellectual property rights
US6732180B1 (en) * 2000-08-08 2004-05-04 The University Of Tulsa Method to inhibit the identification and retrieval of proprietary media via automated search engines utilized in association with computer compatible communications network
US20050198535A1 (en) * 2004-03-02 2005-09-08 Macrovision Corporation, A Corporation Of Delaware System, method and client user interface for a copy protection service
US20070078769A1 (en) * 2003-07-07 2007-04-05 Stemventures Limited Anti piracy system in a peer-to-peer network
US20080263202A1 (en) * 2005-06-15 2008-10-23 George David A Method and apparatus for reducing spam on peer-to-peer networks

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 09833063; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 09833063; Country of ref document: EP; Kind code of ref document: A1)