WO2010123359A1 - System and method for removing illegal content offered via the internet - Google Patents

Info

Publication number
WO2010123359A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
foregoing
search
removal
provider
Prior art date
2009-04-23
Application number
PCT/NL2010/050218
Other languages
English (en)
French (fr)
Inventor
Dennis Christopher De Laat
Original Assignee
Dennis Christopher De Laat
Priority date
2009-04-23
Filing date
2010-04-22
Publication date
2010-10-28
Application filed by Dennis Christopher De Laat
Publication of WO2010123359A1

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
                • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
                    • G06Q50/10 Services
                        • G06Q50/18 Legal services
                • G06Q10/00 Administration; Management
                    • G06Q10/10 Office automation; Time management
                • G06Q20/00 Payment architectures, schemes or protocols
                    • G06Q20/08 Payment architectures
                        • G06Q20/12 Payment architectures specially adapted for electronic shopping systems
                            • G06Q20/123 Shopping for digital content
                                • G06Q20/1235 Shopping for digital content with control of digital rights management [DRM]

Definitions

  • The present invention relates to a system and method for removing illegal content offered via the internet, such as music, film and game files, books, publications and photos, whose (copyright) owner has not given permission for it to be offered, or for which no copyright fees are paid to the owner.
  • For this purpose the invention provides a system for finding and removing illegal content offered via the internet, comprising: input means for at least one search term related to the content to be removed; at least one search engine for searching the internet for the search term and storing, in a database, links to content found on the basis of the search term; data processing means adapted to determine the provider hosting the relevant content and to draw up, for each found link, a request to that provider to remove the illegal content; and transmitting means adapted to send the removal request to the provider for each link.
  • The input means preferably comprise an input screen for manual input of at least one search term and/or means for reading a data file with search terms.
  • The latter means can be adapted to read supplied text files or can, for instance, comprise connections to databases from which the content to be located is read.
  • These files can be reused after a period of time to establish whether the same content is being offered again, and to take action against it again. With a manual search, these actions must be repeated each time.
  • An identification number is generated, stored in the database and used as a reference during the search for and removal of content.
  • The search engine can be adapted to search websites, blogs, RSS feeds, torrents, social networks, Twitter and peer-to-peer data sharing applications. Image recognition software can also be used to track down photo files or other images. These are the most common channels for illegal content, and they are usually characterized in that the content can be accessed via a (hyper)link. Spoofing can also be used to locate illegal content, with the search engine pretending to originate from a different domain.
  • The search engine is preferably adapted to search for synonyms and/or cryptic descriptions of a search term, for instance words in which the "a" is replaced by an "@" or the "o" by a "0", or which contain deliberate typing or spelling mistakes. In this manner misleadingly named content such as "The Beetles - Peny L@ne.mp3" is found when a search is made for "Beatles Penny Lane", and "Madd0na.jpg" when a search is made for "Madonna".
  • The data processing means are adapted to verify the links stored in the database for the actual availability of the content offered via each link.
  • Hyperlinks often do not lead (directly) to the content being searched for; instead, an internet user is directed through various sites with advertising, pop-ups or banners before the actual content can be downloaded.
  • For this purpose, the system according to the present invention determines where the offered content is actually hosted.
  • The hosted content can also be protected by means of a "Completely Automated Public Turing test to tell Computers and Humans Apart" (captcha).
  • A captcha is a challenge-response test used in computing to determine whether or not a user is human. The technique was developed to distinguish people from computer-driven programs, such as the one used in an embodiment of the present invention.
  • In that case, downloading the content is only possible by means of a manual operation, such as entering a random code shown on the page.
  • The system according to the present invention is adapted to detect that a captcha is being used and, if the captcha cannot be solved automatically, places the found URL on a special list for manual processing.
  • In some cases the actual hyperlinks are not provided in HTML code but are shown as a graphic representation.
  • This prevents the system according to the present invention from clicking through directly to the content. In such a case the system determines that an image is being used to convey the information and extracts that information from the image by means of optical character recognition (OCR).
  • The data processing means are adapted to filter out links to websites, torrents and peer-to-peer data sharing applications that offer legal content.
  • These legitimate providers can be included in the database, after which the data processing means compare the links found by the search engine with the providers in the database. Removal requests are, of course, not sent to these legitimate providers.
  • The database can also be adapted to register which removal requests are accepted by each provider. These can be entered into the database manually or by means of an algorithm that searches the provider's site for a way of submitting a removal request. It has been found that a wide variety of illegal content is offered via a relatively small group of providers, such as the website Rapidshare. By including the removal requests for these providers in the database, automated removal requests can be generated and sent for a large number of links.
  • The data processing means can in particular be adapted for the automated completion of a so-called removal page or abuse page of a provider. They can, for instance, be adapted to send an e-mail based on a template that the provider makes available for download for this purpose, completed for the specific content. Content providers usually only consider removal requests drawn up on the basis of such a template.
  • The data processing means can also be adapted to verify, a predetermined period after a removal request has been sent, whether the content has been removed. If it has not, a new removal request can be submitted, a demand can be sent, or a notice can be generated on the basis of which a user of the system can, for instance, take legal action.
  • The system can also provide an overview of the illegal content found and/or the removal requests sent.
  • A user can thus quickly obtain a summary of the amount of illegal content offered that corresponds to their search term or terms, and of the effect of sending the removal requests.
  • The database can comprise an overview of the owners of content related to a given keyword, in which case the data processing means can be adapted to send a removal request in the name of the owner.
  • The data processing means then send such a request from an e-mail server of the owner, to prevent the request from going unprocessed, for instance because the provider of the illegal content blocks the IP address from which the removal request is sent.
  • A service, optionally external, can also be used to anonymize the requests, which helps keep the content provider from noticing that many requests are being sent automatically by the same system.
  • FIG. 1 shows a schematic view of a system 100 according to the present invention.
  • System 100 comprises input means 1 for at least one search term 2 related to the content to be removed, and a search engine 1a for searching the internet 3 for search term 2.
  • Websites 6 and 7, comprising content 2' and 2" respectively, each related to search term 2, are located on the internet.
  • The search engine stores the search results in database 4 in the form of links to websites 6 and 7.
  • Data processing means 5 determine which provider hosts the relevant content and draw up a removal request 8, 9 to that provider to remove the illegal content.
  • System 100 further comprises transmitting means 5a adapted to send removal requests 8, 9 to provider 6, 7 for each link to a website 6, 7. These removal requests are sent, for instance, from the e-mail server of the owner, from a central server on which the system according to the present invention runs, or from a shadow domain.
  • The operation of system 100 is based on searches consisting of batches, each comprising multiple steps.
  • A search comprises a description, a start date and an end date.
  • Associated with a search is at least one search term, for which a search is made by means of existing search engines such as Google.
  • When a search is performed, the number of search engines associated with the search is checked. These search engines are each treated individually during the process of retrieving illegal links from the results; individually, but nevertheless all simultaneously, each in its own thread (the "spider effect").
  • Each search engine consists of one or more "operations": steps defined within the system that are followed in a predetermined sequence in order to filter the search results.
  • The filtering is based on regular expressions which ensure that specific words (links) are filtered from the retrieved results (HTML), for instance:
  • This expression returns all Rapidshare links from an HTML page.
  • The system verifies whether the search term actually occurs in the search results. If it does, the search result is stored in the database, associated with the search query and the search engine on which the link was found. The system also associates the search result with a source URL.
  • Search results can occur on multiple search result pages.
  • The search results are therefore grouped, so that a found link is, for instance, associated with the multiple URLs on which it was found; this also makes it possible to determine on how many websites a specific search result appears.
  • The search results can contain undesired data (noise data).
  • The content filters can be adapted to filter out this noise data.
  • The filtering consists of replacing certain words in the search results, resulting in cleaned search results. These cleaned search results are stored in the database.
  • The search results are then checked to determine whether they are downloadable. If a search result is not downloadable, it is removed from the search result table, placed in an invalid search result table and kept for statistical purposes.
  • There are two possible outcomes: the first is an error message stating that the search result does not exist (any longer); the second is that the search result can be downloaded.
  • In the latter case, an e-mail (removal request) is sent to the associated provider, stating that the search result contains illegal content and must be removed.
  • Such e-mails are sent in batches, each covering the search results hosted by the same provider.
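The description says each search engine associated with a search runs simultaneously in its own thread (the "spider effect") and consists of "operations" applied in a predetermined sequence. A minimal Python sketch of that arrangement follows, assuming each engine is represented by a hypothetical `fetch` callable plus an ordered list of filtering functions; neither name comes from the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# An "operation" takes the current list of results and returns a filtered list.
Operation = Callable[[List[str]], List[str]]
# Hypothetical engine description: a fetch function plus its ordered operations.
Engine = Tuple[Callable[[], List[str]], List[Operation]]


def run_engine(fetch: Callable[[], List[str]], operations: List[Operation]) -> List[str]:
    """Fetch raw results for one engine and apply its operations in order."""
    results = fetch()
    for operation in operations:
        results = operation(results)
    return results


def run_search(engines: Dict[str, Engine]) -> Dict[str, List[str]]:
    """Process every engine of a search simultaneously, one thread per engine."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {
            name: pool.submit(run_engine, fetch, operations)
            for name, (fetch, operations) in engines.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

In a real deployment `fetch` would query an existing search engine and the operations would include link extraction and noise filtering; here they are placeholders so the threading and ordering are easy to see.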
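The regular expression the description uses for Rapidshare links is not reproduced in the text. The sketch below substitutes a hypothetical pattern of its own (`RAPIDSHARE_LINK`, assuming links of the form `http://rapidshare.com/files/<id>/<name>`), keeps only pages in which the search term actually occurs, and groups each found link with the source URLs it appeared on, so the number of websites carrying a link is simply the size of its set.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical pattern (the patent's own expression is not given):
# matches links such as http://rapidshare.com/files/123/name.mp3
RAPIDSHARE_LINK = re.compile(
    r"""https?://(?:www\.)?rapidshare\.com/files/\d+/[^\s"'<>]+""",
    re.IGNORECASE,
)


def extract_links(html: str) -> List[str]:
    """Return matching links in order of first appearance, without duplicates."""
    return list(dict.fromkeys(RAPIDSHARE_LINK.findall(html)))


def index_links(pages: Iterable[Tuple[str, str]], search_term: str) -> Dict[str, Set[str]]:
    """Map each found link to the set of source URLs it appeared on.

    A page only contributes links if the search term occurs in it
    (case-insensitive), mirroring the verification step in the text.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for source_url, html in pages:
        if search_term.lower() not in html.lower():
            continue
        for link in extract_links(html):
            index[link].add(source_url)
    return dict(index)
```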
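Matching misleadingly named content such as "The Beetles - Peny L@ne.mp3" against the search term "Beatles Penny Lane" can be approximated by undoing character substitutions and then comparing words fuzzily. This sketch uses Python's `difflib.SequenceMatcher` as the similarity measure; that choice, the 0.8 threshold and the substitutions beyond "@" and "0" are assumptions, not the method described in the patent.

```python
import difflib
import os
import re

# "@" -> "a" and "0" -> "o" come from the description; "3" -> "e" and
# "$" -> "s" are additional, hypothetical substitutions.
LEET_MAP = str.maketrans({"@": "a", "0": "o", "3": "e", "$": "s"})


def tokens(text: str) -> list:
    """Lower-case, undo common character substitutions and split into words."""
    return re.findall(r"[a-z]+", text.lower().translate(LEET_MAP))


def matches(file_name: str, search_term: str, threshold: float = 0.8) -> bool:
    """True if every word of the search term has a close match in the file name."""
    stem, _ext = os.path.splitext(file_name)  # ignore ".mp3", ".jpg", ...
    candidates = tokens(stem)
    return all(
        any(difflib.SequenceMatcher(None, word, cand).ratio() >= threshold
            for cand in candidates)
        for word in tokens(search_term)
    )
```

Requiring every search-term word to match keeps unrelated files out, while the per-word ratio tolerates deliberate misspellings such as "Beetles" or "Peny".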
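The availability check (content downloadable, no longer existing, or blocked by a captcha so that the URL goes to a list for manual processing) can be sketched as a pure classification over already fetched responses, which keeps network access out of the example. The marker strings are hypothetical; real hosts phrase their error and captcha pages in many different ways.

```python
from typing import Dict, List, Tuple

# Hypothetical marker strings for "file is gone" and "captcha" pages.
NOT_FOUND_MARKERS = ("file not found", "does not exist", "has been removed")
CAPTCHA_MARKERS = ("captcha",)


def classify(status: int, body: str) -> str:
    """Classify a fetched link as 'invalid', 'manual' (captcha) or 'downloadable'."""
    text = body.lower()
    if status in (404, 410) or any(marker in text for marker in NOT_FOUND_MARKERS):
        return "invalid"
    if any(marker in text for marker in CAPTCHA_MARKERS):
        return "manual"
    if 200 <= status < 300:
        return "downloadable"
    # Simplification: other statuses (e.g. 5xx) are treated as invalid here;
    # a production system would more likely retry them later.
    return "invalid"


def partition(responses: Dict[str, Tuple[int, str]]) -> Dict[str, List[str]]:
    """Split links into the search result table, the invalid table and the manual list."""
    tables: Dict[str, List[str]] = {"downloadable": [], "invalid": [], "manual": []}
    for url, (status, body) in responses.items():
        tables[classify(status, body)].append(url)
    return tables
```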
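Filtering out links to legitimate providers and batching the remaining links per provider could look like the following. As a simplification, the provider is taken to be the link's host name without a leading "www."; determining who actually hosts the content, as the description requires, would need more work (for example following the redirect chains through advertising pages). The whitelist used in the check is made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set
from urllib.parse import urlsplit


def provider_of(link: str) -> str:
    """Use the link's host name, minus a leading 'www.', as the provider key.

    Simplification: a real system would resolve the registrable domain
    (e.g. via the Public Suffix List) or the actual hosting company.
    """
    host = (urlsplit(link).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_legitimate(provider: str, legitimate: Set[str]) -> bool:
    """True if the provider, or a parent domain of it, is on the whitelist."""
    return any(provider == d or provider.endswith("." + d) for d in legitimate)


def group_by_provider(links: Iterable[str], legitimate: Set[str]) -> Dict[str, List[str]]:
    """Drop links to legitimate providers and batch the rest per provider."""
    batches: Dict[str, List[str]] = defaultdict(list)
    for link in links:
        provider = provider_of(link)
        if provider and not is_legitimate(provider, legitimate):
            batches[provider].append(link)
    return dict(batches)
```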
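Drafting one removal request per provider from a template can be sketched with the standard library's `string.Template` and `email.message.EmailMessage`. The template text, addresses and reference format are hypothetical; the description notes that providers usually expect their own published template. The sketch only builds the message; sending it (for example with `smtplib.SMTP.send_message`, from the owner's mail server) is left out.

```python
from email.message import EmailMessage
from string import Template
from typing import List

# Hypothetical template; in practice the wording would follow the form the
# provider publishes for removal (abuse) requests.
REQUEST_TEMPLATE = Template(
    "Dear $provider abuse team,\n\n"
    "On behalf of $owner we request the removal of the files below,\n"
    "which offer '$work' without the rights holder's permission:\n\n"
    "$links\n\n"
    "Reference: $reference\n"
)


def draft_request(provider: str, abuse_address: str, owner: str, sender: str,
                  work: str, links: List[str], reference: str) -> EmailMessage:
    """Build (but do not send) one removal request covering a provider's batch of links."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = abuse_address
    msg["Subject"] = f"Removal request {reference}: {work}"
    msg.set_content(REQUEST_TEMPLATE.substitute(
        provider=provider, owner=owner, work=work,
        links="\n".join(links), reference=reference,
    ))
    return msg
```

The `reference` argument is where an identification number stored in the database could be carried, so replies and follow-ups can be matched to the original search.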
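The follow-up step (checking, a predetermined period after sending, whether the content was removed, and otherwise resending, sending a demand or generating a notice) can be modelled as a small decision function. The seven-day waiting period and the fixed order resend, demand, notice are assumptions; the description lists these actions as alternatives without fixing an order.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed escalation order; the description lists these as alternatives.
ESCALATION = ("resend", "demand", "notice")


@dataclass
class SentRequest:
    link: str
    sent_at: datetime
    attempts: int = 1  # removal requests sent so far for this link


def follow_up(request: SentRequest, now: datetime, still_available: bool,
              wait: timedelta = timedelta(days=7)) -> Optional[str]:
    """Return the next action, or None while waiting or once the content is gone."""
    if now - request.sent_at < wait or not still_available:
        return None
    step = min(max(request.attempts, 1), len(ESCALATION)) - 1
    return ESCALATION[step]
```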

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Accounting & Taxation (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Technology Law (AREA)
  • Finance (AREA)
  • Storage Device Security (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)
PCT/NL2010/050218 2009-04-23 2010-04-22 System and method for removing illegal content offered via the internet WO2010123359A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
NL2002783A NL2002783C2 (nl) 2009-04-23 2009-04-23 System and method for having illegal content offered via the internet removed.
NL2002783 2009-04-23

Publications (1)

Publication Number Publication Date
WO2010123359A1 (en) 2010-10-28

Family

ID=41314671

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NL2010/050218 WO2010123359A1 (en) 2009-04-23 2010-04-22 System and method for removing illegal content offered via the internet

Country Status (2)

Country Link
NL (1) NL2002783C2 (nl)
WO (1) WO2010123359A1 (nl)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140250196A1 (en) * 2013-03-01 2014-09-04 Raymond Anthony Joao Apparatus and method for providing and/or for processing information regarding, relating to, or involving, defamatory, derogatory, harrassing, bullying, or other negative or offensive, comments, statements, or postings
US9633220B2 (en) 2012-06-11 2017-04-25 Hewlett-Packard Development Company, L.P. Preventing an unauthorized publication of a media object

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6401118B1 (en) * 1998-06-30 2002-06-04 Online Monitoring Services Method and computer program product for an online monitoring search engine
GB2376326A (en) * 2001-06-04 2002-12-11 Hewlett Packard Co Peer-to-peer network search popularity statistical information collection
US20050050446A1 (en) * 2003-02-10 2005-03-03 Akira Miura Content processing terminal, copyright management system, and methods thereof
WO2008076294A2 (en) * 2006-12-13 2008-06-26 Ricall, Inc. Online music and other copyrighted work search and licensing system

Also Published As

Publication number Publication date
NL2002783C2 (nl) 2010-10-26

Similar Documents

Publication Publication Date Title
US9031946B1 (en) Processor engine, integrated circuit and method therefor
US8713010B1 (en) Processor engine, integrated circuit and method therefor
CN101601033B (zh) Generating specialized search results in response to patterned queries
AU2009277143B2 (en) Federated community search
US8359651B1 (en) Discovering malicious locations in a public computer network
US7860971B2 (en) Anti-spam tool for browser
US20180131708A1 (en) Identifying Fraudulent and Malicious Websites, Domain and Sub-domain Names
US20080235795A1 (en) System and Method for Confirming Digital Content
WO2013044744A1 (zh) Method and device for providing download resources
RU2413278C1 (ru) Method of selecting information on the internet and using this information in a shared website, and computer server for implementing the method
US20070239692A1 (en) Logo or image based search engine for presenting search results
WO2014000538A1 (zh) Cloud URL recommendation method and system based on terminal access statistics, and related device
US20160019195A1 (en) Method and system for posting comments on hosted web pages
WO2012094418A1 (en) Ownership resolution system
WO2010123359A1 (en) System and method for removing illegal content offered via the internet
CN103905434A (zh) Network data processing method and device
CN102164156A (zh) Resource publishing method and system
Yang et al. Mingling of clear and muddy water: Understanding and detecting semantic confusion in blackhat seo
CN107784054B (zh) Page publishing method and device
CN1313956C (zh) System and method for accessing web pages by real name
Lim et al. Phishing Vs. Legit: Comparative Analysis of Client-Side Resources of Phishing and Target Brand Websites
US9094452B2 (en) Method and apparatus for locating phishing kits
EP2815334A1 (en) Processor engine, integrated circuit and method for promoting websites in search result lists
CN108804444B (zh) Information crawling method and device
JP2005128922A (ja) Spam mail filtering system, method, and program

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 10732456

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 10732456

Country of ref document: EP

Kind code of ref document: A1