WO2010123359A1 - System and method for removing illegal content offered via the internet - Google Patents
- Publication number
- WO2010123359A1 (application PCT/NL2010/050218)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- content
- foregoing
- search
- removal
- provider
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/08—Payment architectures
- G06Q20/12—Payment architectures specially adapted for electronic shopping systems
- G06Q20/123—Shopping for digital content
- G06Q20/1235—Shopping for digital content with control of digital rights management [DRM]
Definitions
- the present invention relates to a system and method for removing illegal content offered via the internet, such as music files, film files, game files, books, publications and photos, the (copyright) owner of which has not given permission for offering thereof or for which no copyright fees are paid to the owner.
- the invention provides for this purpose a system for finding and removing illegal content offered via the internet, comprising input means for at least one search term related to the content to be removed, at least one search engine for searching the internet for the search term, and storing in a database links to content found on the basis of the search term, data processing means adapted to determine a provider hosting the relevant content and to draw up, for a found link, a removal request to the provider to remove the illegal content, and transmitting means adapted to send the removal requests to the provider for each link.
- the input means preferably comprise an input screen for manual input of at least one search term and/or means for reading a data file with search terms.
- These latter means can be adapted to read supplied text files or can for instance comprise connections to databases from which content to be located is read.
- These files can be reused after a period of time in order to establish whether the same content is being offered again, and to take action against it again. In the case of a manual search these actions must be performed again each time.
- an identification number is generated which is stored in the database and which is used as a reference during the search for and removal of content.
- the search engine can be adapted to search websites, blogs, RSS feeds, torrents, social networks, Twitter and peer-to-peer data sharing applications. These are the most frequently applied manifestations of illegal content and they are usually characterized in that the content can be accessed via a (hyper)link. Use can also be made of image recognition software to track down photo files or other images, and of spoofing for locating illegal content, wherein the search engine pretends to originate from a different domain.
- the search engine is preferably adapted to search for synonyms and/or cryptic descriptions of a search term. It is for instance possible here to envisage words in which the "a" is replaced by an "@" or the "o" by a "0", or in which deliberate typing or spelling mistakes are made. In this manner misleadingly named content, such as for instance "The Beetles - Peny L@ne.mp3", is found when a search is made for "Beatles Penny Lane", or "Madd0na.jpg" when a search is made for "Madonna".
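The matching of disguised names described above can be sketched as follows. This is a minimal illustration, not the method of the invention: the substitution table, the `normalize` and `matches` helpers and the similarity threshold are all assumptions, and the fuzzy comparison uses `difflib` from the Python standard library.

```python
import difflib
import re

# Hypothetical substitution table; the text only names "@" for "a" and "0" for "o".
SUBSTITUTIONS = str.maketrans({"@": "a", "0": "o"})


def normalize(text):
    """Lower-case the text, undo character substitutions and split it into words."""
    cleaned = text.lower().translate(SUBSTITUTIONS)
    return re.findall(r"[a-z]+", cleaned)


def matches(search_term, candidate, threshold=0.75):
    """Return True if every word of the search term has a close match in the candidate."""
    candidate_words = normalize(candidate)
    for word in normalize(search_term):
        # get_close_matches tolerates small typing or spelling mistakes.
        if not difflib.get_close_matches(word, candidate_words, n=1, cutoff=threshold):
            return False
    return True
```

With these assumptions, `matches("Beatles Penny Lane", "The Beetles - Peny L@ne.mp3")` and `matches("Madonna", "Madd0na.jpg")` both hold, while an unrelated file name does not match. A lower threshold finds more disguised names at the cost of more false positives.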
- the data processing means are adapted to verify the links stored in the database for the actual availability of the content offered via the link.
- Hyperlinks often do not lead, or not directly, to the content being searched for, but an internet user is directed along various sites with advertising, pop-ups or banners before the actual content can be downloaded.
- the system according to the present invention determines for this purpose where the actual offered content is hosted.
- the hosted content can also be protected by means of a "Completely Automated Public Turing test to tell Computers and Humans Apart" (captcha).
- a captcha is a response test used in data processing to determine whether or not a user is human. This technique has been developed to distinguish people from computer-driven programs, as applied in an embodiment of the present invention.
- the downloading of content is only possible by means of a manual operation such as entering a random code which is shown on the page.
- the system according to the present invention is adapted to determine that a captcha is being used and, if the captcha cannot be solved automatically, places the found URL on a special list for manual processing.
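The routing of captcha-protected links to a manual list might look like the sketch below. The marker strings, the `Queues` container and the optional `solver` callable are hypothetical; real captcha pages vary widely and a simple text check is only a stand-in for actual detection.

```python
from dataclasses import dataclass, field

# Hypothetical markers used to guess that a page shows a captcha.
CAPTCHA_MARKERS = ("captcha", "enter the code shown")


@dataclass
class Queues:
    automatic: list = field(default_factory=list)  # links processed without user help
    manual: list = field(default_factory=list)     # special list for manual processing


def uses_captcha(html):
    """Guess from the page source whether a captcha is being used."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def route(url, html, queues, solver=None):
    """Place the URL on the manual list when a captcha is present and cannot be solved."""
    if uses_captcha(html) and (solver is None or not solver(html)):
        queues.manual.append(url)
    else:
        queues.automatic.append(url)
```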
- the actual hyperlinks are not provided in HTML code but are shown as a graphic representation.
- the system according to the present invention is hereby prevented from clicking through directly to the content. In such a case the system determines that use is being made of an image to provide information and deduces the information on the image by means of optical character recognition (OCR).
- the data processing means are adapted to filter out links to websites, torrents and peer-to-peer data sharing applications offering legal content.
- these legitimate providers can be included in the database, after which the data processing means compare the links found by the search engine to the providers included in the database. Removal requests are of course not sent to these legitimate providers.
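A minimal sketch of this comparison is given below, assuming that legitimate providers are identified by host name. The host names and the in-memory set are placeholders for what the text stores in the database.

```python
from urllib.parse import urlparse

# Hypothetical list of legitimate providers; in the system these are kept in the database.
LEGITIMATE_HOSTS = {"store.example.com", "legal-music.example.org"}


def host_of(link):
    """Return the lower-cased host name of a link, or an empty string if there is none."""
    return (urlparse(link).hostname or "").lower()


def filter_out_legitimate(links, legitimate_hosts=LEGITIMATE_HOSTS):
    """Keep only links whose host is not a known legitimate provider."""
    return [link for link in links if host_of(link) not in legitimate_hosts]
```

Only the links that remain after this step would be considered for a removal request.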
- the database can also be adapted to register removal requests accepted by providers. These removal requests can be entered into the database manually or by means of an algorithm which searches the site of the provider for the possibility of submitting a removal request. It has been found that a wide diversity of illegal content is offered via a relatively small group of providers, such as the website Rapidshare. An automated removal request can be generated and sent for quite a large number of links by including the removal requests for these providers in the database.
- the data processing means can be particularly adapted here for automated completion of a so-called removal page or abuse page of a provider. They can also be adapted to send a mail based on a template which the provider makes available for download for this purpose and which is completed for the specific content. The content provider usually only takes into consideration removal requests which are drawn up on the basis of such a template.
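Completing a provider template for specific content can be illustrated with `string.Template` from the Python standard library. The template wording, the field names and the reference format are invented for this sketch; an actual provider would publish its own template.

```python
from string import Template

# Hypothetical template text; a real provider would supply its own wording and fields.
PROVIDER_TEMPLATE = Template(
    "To: $abuse_address\n"
    "Subject: Removal request $reference\n"
    "\n"
    "The following link offers content without permission of the owner, $owner:\n"
    "$link\n"
    "Please remove this content.\n"
)


def complete_template(abuse_address, reference, owner, link):
    """Fill in the provider template for one specific link."""
    # substitute() raises KeyError if a field is missing, so incomplete requests are not sent.
    return PROVIDER_TEMPLATE.substitute(
        abuse_address=abuse_address, reference=reference, owner=owner, link=link
    )
```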
- the data processing means can also be adapted to verify whether the content has been removed after a predetermined period of time of sending a removal request. If the content has not been removed a new removal request can be submitted, a demand can be sent, or a notice can be generated, on the basis of which a user of the system can for instance undertake legal action.
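The follow-up after a removal request can be expressed as a small decision function. The waiting period of seven days, the single reminder and the action names are assumptions chosen for this sketch; the text only states that a new request, a demand or a notice can follow.

```python
from datetime import datetime, timedelta


def next_action(sent_at, now, still_available, reminders_sent=0,
                wait=timedelta(days=7), max_reminders=1):
    """Decide what to do with a link for which a removal request was sent."""
    if now - sent_at < wait:
        return "wait"            # predetermined period has not yet elapsed
    if not still_available:
        return "removed"         # content is gone, nothing further to do
    if reminders_sent < max_reminders:
        return "resend_request"  # submit a new removal request
    return "notify_user"         # generate a notice so the user can consider legal action
```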
- the system can also provide an overview of illegal content found and/or removal requests sent.
- a user can hereby quickly obtain a summary of the amount of illegal content offered which corresponds to their search term or search terms, and the effect of sending the removal requests.
- the database can comprise an overview of owners of content related to a determined keyword, wherein the data processing means can be adapted to send a removal request in the name of the owner.
- the data processing means send such a request from an e-mail server of the owner in order to ensure that the request is processed, and is not ignored for instance because the IP address from which the removal request is sent is blocked by the provider of the illegal content.
- Use can also be made of an optionally external service which anonymizes the requests and in this way contributes toward it not being apparent to the content provider that many requests are being sent automatically by the same system.
- FIG. 1 shows a schematic view of a system 100 according to the present invention.
- System 100 comprises input means 1 for at least one search term 2 related to the content to be removed, and a search engine 1a for searching the internet 3 for search term 2.
- Websites 6 and 7, which comprise content 2' and 2" respectively related to search term 2, are located on the internet.
- the search engine stores the search results in the form of links to websites 6 and 7 in database 4.
- data processing means 5 determine which provider hosts the relevant content and draft a removal request 8, 9 to the provider to remove the illegal content.
- System 100 further comprises transmitting means 5a, which are adapted to send removal requests 8, 9 to provider 6, 7 for each link to a website 6, 7. These removal requests are for instance sent from the e-mail server of the owner, or from a central server on which the system according to the present invention is operating, or from a shadow domain.
- the operation of system 100 is based on searches consisting of batches, each comprising multiple steps.
- a search comprises a description, a start date and an end date.
- Associated with a search is at least a search term for which a search is made by means of existing search engines such as Google.
- When a search is performed, the number of search engines associated with the search is verified. During the process of retrieving illegal links from the results these search engines are all treated individually, but nevertheless simultaneously, each in its own thread (spider effect).
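Running each search engine in its own thread could be sketched with `concurrent.futures`. The `engines` mapping and its callables are hypothetical stand-ins for the real search engine connectors, which the text does not specify.

```python
from concurrent.futures import ThreadPoolExecutor


def run_search(search_term, engines):
    """Query all engines simultaneously, each in its own thread.

    `engines` maps an engine name to a callable that takes the search term
    and returns a list of links (an assumed interface for this sketch).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(fetch, search_term) for name, fetch in engines.items()}
        # Results stay separated per engine, so each engine is still treated individually.
        return {name: future.result() for name, future in futures.items()}
```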
- Each search engine consists of one or more "operations". These are steps defined within the system which are followed in a predetermined sequence in order to filter the search results.
- the filtering takes place on the basis of regular expressions which ensure that specific words (links) are filtered from the retrieved results (HTML), for instance:
- This expression returns all Rapidshare links from an HTML page.
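The expression itself is not reproduced in this text. A hypothetical pattern with the same purpose, written for Python's `re` module and assuming that Rapidshare file links have the form `rapidshare.com/files/...`, might look like this:

```python
import re

# Hypothetical pattern; not the expression from the original application.
RAPIDSHARE_LINK = re.compile(
    r"https?://(?:www\.)?rapidshare\.com/files/[^\s\"'<>]+",
    re.IGNORECASE,
)


def extract_links(html):
    """Return all substrings of the HTML that look like Rapidshare file links."""
    return RAPIDSHARE_LINK.findall(html)
```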
- the system verifies whether the search term does in fact occur in the search results. If this is the case, the search result is stored in the database, associated with the search query and search engine on which this link was found. The system also associates the search result with a source URL.
- Search results can occur on multiple search result pages.
- the search results are therefore grouped: a found link is for instance associated with all the URLs on which it was found, which also makes it possible to determine on how many websites a specific search result occurs.
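The grouping of a found link with its source URLs can be sketched with a dictionary of sets. The pair format `(link, source_url)` is an assumption about how results are passed around.

```python
from collections import defaultdict


def group_by_link(results):
    """Group (link, source_url) pairs so that each link maps to the set of pages it was found on."""
    grouped = defaultdict(set)
    for link, source_url in results:
        grouped[link].add(source_url)
    return dict(grouped)


def site_count(grouped, link):
    """Number of distinct source URLs on which the link was found."""
    return len(grouped.get(link, ()))
```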
- the search results can comprise undesired data (noise data).
- the content filters can be adapted to filter out this noise data.
- the filtering comprises replacing specific words in the search results, resulting in cleaned search results. These cleaned search results are stored in the database.
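A content filter of this kind might be sketched as a word replacement table applied to each search result. The noise words below are invented examples; the text does not list which words are replaced.

```python
import re

# Hypothetical noise words, each mapped to its replacement (here: removed entirely).
NOISE_WORDS = {"free": "", "download": ""}


def clean(result, replacements=NOISE_WORDS):
    """Replace the configured words in a search result and collapse leftover whitespace."""
    for word, replacement in replacements.items():
        result = re.sub(rf"\b{re.escape(word)}\b", replacement, result, flags=re.IGNORECASE)
    return " ".join(result.split())
```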
- search results are then verified as to whether they are downloadable. If a search result is not downloadable it is removed from the search result table and placed in an invalid search result table, and stored for statistical purposes.
- the first option is an error message that the search result does not exist (any longer);
- the second option is that the search result can be downloaded.
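The two outcomes can be sketched as a split of the search results into a valid and an invalid table. The `probe` callable, which returns an HTTP status code for a URL, is an assumed interface, and treating status 200 as "downloadable" is a simplification: some hosts return 200 together with an error page.

```python
def verify_downloadable(results, probe):
    """Split results into (valid, invalid) according to whether they can be downloaded.

    `probe` is a hypothetical callable taking a URL and returning an HTTP status code.
    """
    valid, invalid = [], []
    for url in results:
        if probe(url) == 200:
            valid.append(url)    # stays in the search result table
        else:
            invalid.append(url)  # moved to the invalid table, kept for statistics
    return valid, invalid
```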
- an e-mail (removal request) is sent to the associated provider, stating that the search result contains illegal content and must be removed.
- Such mails are sent in batches, in groups of search results which are hosted by the same provider.
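Batching by provider can be sketched by grouping the verified links on host name and drafting one mail body per group. Using the host name as provider identity and the mail wording are both assumptions made for this illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def batch_by_provider(links):
    """Group links by the host that serves them (assumed here to identify the provider)."""
    batches = defaultdict(list)
    for link in links:
        batches[urlparse(link).hostname or "unknown"].append(link)
    return dict(batches)


def draft_batch_mail(provider, links):
    """Draft one removal request covering all links hosted by the same provider."""
    lines = [f"Removal request for {provider}", "The following links contain illegal content:"]
    lines.extend(links)
    return "\n".join(lines)
```

One mail per provider keeps the number of messages low when many links point to the same host.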
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Physics & Mathematics (AREA)
- Human Resources & Organizations (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Marketing (AREA)
- Accounting & Taxation (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Technology Law (AREA)
- Finance (AREA)
- Storage Device Security (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
NL2002783A NL2002783C2 (nl) | 2009-04-23 | 2009-04-23 | System and method for having illegal content offered via the internet removed. |
NL2002783 | 2009-04-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2010123359A1 (en) | 2010-10-28 |
Family
ID=41314671
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/NL2010/050218 WO2010123359A1 (en) | 2009-04-23 | 2010-04-22 | System and method for removing illegal content offered via the internet |
Country Status (2)
Country | Link |
---|---|
NL (1) | NL2002783C2 (nl) |
WO (1) | WO2010123359A1 (nl) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140250196A1 (en) * | 2013-03-01 | 2014-09-04 | Raymond Anthony Joao | Apparatus and method for providing and/or for processing information regarding, relating to, or involving, defamatory, derogatory, harrassing, bullying, or other negative or offensive, comments, statements, or postings |
US9633220B2 (en) | 2012-06-11 | 2017-04-25 | Hewlett-Packard Development Company, L.P. | Preventing an unauthorized publication of a media object |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6401118B1 (en) * | 1998-06-30 | 2002-06-04 | Online Monitoring Services | Method and computer program product for an online monitoring search engine |
GB2376326A (en) * | 2001-06-04 | 2002-12-11 | Hewlett Packard Co | Peer-to-peer network search popularity statistical information collection |
US20050050446A1 (en) * | 2003-02-10 | 2005-03-03 | Akira Miura | Content processing terminal, copyright management system, and methods thereof |
WO2008076294A2 (en) * | 2006-12-13 | 2008-06-26 | Ricall, Inc. | Online music and other copyrighted work search and licensing system |
- 2009-04-23 NL NL2002783A patent/NL2002783C2/nl active
- 2010-04-22 WO PCT/NL2010/050218 patent/WO2010123359A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
NL2002783C2 (nl) | 2010-10-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 10732456; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 10732456; Country of ref document: EP; Kind code of ref document: A1 |