CL2008001189A1 - Method and system for processing web pages based on a spam web page classifier; and system for building said classifier. - Google Patents
- Publication number
- CL2008001189A1
- Authority
- CL
- Chile
- Prior art keywords
- classifier
- build
- web pages
- process web
- spam website
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Method and system for processing web pages based on a spam web page classifier; and system for building said classifier.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/742,156 US7853589B2 (en) | 2007-04-30 | 2007-04-30 | Web spam page classification using query-dependent data |
Publications (1)
Publication Number | Publication Date |
---|---|
CL2008001189A1 (es) | 2008-12-26 |
Family
ID=39888207
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CL2008001189A CL2008001189A1 (es) | 2007-04-30 | 2008-04-24 | Method and system for processing web pages based on a spam web page classifier; and system for building said classifier. |
Country Status (4)
Country | Link |
---|---|
US (1) | US7853589B2 (es) |
CL (1) | CL2008001189A1 (es) |
TW (1) | TWI437452B (es) |
WO (1) | WO2008134172A1 (es) |
Families Citing this family (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20090046862A (ko) * | 2006-07-24 | 2009-05-11 | 차차 써치 인코포레이티드 | Method, system, and computer-readable storage for podcasting and video training in an information retrieval system |
US7680745B2 (en) * | 2007-01-29 | 2010-03-16 | 4Homemedia, Inc. | Automatic configuration and control of devices using metadata |
US8458165B2 (en) * | 2007-06-28 | 2013-06-04 | Oracle International Corporation | System and method for applying ranking SVM in query relaxation |
US8078617B1 (en) * | 2009-01-20 | 2011-12-13 | Google Inc. | Model based ad targeting |
US8346800B2 (en) * | 2009-04-02 | 2013-01-01 | Microsoft Corporation | Content-based information retrieval |
US8219539B2 (en) * | 2009-04-07 | 2012-07-10 | Microsoft Corporation | Search queries with shifting intent |
US8935258B2 (en) * | 2009-06-15 | 2015-01-13 | Microsoft Corporation | Identification of sample data items for re-judging |
TWI601024B (zh) * | 2009-07-06 | 2017-10-01 | Alibaba Group Holding Ltd | Sampling methods, systems and equipment |
US20110040769A1 (en) * | 2009-08-13 | 2011-02-17 | Yahoo! Inc. | Query-URL N-Gram Features in Web Ranking |
US9020936B2 (en) * | 2009-08-14 | 2015-04-28 | Microsoft Technology Licensing, Llc | Using categorical metadata to rank search results |
US9576251B2 (en) * | 2009-11-13 | 2017-02-21 | Hewlett Packard Enterprise Development Lp | Method and system for processing web activity data |
TWI404374B (zh) * | 2009-12-11 | 2013-08-01 | Univ Nat Taiwan Science Tech | Method for training a classifier to detect spam websites |
US8639773B2 (en) * | 2010-06-17 | 2014-01-28 | Microsoft Corporation | Discrepancy detection for web crawling |
US8706738B2 (en) * | 2010-08-13 | 2014-04-22 | Demand Media, Inc. | Systems, methods and machine readable mediums to select a title for content production |
JP4939637B2 (ja) * | 2010-08-20 | 2012-05-30 | 楽天株式会社 | Information providing device, information providing method, program, and information recording medium |
US8606769B1 (en) * | 2010-12-07 | 2013-12-10 | Conductor, Inc. | Ranking a URL based on a location in a search engine results page |
US8762365B1 (en) * | 2011-08-05 | 2014-06-24 | Amazon Technologies, Inc. | Classifying network sites using search queries |
US8655883B1 (en) * | 2011-09-27 | 2014-02-18 | Google Inc. | Automatic detection of similar business updates by using similarity to past rejected updates |
KR101510647B1 (ko) * | 2011-10-07 | 2015-04-10 | 한국전자통신연구원 | Method and apparatus for web trend analysis based on issue template extraction |
US9244931B2 (en) * | 2011-10-11 | 2016-01-26 | Microsoft Technology Licensing, Llc | Time-aware ranking adapted to a search engine application |
US8868536B1 (en) | 2012-01-04 | 2014-10-21 | Google Inc. | Real time map spam detection |
US9659095B2 (en) * | 2012-03-04 | 2017-05-23 | International Business Machines Corporation | Managing search-engine-optimization content in web pages |
CN102801709B (zh) * | 2012-06-28 | 2015-03-04 | 北京奇虎科技有限公司 | Phishing website identification system and method |
US9483566B2 (en) * | 2013-01-23 | 2016-11-01 | Google Inc. | System and method for determining the legitimacy of a listing |
US9405803B2 (en) * | 2013-04-23 | 2016-08-02 | Google Inc. | Ranking signals in mixed corpora environments |
US20150039599A1 (en) * | 2013-08-01 | 2015-02-05 | Go Daddy Operating Company, LLC | Methods and systems for recommending top level and second level domains |
US10530671B2 (en) * | 2015-01-15 | 2020-01-07 | The University Of North Carolina At Chapel Hill | Methods, systems, and computer readable media for generating and using a web page classification model |
US10229219B2 (en) * | 2015-05-01 | 2019-03-12 | Facebook, Inc. | Systems and methods for demotion of content items in a feed |
US11675795B2 (en) * | 2015-05-15 | 2023-06-13 | Yahoo Assets Llc | Method and system for ranking search content |
US11609949B2 (en) | 2018-11-20 | 2023-03-21 | Google Llc | Methods, systems, and media for modifying search results based on search query risk |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5826260A (en) * | 1995-12-11 | 1998-10-20 | International Business Machines Corporation | Information retrieval system and method for displaying and ordering information based on query element contribution |
US7117358B2 (en) * | 1997-07-24 | 2006-10-03 | Tumbleweed Communications Corp. | Method and system for filtering communication |
US6785671B1 (en) * | 1999-12-08 | 2004-08-31 | Amazon.Com, Inc. | System and method for locating web-based product offerings |
US6480837B1 (en) * | 1999-12-16 | 2002-11-12 | International Business Machines Corporation | Method, system, and program for ordering search results using a popularity weighting |
US7188106B2 (en) * | 2001-05-01 | 2007-03-06 | International Business Machines Corporation | System and method for aggregating ranking results from various sources to improve the results of web searching |
US6795820B2 (en) * | 2001-06-20 | 2004-09-21 | Nextpage, Inc. | Metasearch technique that ranks documents obtained from multiple collections |
US6978274B1 (en) * | 2001-08-31 | 2005-12-20 | Attenex Corporation | System and method for dynamically evaluating latent concepts in unstructured documents |
US7743045B2 (en) * | 2005-08-10 | 2010-06-22 | Google Inc. | Detecting spam related and biased contexts for programmable search engines |
KR100486821B1 (ko) | 2003-02-08 | 2005-04-29 | 디프소프트 주식회사 | Method for automatically blocking spam mail by accessing link URLs |
US7219148B2 (en) * | 2003-03-03 | 2007-05-15 | Microsoft Corporation | Feedback loop for spam prevention |
US7197497B2 (en) * | 2003-04-25 | 2007-03-27 | Overture Services, Inc. | Method and apparatus for machine learning a document relevance function |
US20050015626A1 (en) * | 2003-07-15 | 2005-01-20 | Chasin C. Scott | System and method for identifying and filtering junk e-mail messages or spam based on URL content |
US20050120019A1 (en) * | 2003-11-29 | 2005-06-02 | International Business Machines Corporation | Method and apparatus for the automatic identification of unsolicited e-mail messages (SPAM) |
KR20040103763A (ko) | 2004-01-15 | 2004-12-09 | 엔에이치엔(주) | Method for managing websites registered in a search engine |
US20050216564A1 (en) * | 2004-03-11 | 2005-09-29 | Myers Gregory K | Method and apparatus for analysis of electronic communications containing imagery |
US7349901B2 (en) * | 2004-05-21 | 2008-03-25 | Microsoft Corporation | Search engine spam detection using external data |
US7664819B2 (en) * | 2004-06-29 | 2010-02-16 | Microsoft Corporation | Incremental anti-spam lookup and update service |
US7580921B2 (en) * | 2004-07-26 | 2009-08-25 | Google Inc. | Phrase identification in an information retrieval system |
US7533092B2 (en) * | 2004-10-28 | 2009-05-12 | Yahoo! Inc. | Link-based spam detection |
US7716198B2 (en) * | 2004-12-21 | 2010-05-11 | Microsoft Corporation | Ranking search results using feature extraction |
US7962510B2 (en) * | 2005-02-11 | 2011-06-14 | Microsoft Corporation | Using content analysis to detect spam web pages |
US7562304B2 (en) * | 2005-05-03 | 2009-07-14 | Mcafee, Inc. | Indicating website reputations during website manipulation of user information |
US7769751B1 (en) * | 2006-01-17 | 2010-08-03 | Google Inc. | Method and apparatus for classifying documents based on user inputs |
2007
- 2007-04-30 US: application US11/742,156, patent US7853589B2 (en); status: not active, Expired - Fee Related
2008
- 2008-03-28 WO: application PCT/US2008/058637, publication WO2008134172A1 (en); status: active, Application Filing
- 2008-04-24 CL: application CL2008001189A, publication CL2008001189A1 (es); status: unknown
- 2008-04-24 TW: application TW097115108A, patent TWI437452B (zh); status: not active, IP Right Cessation
Also Published As
Publication number | Publication date |
---|---|
US20080270376A1 (en) | 2008-10-30 |
US7853589B2 (en) | 2010-12-14 |
TW200849045A (en) | 2008-12-16 |
TWI437452B (zh) | 2014-05-11 |
WO2008134172A1 (en) | 2008-11-06 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CL2008001189A1 (es) | Method and system for processing web pages based on a spam web page classifier; and system for building said classifier. | |
BRPI0815494A2 (pt) | Computer-implemented method, and system. | |
BRPI1012867A2 (pt) | Method, computer-readable medium, server computer, and system | |
BRPI0722055A2 (pt) | Method, computer-readable medium, server computer, system, and telephone. | |
CL2007003111A1 (es) | Computer-implemented method and system for presenting to a user the business context corresponding to an unstructured document. | |
BRPI0817477A2 (pt) | Computerized method, machine-readable medium, apparatus, and computerized system. | |
CL2011000216A1 (es) | System and method for directing commands in a modularized computer system. | |
BRPI0810808A2 (pt) | Method, computer-readable medium, server computer, point-of-sale terminal, and system | |
BRPI0917120A2 (pt) | Method, and computer-readable medium. | |
BRPI0915911A2 (pt) | Method and system for executing applications using native code modules | |
BRPI0815590A2 (pt) | Method, computer-readable medium, server computer, system and electronic device. | |
BRPI0822998A2 (pt) | Organic material production system using biomass material, and method | |
BRPI1008645A2 (pt) | Method, computer-readable medium, and server computer | |
BR112012002417A2 (pt) | System and method for adding advertising in a location-based advertising system. | |
GB2488298A (en) | System and method for facilitating affiliate marketing relationships | |
BRPI0818556A8 (pt) | Method and system for generating content item recommendations | |
BRPI0811113A2 (pt) | Process, apparatus and system for establishing a connection. | |
BRPI0818423A2 (pt) | System, method, and computer-readable media | |
BRPI1011767A2 (pt) | Method, computer-readable medium, server computer, and system | |
BRPI0916298A2 (pt) | Method for producing base oil and diesel fuel, and system. | |
BRPI0820122A2 (pt) | Well production optimization method, system, and method. | |
BR112013007260A2 (pt) | Computer-implemented method, computer program product embodied in a computer-readable medium, and system for analyzing traffic of web state-enabled applications | |
BRPI0922309A2 (pt) | Method, computer-readable medium, and apparatus. | |
BRPI0906086A2 (pt) | Method, systems and computer-readable medium. | |
BRPI0817629A2 (pt) | Method for the controlled production of security documents and system for carrying out the method. | |