CL2008001189A1 - Method and system for processing web pages based on a spam web page classifier; and system for building said classifier.

Method and system for processing web pages based on a spam web page classifier; and system for building said classifier.

Info

Publication number
CL2008001189A1
Authority
CL
Chile
Prior art keywords
classifier
build
web pages
process web
spam website
Prior art date
Application number
CL2008001189A
Other languages
English (en)
Inventor
Svore Krysta
Burges Chris
Original Assignee
Microsoft Corp Soc Organizada Bajo Las Leyes Del Estado De Washington
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp Soc Organizada Bajo Las Leyes Del Estado De Washington
Publication of CL2008001189A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Method and system for processing web pages based on a spam web page classifier; and system for building said classifier.
CL2008001189A 2007-04-30 2008-04-24 Method and system for processing web pages based on a spam web page classifier; and system for building said classifier. CL2008001189A1 (es)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/742,156 US7853589B2 (en) 2007-04-30 2007-04-30 Web spam page classification using query-dependent data

Publications (1)

Publication Number Publication Date
CL2008001189A1 (es) 2008-12-26

Family

ID=39888207

Family Applications (1)

Application Number Title Priority Date Filing Date
CL2008001189A CL2008001189A1 (es) Method and system for processing web pages based on a spam web page classifier; and system for building said classifier.

Country Status (4)

Country Link
US (1) US7853589B2 (es)
CL (1) CL2008001189A1 (es)
TW (1) TWI437452B (es)
WO (1) WO2008134172A1 (es)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20090046862A (ko) * 2006-07-24 2009-05-11 차차 써치 인코포레이티드 Method, system, and computer-readable storage for podcasting and video training in an information search system
US7680745B2 (en) * 2007-01-29 2010-03-16 4Homemedia, Inc. Automatic configuration and control of devices using metadata
US8458165B2 (en) * 2007-06-28 2013-06-04 Oracle International Corporation System and method for applying ranking SVM in query relaxation
US8078617B1 (en) * 2009-01-20 2011-12-13 Google Inc. Model based ad targeting
US8346800B2 (en) * 2009-04-02 2013-01-01 Microsoft Corporation Content-based information retrieval
US8219539B2 (en) * 2009-04-07 2012-07-10 Microsoft Corporation Search queries with shifting intent
US8935258B2 (en) * 2009-06-15 2015-01-13 Microsoft Corporation Identification of sample data items for re-judging
TWI601024B (zh) * 2009-07-06 2017-10-01 Alibaba Group Holding Ltd Sampling methods, systems and equipment
US20110040769A1 (en) * 2009-08-13 2011-02-17 Yahoo! Inc. Query-URL N-Gram Features in Web Ranking
US9020936B2 (en) * 2009-08-14 2015-04-28 Microsoft Technology Licensing, Llc Using categorical metadata to rank search results
US9576251B2 (en) * 2009-11-13 2017-02-21 Hewlett Packard Enterprise Development Lp Method and system for processing web activity data
TWI404374B (zh) * 2009-12-11 2013-08-01 Univ Nat Taiwan Science Tech Method for training a classifier for detecting spam websites
US8639773B2 (en) * 2010-06-17 2014-01-28 Microsoft Corporation Discrepancy detection for web crawling
US8706738B2 (en) * 2010-08-13 2014-04-22 Demand Media, Inc. Systems, methods and machine readable mediums to select a title for content production
JP4939637B2 (ja) * 2010-08-20 2012-05-30 楽天株式会社 Information providing device, information providing method, program, and information recording medium
US8606769B1 (en) * 2010-12-07 2013-12-10 Conductor, Inc. Ranking a URL based on a location in a search engine results page
US8762365B1 (en) * 2011-08-05 2014-06-24 Amazon Technologies, Inc. Classifying network sites using search queries
US8655883B1 (en) * 2011-09-27 2014-02-18 Google Inc. Automatic detection of similar business updates by using similarity to past rejected updates
KR101510647B1 (ko) * 2011-10-07 2015-04-10 한국전자통신연구원 Method and apparatus for web trend analysis based on issue template extraction
US9244931B2 (en) * 2011-10-11 2016-01-26 Microsoft Technology Licensing, Llc Time-aware ranking adapted to a search engine application
US8868536B1 (en) 2012-01-04 2014-10-21 Google Inc. Real time map spam detection
US9659095B2 (en) * 2012-03-04 2017-05-23 International Business Machines Corporation Managing search-engine-optimization content in web pages
CN102801709B (zh) * 2012-06-28 2015-03-04 北京奇虎科技有限公司 Phishing website identification system and method
US9483566B2 (en) * 2013-01-23 2016-11-01 Google Inc. System and method for determining the legitimacy of a listing
US9405803B2 (en) * 2013-04-23 2016-08-02 Google Inc. Ranking signals in mixed corpora environments
US20150039599A1 (en) * 2013-08-01 2015-02-05 Go Daddy Operating Company, LLC Methods and systems for recommending top level and second level domains
US10530671B2 (en) * 2015-01-15 2020-01-07 The University Of North Carolina At Chapel Hill Methods, systems, and computer readable media for generating and using a web page classification model
US10229219B2 (en) * 2015-05-01 2019-03-12 Facebook, Inc. Systems and methods for demotion of content items in a feed
US11675795B2 (en) * 2015-05-15 2023-06-13 Yahoo Assets Llc Method and system for ranking search content
US11609949B2 (en) 2018-11-20 2023-03-21 Google Llc Methods, systems, and media for modifying search results based on search query risk

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5826260A (en) * 1995-12-11 1998-10-20 International Business Machines Corporation Information retrieval system and method for displaying and ordering information based on query element contribution
US7117358B2 (en) * 1997-07-24 2006-10-03 Tumbleweed Communications Corp. Method and system for filtering communication
US6785671B1 (en) * 1999-12-08 2004-08-31 Amazon.Com, Inc. System and method for locating web-based product offerings
US6480837B1 (en) * 1999-12-16 2002-11-12 International Business Machines Corporation Method, system, and program for ordering search results using a popularity weighting
US7188106B2 (en) * 2001-05-01 2007-03-06 International Business Machines Corporation System and method for aggregating ranking results from various sources to improve the results of web searching
US6795820B2 (en) * 2001-06-20 2004-09-21 Nextpage, Inc. Metasearch technique that ranks documents obtained from multiple collections
US6978274B1 (en) * 2001-08-31 2005-12-20 Attenex Corporation System and method for dynamically evaluating latent concepts in unstructured documents
US7743045B2 (en) * 2005-08-10 2010-06-22 Google Inc. Detecting spam related and biased contexts for programmable search engines
KR100486821B1 (ko) 2003-02-08 2005-04-29 디프소프트 주식회사 Method for automatically blocking spam mail through link URL access
US7219148B2 (en) * 2003-03-03 2007-05-15 Microsoft Corporation Feedback loop for spam prevention
US7197497B2 (en) * 2003-04-25 2007-03-27 Overture Services, Inc. Method and apparatus for machine learning a document relevance function
US20050015626A1 (en) * 2003-07-15 2005-01-20 Chasin C. Scott System and method for identifying and filtering junk e-mail messages or spam based on URL content
US20050120019A1 (en) * 2003-11-29 2005-06-02 International Business Machines Corporation Method and apparatus for the automatic identification of unsolicited e-mail messages (SPAM)
KR20040103763A (ko) 2004-01-15 2004-12-09 엔에이치엔(주) Method for managing registered websites in a search engine
US20050216564A1 (en) * 2004-03-11 2005-09-29 Myers Gregory K Method and apparatus for analysis of electronic communications containing imagery
US7349901B2 (en) * 2004-05-21 2008-03-25 Microsoft Corporation Search engine spam detection using external data
US7664819B2 (en) * 2004-06-29 2010-02-16 Microsoft Corporation Incremental anti-spam lookup and update service
US7580921B2 (en) * 2004-07-26 2009-08-25 Google Inc. Phrase identification in an information retrieval system
US7533092B2 (en) * 2004-10-28 2009-05-12 Yahoo! Inc. Link-based spam detection
US7716198B2 (en) * 2004-12-21 2010-05-11 Microsoft Corporation Ranking search results using feature extraction
US7962510B2 (en) * 2005-02-11 2011-06-14 Microsoft Corporation Using content analysis to detect spam web pages
US7562304B2 (en) * 2005-05-03 2009-07-14 Mcafee, Inc. Indicating website reputations during website manipulation of user information
US7769751B1 (en) * 2006-01-17 2010-08-03 Google Inc. Method and apparatus for classifying documents based on user inputs

Also Published As

Publication number Publication date
US20080270376A1 (en) 2008-10-30
US7853589B2 (en) 2010-12-14
TW200849045A (en) 2008-12-16
TWI437452B (zh) 2014-05-11
WO2008134172A1 (en) 2008-11-06

Similar Documents

Publication Publication Date Title
CL2008001189A1 (es) Method and system for processing web pages based on a spam web page classifier; and system for building said classifier.
BRPI0815494A2 (pt) Computer-implemented method, and system.
BRPI1012867A2 (pt) Method, computer-readable medium, server computer, and system
BRPI0722055A2 (pt) Method, computer-readable medium, server computer, system, and telephone.
CL2007003111A1 (es) Computer-implemented method and system for presenting a user with the business context corresponding to an unstructured document.
BRPI0817477A2 (pt) Computerized method, machine-readable medium, apparatus, and computerized system.
CL2011000216A1 (es) System and method for directing commands in a modularized computer system.
BRPI0810808A2 (pt) Method, computer-readable medium, server computer, point-of-sale terminal, and system
BRPI0917120A2 (pt) Method, and computer-readable medium.
BRPI0915911A2 (pt) Method and system for executing applications using native code modules
BRPI0815590A2 (pt) Method, computer-readable medium, server computer, system, and electronic device.
BRPI0822998A2 (pt) Organic material production system using biomass material, and method
BRPI1008645A2 (pt) Method, computer-readable medium, and server computer
BR112012002417A2 (pt) System and method for adding advertising in a location-based advertising system.
GB2488298A (en) System and method for facilitating affiliate marketing relationships
BRPI0818556A8 (pt) Method and system for generating content item recommendations
BRPI0811113A2 (pt) Process, apparatus, and system for establishing a connection.
BRPI0818423A2 (pt) System, method, and computer-readable media
BRPI1011767A2 (pt) Method, computer-readable medium, server computer, and system
BRPI0916298A2 (pt) Method for producing base oil and diesel fuel, and system.
BRPI0820122A2 (pt) Well production optimization method, system, and method.
BR112013007260A2 (pt) Computer-implemented method, computer program product embodied in a computer-readable medium, and system for analysis of web state-enabled application traffic
BRPI0922309A2 (pt) Method, computer-readable medium, and apparatus.
BRPI0906086A2 (pt) Method, systems, and computer-readable medium.
BRPI0817629A2 (pt) Method for the controlled production of security documents and system for carrying out the method.