MX336054B - System and method for automatic wrapper induction using target strings. - Google Patents

System and method for automatic wrapper induction using target strings.

Info

Publication number
MX336054B
Authority
MX
Mexico
Prior art keywords
domains
domain
wrapper
target strings
wrappers
Prior art date
Application number
MX2013013345A
Other languages
English (en)
Other versions
MX2013013345A (es)
Inventor
Siva Kalyana Pavan Kumar Mallapragada Naga Surya
Original Assignee
Homer Tlc Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Homer Tlc Inc
Publication of MX2013013345A
Publication of MX336054B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Wrappers are induced for multiple domains: for a given target string that has a relatively universal distribution across the domains of interest, a first wrapper can be defined and trained for a particular domain. The target strings extracted from that domain can then be used to search documents in other domains, and new wrappers can be learned for the other domains that also contain those target strings. Furthermore, the first wrapper for a given domain can be learned from a limited amount of training data drawn from that single domain. The first wrapper is then applied to all pages in the domain to extract the relevant information. A few of the newly extracted words are then searched against the document collection to obtain a list of domains that contain the extracted words. The updated information can be used as training data to learn new wrappers on those domains.
MX2013013345A 2012-11-14 2013-11-14 System and method for automatic wrapper induction using target strings. MX336054B (es)
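
The bootstrapping loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the wrapper here is a simple pair of left and right delimiter strings, the "search" is a substring scan over an in-memory corpus, and every extracted string is searched rather than only a few. The function names (`learn_wrapper`, `apply_wrapper`, `induce_wrappers`), the `CORPUS` data, and the `.example` domain names are all hypothetical and chosen only for illustration.

```python
import os
import re


def learn_wrapper(pages, targets):
    """Learn a toy (left, right) delimiter pair from pages containing known targets."""
    lefts, rights = [], []
    for page in pages:
        for target in targets:
            i = page.find(target)
            if i >= 0:
                lefts.append(page[:i])
                rights.append(page[i + len(target):])
    if not lefts:
        return None
    # Longest common suffix of the left contexts, longest common prefix of the right ones.
    left = os.path.commonprefix([s[::-1] for s in lefts])[::-1]
    right = os.path.commonprefix(rights)
    return (left, right) if left and right else None


def apply_wrapper(wrapper, page):
    """Extract every string that sits between the wrapper's delimiters."""
    left, right = wrapper
    pattern = re.escape(left) + r"(.+?)" + re.escape(right)
    return set(re.findall(pattern, page))


def induce_wrappers(corpus, seed_domain, seed_targets):
    """Start from one labeled domain, then spread to domains sharing target strings."""
    wrappers, known, visited = {}, set(seed_targets), set()
    frontier = [seed_domain]
    while frontier:
        domain = frontier.pop()
        if domain in visited:
            continue
        visited.add(domain)
        # Known target strings act as training data for this domain.
        wrapper = learn_wrapper(corpus[domain], known)
        if wrapper is None:
            continue
        wrappers[domain] = wrapper
        # Apply the wrapper to all pages in the domain to extract more strings.
        for page in corpus[domain]:
            known |= apply_wrapper(wrapper, page)
        # Search the collection for other domains that contain extracted strings.
        for other, pages in corpus.items():
            if other not in visited and any(t in p for p in pages for t in known):
                frontier.append(other)
    return wrappers, known


# Hypothetical document collection: domain name -> list of page sources.
CORPUS = {
    "shop-a.example": [
        '<h1 class="sku">DW-100</h1><p>drill</p>',
        '<h1 class="sku">MK-200</h1><p>saw</p>',
        '<h1 class="sku">RY-300</h1><p>sander</p>',
    ],
    "shop-b.example": [
        "<td><b>MK-200</b></td><td>9.99</td>",
        "<td><b>RY-300</b></td><td>19.99</td>",
        "<td><b>BD-400</b></td><td>29.99</td>",
    ],
    "blog.example": ["<p>no products here</p>"],
}

wrappers, strings = induce_wrappers(CORPUS, "shop-a.example", {"DW-100", "MK-200"})
print(sorted(strings))  # ['BD-400', 'DW-100', 'MK-200', 'RY-300']
```

In this toy run only two labeled strings from the first domain are supplied. The wrapper learned there also extracts a third string, and the shared strings are enough to learn a second wrapper for the other shop domain, which in turn yields a string that never appeared in the first domain.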

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US201261726165P 2012-11-14 2012-11-14

Publications (2)

Publication Number Publication Date
MX2013013345A (es) 2014-08-06
MX336054B (es) 2016-01-07

Family

ID=50682755

Family Applications (1)

Application Number Title Priority Date Filing Date
MX2013013345A MX336054B (es) 2012-11-14 2013-11-14 System and method for automatic wrapper induction using target strings.

Country Status (3)

Country Link
US (1) US9223871B2 (es)
CA (1) CA2833356C (es)
MX (1) MX336054B (es)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10664534B2 (en) 2012-11-14 2020-05-26 Home Depot Product Authority, Llc System and method for automatic product matching
US10504127B2 (en) 2012-11-15 2019-12-10 Home Depot Product Authority, Llc System and method for classifying relevant competitors
US9928515B2 (en) 2012-11-15 2018-03-27 Home Depot Product Authority, Llc System and method for competitive product assortment
US10290012B2 (en) 2012-11-28 2019-05-14 Home Depot Product Authority, Llc System and method for price testing and optimization
CN108512876B (zh) * 2017-02-27 2020-11-10 腾讯科技(深圳)有限公司 Data push method and device
GB201711315D0 (en) * 2017-07-13 2017-08-30 Univ Oxford Innovation Ltd Method for automatically generating a wrapper for extracting web data, and a computer system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6085186A (en) * 1996-09-20 2000-07-04 Netbot, Inc. Method and system using information written in a wrapper description language to execute query on a network
US6606625B1 (en) * 1999-06-03 2003-08-12 University Of Southern California Wrapper induction by hierarchical data analysis
US7519621B2 (en) 2004-05-04 2009-04-14 Pagebites, Inc. Extracting information from Web pages
US9489366B2 (en) 2010-02-19 2016-11-08 Microsoft Technology Licensing, Llc Interactive synchronization of web data and spreadsheets

Also Published As

Publication number Publication date
MX2013013345A (es) 2014-08-06
CA2833356C (en) 2018-04-03
US20140136568A1 (en) 2014-05-15
US9223871B2 (en) 2015-12-29
CA2833356A1 (en) 2014-05-14

Similar Documents

Publication Publication Date Title
MX336054B (es) System and method for automatic wrapper induction using target strings.
WO2007056535A3 (en) Method and apparatus for timed tagging of media content
WO2009140272A3 (en) Search results with most clicked next objects
WO2014085832A3 (en) Event investigation within an online research system
WO2013163644A3 (en) Updating a search index used to facilitate application searches
WO2014085776A3 (en) Web search ranking
WO2011031773A3 (en) System and method to research documents in online libraries
MX2017008687A (es) Audio gap management for content delivery.
GB2489863A (en) Indexing documents
De Jong Fauna Europaea
WO2013002940A3 (en) Method and apparatus for creating a search index for a composite document and searching same
WO2011088521A3 (en) Improved searching using semantic keys
EA201490421A1 (ru) Search and creation of customizable content
Gao Rill and gully development processes
WO2014049310A3 (en) Method and apparatuses for interactive searching of electronic documents
Junping et al. Detrital Zircon Fission Track Thermochronology in Kasama—Nondo, Northeastern Zambia
Sugiuchi et al. The trend of library and information science research in Japan: A content analysis of research articles
Cao et al. Automatic extraction technique of residential areas in high resolution remote sensing image
Park A study on priorities of the Key Competence of Port Logistics Enterprise using AHP Method
WO2012043981A3 (ko) Method and apparatus for generating meta information of content data
GB2561177A8 (en) Method for identification of digital content
Gradziuk et al. Rocky road to a level playing field in EU–China investment and trade relations
Rastgar et al. Field Investigation of the Model of Transparent Governance Based on the Teachings of Nahj Al-Balagheh
Song et al. Liver segmentation on abdominal CT images
Panarin New approach to ranking of Jurassic sedimentary complexes of the northern part of the West Siberian petroleum basin