MX336054B - System and method for automatic wrapper induction using target strings - Google Patents
- Publication number
- MX336054B (application MX2013013345A)
- Authority
- MX
- Mexico
- Prior art keywords
- domains
- domain
- wrapper
- target strings
- wrappers
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Wrappers are induced for multiple domains: for a given target string that has a relatively universal distribution across the domains of interest, a first wrapper can be defined and trained for a particular domain. Target strings extracted from that domain can be used to search for documents in other domains. New wrappers can be learned for other domains that also contain the target strings. In addition, a first wrapper can be learned for a given domain using a limited amount of training data from that single domain. The first wrapper is then applied to all pages in the domain to extract the relevant information. A few of the newly extracted words are then searched against the document collection to obtain a list of domains that contain the extracted words. The updated information can be used as training data to learn new wrappers in those domains.
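The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes a wrapper is just a pair of left and right delimiter strings, that "searching the document collection" is a substring scan over an in-memory corpus, and that learning means taking the text shared around every known target string. All names (`learn_wrapper`, `apply_wrapper`, `search_domains`, `induce_across_domains`, `max_queries`) and the example domains are hypothetical.

```python
import os.path
import re
from typing import Dict, List, Optional, Set, Tuple

# Assumption: a wrapper is a (left delimiter, right delimiter) pair.
Wrapper = Tuple[str, str]


def learn_wrapper(pages: List[str], targets: Set[str]) -> Optional[Wrapper]:
    """Learn delimiters as the text shared before and after every known target."""
    lefts, rights = [], []
    for page in pages:
        for target in targets:
            for match in re.finditer(re.escape(target), page):
                lefts.append(page[:match.start()])
                rights.append(page[match.end():])
    if not lefts:
        return None  # no training data in this domain
    # Common suffix of the left contexts = reversed common prefix of reversals.
    left = os.path.commonprefix([s[::-1] for s in lefts])[::-1]
    right = os.path.commonprefix(rights)
    return (left, right) if left and right else None


def apply_wrapper(wrapper: Wrapper, pages: List[str]) -> Set[str]:
    """Extract every string enclosed by the wrapper's delimiters."""
    left, right = wrapper
    pattern = re.compile(re.escape(left) + "(.+?)" + re.escape(right))
    return {m.group(1) for page in pages for m in pattern.finditer(page)}


def search_domains(corpus: Dict[str, List[str]], queries: List[str]) -> Set[str]:
    """Stand-in for a search engine: domains with a page containing a query."""
    return {d for d, pages in corpus.items()
            if any(q in page for page in pages for q in queries)}


def induce_across_domains(corpus: Dict[str, List[str]], seed_domain: str,
                          seed_targets: Set[str],
                          max_queries: int = 3) -> Dict[str, Wrapper]:
    wrappers: Dict[str, Wrapper] = {}
    known = set(seed_targets)
    queue, seen = [seed_domain], {seed_domain}
    while queue:
        domain = queue.pop(0)
        wrapper = learn_wrapper(corpus[domain], known)
        if wrapper is None:
            continue
        wrappers[domain] = wrapper
        extracted = apply_wrapper(wrapper, corpus[domain])
        # Search with only a few of the newly extracted strings.
        queries = sorted(extracted - known)[:max_queries]
        known |= extracted
        for other in sorted(search_domains(corpus, queries) - seen):
            seen.add(other)
            queue.append(other)
    return wrappers


# Made-up corpus: two product domains and one unrelated domain.
corpus = {
    "a.example": ["<li><b>Acme Drill X1</b> $99</li>",
                  "<li><b>Bolt Saw Z9</b> $45</li>",
                  "<li><b>Core Vise Q3</b> $20</li>"],
    "b.example": ['<td class="n">Bolt Saw Z9</td>',
                  '<td class="n">Core Vise Q3</td>',
                  '<td class="n">Dune Lathe R7</td>'],
    "c.example": ["<p>Nothing relevant here</p>"],
}
wrappers = induce_across_domains(corpus, "a.example",
                                 {"Acme Drill X1", "Bolt Saw Z9"})
```

In this toy run, two seed strings are enough to learn a wrapper for `a.example`. The string that wrapper newly extracts leads the search to `b.example`, where a second wrapper is learned without any hand-labeled data for that domain. A production system would presumably use a richer wrapper language and a real search index, neither of which is specified here.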
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261726165P | 2012-11-14 | 2012-11-14 |
Publications (2)
Publication Number | Publication Date |
---|---|
MX2013013345A (es) | 2014-08-06 |
MX336054B (es) | 2016-01-07 |
Family
ID=50682755
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
MX2013013345A MX336054B (es) | System and method for automatic wrapper induction using target strings | 2012-11-14 | 2013-11-14 |
Country Status (3)
Country | Link |
---|---|
US (1) | US9223871B2 (es) |
CA (1) | CA2833356C (es) |
MX (1) | MX336054B (es) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10664534B2 (en) | 2012-11-14 | 2020-05-26 | Home Depot Product Authority, Llc | System and method for automatic product matching |
US10504127B2 (en) | 2012-11-15 | 2019-12-10 | Home Depot Product Authority, Llc | System and method for classifying relevant competitors |
US9928515B2 (en) | 2012-11-15 | 2018-03-27 | Home Depot Product Authority, Llc | System and method for competitive product assortment |
US10290012B2 (en) | 2012-11-28 | 2019-05-14 | Home Depot Product Authority, Llc | System and method for price testing and optimization |
CN108512876B (zh) * | 2017-02-27 | 2020-11-10 | 腾讯科技(深圳)有限公司 | Data push method and device |
GB201711315D0 (en) * | 2017-07-13 | 2017-08-30 | Univ Oxford Innovation Ltd | Method for automatically generating a wrapper for extracting web data, and a computer system |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6085186A (en) * | 1996-09-20 | 2000-07-04 | Netbot, Inc. | Method and system using information written in a wrapper description language to execute query on a network |
US6606625B1 (en) * | 1999-06-03 | 2003-08-12 | University Of Southern California | Wrapper induction by hierarchical data analysis |
US7519621B2 (en) | 2004-05-04 | 2009-04-14 | Pagebites, Inc. | Extracting information from Web pages |
US9489366B2 (en) | 2010-02-19 | 2016-11-08 | Microsoft Technology Licensing, Llc | Interactive synchronization of web data and spreadsheets |
2013
- 2013-03-15 US US13/837,961 patent/US9223871B2/en active Active
- 2013-11-14 MX MX2013013345A patent/MX336054B/es unknown
- 2013-11-14 CA CA2833356A patent/CA2833356C/en active Active
Also Published As
Publication number | Publication date |
---|---|
MX2013013345A (es) | 2014-08-06 |
CA2833356C (en) | 2018-04-03 |
US20140136568A1 (en) | 2014-05-15 |
US9223871B2 (en) | 2015-12-29 |
CA2833356A1 (en) | 2014-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
MX336054B (es) | System and method for automatic wrapper induction using target strings | |
WO2007056535A3 (en) | Method and apparatus for timed tagging of media content | |
WO2009140272A3 (en) | Search results with most clicked next objects | |
WO2014085832A3 (en) | Event investigation within an online research system | |
WO2013163644A3 (en) | Updating a search index used to facilitate application searches | |
WO2014085776A3 (en) | Web search ranking | |
WO2011031773A3 (en) | System and method to research documents in online libraries | |
MX2017008687A (es) | Sound gap management for content delivery | |
GB2489863A (en) | Indexing documents | |
De Jong | Fauna Europaea | |
WO2013002940A3 (en) | Method and apparatus for creating a search index for a composite document and searching same | |
WO2011088521A3 (en) | Improved searching using semantic keys | |
EA201490421A1 (ru) | Поиск и создание настраиваемого контента | |
Gao | Rill and gully development processes | |
WO2014049310A3 (en) | Method and apparatuses for interactive searching of electronic documents | |
Junping et al. | Detrital Zircon Fission Track Thermochronology in Kasama—Nondo, Northeastern Zambia | |
Sugiuchi et al. | The trend of library and information science research in Japan: A content analysis of research articles | |
Cao et al. | Automatic extraction technique of residential areas in high resolution remote sensing image | |
Park | A study on priorities of the Key Competence of Port Logistics Enterprise using AHP Method | |
WO2012043981A3 (ko) | Method and apparatus for generating meta information of content data | |
GB2561177A8 (en) | Method for identification of digital content | |
Gradziuk et al. | Rocky road to a level playing field in EU–China investment and trade relations | |
Rastgar et al. | Field Investigation of the Model of Transparent Governance Based on the Teachings of Nahj Al-Balagheh | |
Song et al. | Liver segmentation on abdominal CT images | |
Panarin | New approach to ranking of Jurassic sedimentary complexes of the northern part of the West Siberian petroleum basin | |