BRPI0403013A - Vision-based document segmentation - Google Patents
- Publication number
- BRPI0403013A
- Authority
- BR
- Brazil
- Prior art keywords
- document
- vision
- based document
- segmentation
- parts
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/117—Tagging; Marking up; Designating a block; Setting of attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/14—Tree-structured documents
- G06F40/143—Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computer Hardware Design (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Document Processing Apparatus (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
"SEGMENTAçãO DE DOCUMENTO BASEADA EM VISãO". Trata-se de uma segmentação de documento baseada em visão que identifica uma ou mais partes de conteúdo semântico de um documento. A uma ou mais partes são identificadas por se identificar uma pluralidade de blocos visuais no documento e por se detectar um ou mais separadores entre os blocos visuais da pluralidade de blocos visuais. Uma estrutura de conteúdo para o documento é construída baseado pelo menos em parte na pluralidade de blocos visuais e no um ou mais separadores e a estrutura de conteúdo identifica a uma ou mais partes de conteúdo semântico do documento. A estrutura de conteúdo obtida utilizando a segmentação de documento baseada em visão pode opcionalmente ser utilizada durante a recuperação de documento.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/628,766 US7428700B2 (en) | 2003-07-28 | 2003-07-28 | Vision-based document segmentation |
Publications (1)
Publication Number | Publication Date |
---|---|
BRPI0403013A (pt) | 2005-03-22 |
Family
ID=33541464
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
BR0403013-3A BRPI0403013A (pt) | 2003-07-28 | 2004-07-23 | Vision-based document segmentation |
Country Status (12)
Country | Link |
---|---|
US (2) | US7428700B2 (pt) |
EP (1) | EP1503300A3 (pt) |
JP (1) | JP2005050344A (pt) |
KR (1) | KR20050013949A (pt) |
CN (1) | CN1577328A (pt) |
AU (1) | AU2004203057A1 (pt) |
BR (1) | BRPI0403013A (pt) |
CA (1) | CA2472664A1 (pt) |
MX (1) | MXPA04006932A (pt) |
RU (1) | RU2004123222A (pt) |
TW (1) | TW200508896A (pt) |
ZA (1) | ZA200405370B (pt) |
Families Citing this family (54)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7421651B2 (en) * | 2004-12-30 | 2008-09-02 | Google Inc. | Document segmentation based on visual gaps |
US7796837B2 (en) * | 2005-09-22 | 2010-09-14 | Google Inc. | Processing an image map for display on computing device |
US8176414B1 (en) * | 2005-09-30 | 2012-05-08 | Google Inc. | Document division method and system |
US8234392B2 (en) * | 2006-11-17 | 2012-07-31 | Apple Inc. | Methods and apparatuses for providing a hardware accelerated web engine |
US8949215B2 (en) * | 2007-02-28 | 2015-02-03 | Microsoft Corporation | GUI based web search |
US7895148B2 (en) * | 2007-04-30 | 2011-02-22 | Microsoft Corporation | Classifying functions of web blocks based on linguistic features |
EP2019361A1 (en) * | 2007-07-26 | 2009-01-28 | Siemens Aktiengesellschaft | A method and apparatus for extraction of textual content from hypertext web documents |
KR100907709B1 (ko) * | 2007-11-22 | 2009-07-14 | 한양대학교 산학협력단 | 블록 그룹핑을 이용한 정보 추출 장치 및 그 방법 |
US8301998B2 (en) | 2007-12-14 | 2012-10-30 | Ebay Inc. | Identification of content in an electronic document |
US8601393B2 (en) * | 2008-01-28 | 2013-12-03 | Fuji Xerox Co., Ltd. | System and method for supporting document navigation on mobile devices using segmentation and keyphrase summarization |
CN101515272B (zh) * | 2008-02-18 | 2012-10-24 | 株式会社理光 | 提取网页内容的方法和装置 |
US20090248707A1 (en) * | 2008-03-25 | 2009-10-01 | Yahoo! Inc. | Site-specific information-type detection methods and systems |
CN102132265A (zh) * | 2008-06-26 | 2011-07-20 | 惠普开发有限公司 | 远程分配的本地资源的自管理 |
JP5100543B2 (ja) * | 2008-07-11 | 2012-12-19 | キヤノン株式会社 | 文書管理装置、文書管理方法、及びコンピュータプログラム |
US8473467B2 (en) * | 2009-01-02 | 2013-06-25 | Apple Inc. | Content profiling to dynamically configure content processing |
US20140033024A1 (en) * | 2009-04-07 | 2014-01-30 | Adobe Systems Incorporated | Multi-item page layout modifications by gap editing |
CN101937438B (zh) * | 2009-06-30 | 2013-06-05 | 富士通株式会社 | 网页内容提取方法和装置 |
WO2011041795A1 (en) * | 2009-10-02 | 2011-04-07 | Aravind Musuluri | System and method for block segmenting, identifying and indexing visual elements, and searching documents |
WO2011072434A1 (en) * | 2009-12-14 | 2011-06-23 | Hewlett-Packard Development Company,L.P. | System and method for web content extraction |
CN102906733B (zh) * | 2010-02-12 | 2016-08-17 | 比蓝阅读器公司 | 文本连续性的指示符 |
US9026907B2 (en) | 2010-02-12 | 2015-05-05 | Nicholas Lum | Indicators of text continuity |
US8380719B2 (en) * | 2010-06-18 | 2013-02-19 | Microsoft Corporation | Semantic content searching |
WO2012000185A1 (en) * | 2010-06-30 | 2012-01-05 | Hewlett-Packard Development Company,L.P. | Method and system of determining similarity between elements of electronic document |
US8606789B2 (en) * | 2010-07-02 | 2013-12-10 | Xerox Corporation | Method for layout based document zone querying |
US8874581B2 (en) | 2010-07-29 | 2014-10-28 | Microsoft Corporation | Employing topic models for semantic class mining |
US8867837B2 (en) * | 2010-07-30 | 2014-10-21 | Hewlett-Packard Development Company, L.P. | Detecting separator lines in a web page |
US9317622B1 (en) * | 2010-08-17 | 2016-04-19 | Amazon Technologies, Inc. | Methods and systems for fragmenting and recombining content structured language data content to reduce latency of processing and rendering operations |
EP2431889A1 (en) * | 2010-09-01 | 2012-03-21 | Axel Springer Digital TV Guide GmbH | Content transformation for lean-back entertainment |
US20130205202A1 (en) * | 2010-10-26 | 2013-08-08 | Jun Xiao | Transformation of a Document into Interactive Media Content |
US8442998B2 (en) | 2011-01-18 | 2013-05-14 | Apple Inc. | Storage of a document using multiple representations |
US9535888B2 (en) * | 2012-03-30 | 2017-01-03 | Bmenu As | System, method, software arrangement and computer-accessible medium for a generator that automatically identifies regions of interest in electronic documents for transcoding |
US9524274B2 (en) * | 2013-06-06 | 2016-12-20 | Xerox Corporation | Methods and systems for generation of document structures based on sequential constraints |
KR101429466B1 (ko) * | 2012-11-19 | 2014-08-13 | 네이버 주식회사 | 동적 페이지 분할을 이용한 웹페이지 제공 방법 및 시스템 |
US9031894B2 (en) * | 2013-02-19 | 2015-05-12 | Microsoft Technology Licensing, Llc | Parsing and rendering structured images |
CA2875926C (en) | 2013-06-14 | 2019-07-30 | Wavelight Gmbh | Automatic machine settings for customized refractive surgery |
US9817823B2 (en) | 2013-09-17 | 2017-11-14 | International Business Machines Corporation | Active knowledge guidance based on deep document analysis |
US10198408B1 (en) * | 2013-10-01 | 2019-02-05 | Go Daddy Operating Company, LLC | System and method for converting and importing web site content |
US9672195B2 (en) * | 2013-12-24 | 2017-06-06 | Xerox Corporation | Method and system for page construct detection based on sequential regularities |
WO2015164278A1 (en) * | 2014-04-20 | 2015-10-29 | Aravind Musuluri | System and method for variable presentation semantics of search results in a search environment |
RU2595557C2 (ru) * | 2014-09-17 | 2016-08-27 | Общество с ограниченной ответственностью "Аби Девелопмент" | Выявление снимков экрана на изображениях документов |
US10838699B2 (en) | 2017-01-18 | 2020-11-17 | Oracle International Corporation | Generating data mappings for user interface screens and screen components for an application |
US10733754B2 (en) * | 2017-01-18 | 2020-08-04 | Oracle International Corporation | Generating a graphical user interface model from an image |
US10891419B2 (en) | 2017-10-27 | 2021-01-12 | International Business Machines Corporation | Displaying electronic text-based messages according to their typographic features |
US10769056B2 (en) | 2018-02-26 | 2020-09-08 | The Ultimate Software Group, Inc. | System for autonomously testing a computer system |
US11954461B2 (en) | 2018-02-26 | 2024-04-09 | Ukg Inc. | Autonomously delivering software features |
US10747651B1 (en) | 2018-05-31 | 2020-08-18 | The Ultimate Software Group, Inc. | System for optimizing system resources and runtime during a testing procedure |
US11010284B1 (en) | 2018-05-31 | 2021-05-18 | The Ultimate Software Group, Inc. | System for understanding navigational semantics via hypothesis generation and contextual analysis |
US10977155B1 (en) | 2018-05-31 | 2021-04-13 | The Ultimate Software Group, Inc. | System for providing autonomous discovery of field or navigation constraints |
US10599767B1 (en) | 2018-05-31 | 2020-03-24 | The Ultimate Software Group, Inc. | System for providing intelligent part of speech processing of complex natural language |
US11113175B1 (en) | 2018-05-31 | 2021-09-07 | The Ultimate Software Group, Inc. | System for discovering semantic relationships in computer programs |
CN109298819B (zh) * | 2018-09-21 | 2021-03-16 | Oppo广东移动通信有限公司 | 选择对象的方法、装置、终端及存储介质 |
US11176310B2 (en) * | 2019-04-01 | 2021-11-16 | Adobe Inc. | Facilitating dynamic document layout by determining reading order using document content stream cues |
US11194953B1 (en) * | 2020-04-29 | 2021-12-07 | Indico | Graphical user interface systems for generating hierarchical data extraction training dataset |
CN112347353B (zh) * | 2020-11-06 | 2024-05-24 | 同方知网(北京)技术有限公司 | 一种网页去噪的方法 |
Family Cites Families (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5619709A (en) * | 1993-09-20 | 1997-04-08 | Hnc, Inc. | System and method of context vector generation and retrieval |
US5802515A (en) * | 1996-06-11 | 1998-09-01 | Massachusetts Institute Of Technology | Randomized query generation and document relevance ranking for robust information retrieval from a database |
US6125361A (en) | 1998-04-10 | 2000-09-26 | International Business Machines Corporation | Feature diffusion across hyperlinks |
JP3913985B2 (ja) * | 1999-04-14 | 2007-05-09 | 富士通株式会社 | 文書画像中の基本成分に基づく文字列抽出装置および方法 |
US6880122B1 (en) * | 1999-05-13 | 2005-04-12 | Hewlett-Packard Development Company, L.P. | Segmenting a document into regions associated with a data type, and assigning pipelines to process such regions |
US6754885B1 (en) * | 1999-05-17 | 2004-06-22 | Invensys Systems, Inc. | Methods and apparatus for controlling object appearance in a process control configuration system |
JP3594228B2 (ja) * | 1999-07-01 | 2004-11-24 | シャープ株式会社 | 枠消し装置、枠消し方法、およびオーサリング装置 |
US7346604B1 (en) * | 1999-10-15 | 2008-03-18 | Hewlett-Packard Development Company, L.P. | Method for ranking hypertext search results by analysis of hyperlinks from expert documents and keyword scope |
US6963867B2 (en) * | 1999-12-08 | 2005-11-08 | A9.Com, Inc. | Search query processing to provide category-ranked presentation of search results |
US6584465B1 (en) * | 2000-02-25 | 2003-06-24 | Eastman Kodak Company | Method and system for search and retrieval of similar patterns |
JP3729017B2 (ja) * | 2000-03-27 | 2005-12-21 | コニカミノルタビジネステクノロジーズ株式会社 | 画像処理装置 |
US20020123994A1 (en) * | 2000-04-26 | 2002-09-05 | Yves Schabes | System for fulfilling an information need using extended matching techniques |
JP3425408B2 (ja) * | 2000-05-31 | 2003-07-14 | 株式会社東芝 | 文書読取装置 |
US7003513B2 (en) * | 2000-07-04 | 2006-02-21 | International Business Machines Corporation | Method and system of weighted context feedback for result improvement in information retrieval |
JP3703080B2 (ja) * | 2000-07-27 | 2005-10-05 | インターナショナル・ビジネス・マシーンズ・コーポレーション | ウェブコンテンツを簡略化するための方法、システムおよび媒体 |
JP3995185B2 (ja) * | 2000-07-28 | 2007-10-24 | 株式会社リコー | 枠認識装置及び記録媒体 |
US6567103B1 (en) * | 2000-08-02 | 2003-05-20 | Verity, Inc. | Graphical search results system and method |
US7356530B2 (en) * | 2001-01-10 | 2008-04-08 | Looksmart, Ltd. | Systems and methods of retrieving relevant information |
US6978420B2 (en) * | 2001-02-12 | 2005-12-20 | Aplix Research, Inc. | Hierarchical document cross-reference system and method |
US7076483B2 (en) * | 2001-08-27 | 2006-07-11 | Xyleme Sa | Ranking nodes in a graph |
TW533142B (en) * | 2001-09-12 | 2003-05-21 | Basevision Technology Corp | Composing device and method for name card |
US20040013302A1 (en) | 2001-12-04 | 2004-01-22 | Yue Ma | Document classification and labeling using layout graph matching |
US7010746B2 (en) * | 2002-07-23 | 2006-03-07 | Xerox Corporation | System and method for constraint-based document generation |
US6947930B2 (en) * | 2003-03-21 | 2005-09-20 | Overture Services, Inc. | Systems and methods for interactive search query refinement |
2003
- 2003-07-28: US, application US10/628,766, patent US7428700B2 (en), status: Expired - Fee Related (not active)
2004
- 2004-06-28: TW, application TW093118839A, patent TW200508896A (zh), status: unknown
- 2004-06-28: CA, application CA002472664A, patent CA2472664A1 (en), status: Abandoned (not active)
- 2004-07-02: EP, application EP04015636A, patent EP1503300A3 (en), status: Withdrawn (not active)
- 2004-07-06: AU, application AU2004203057A, patent AU2004203057A1 (en), status: Abandoned (not active)
- 2004-07-06: ZA, application ZA200405370A, patent ZA200405370B (xx), status: unknown
- 2004-07-16: MX, application MXPA04006932A, patent MXPA04006932A (es), status: IP Right Grant (active)
- 2004-07-23: BR, application BR0403013-3A, patent BRPI0403013A (pt), status: IP Right Cessation (not active)
- 2004-07-27: RU, application RU2004123222/09A, patent RU2004123222A (ru), status: Application Discontinuation (not active)
- 2004-07-27: KR, application KR1020040058540A, patent KR20050013949A (ko), status: Application Discontinuation (not active)
- 2004-07-28: JP, application JP2004220868A, patent JP2005050344A (ja), status: Withdrawn (not active)
- 2004-07-28: CN, application CNA2004100556979A, patent CN1577328A (zh), status: Pending (active)
2006
- 2006-01-09: US, application US11/275,488, patent US7613995B2 (en), status: Expired - Fee Related (not active)
Also Published As
Publication number | Publication date |
---|---|
TW200508896A (en) | 2005-03-01 |
RU2004123222A (ru) | 2006-01-27 |
CN1577328A (zh) | 2005-02-09 |
ZA200405370B (en) | 2005-03-15 |
JP2005050344A (ja) | 2005-02-24 |
AU2004203057A1 (en) | 2005-02-17 |
EP1503300A2 (en) | 2005-02-02 |
MXPA04006932A (es) | 2005-03-23 |
US7613995B2 (en) | 2009-11-03 |
US20050028077A1 (en) | 2005-02-03 |
KR20050013949A (ko) | 2005-02-05 |
US20060106798A1 (en) | 2006-05-18 |
US7428700B2 (en) | 2008-09-23 |
CA2472664A1 (en) | 2005-01-28 |
EP1503300A3 (en) | 2006-06-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
BRPI0403013A (pt) | | Vision-based document segmentation |
BRPI0412184A (pt) | | Rendering advertisements with documents having one or more topics using user topic interest information |
BR9917009A (pt) | | Method for ensuring the security of electronic information |
BR0200282A (pt) | | System and method for determining specific requirements from general requirements documents |
BR9905978A (pt) | | Automatic language identification using both n-gram and word information |
BR0215411A (pt) | | Methods for processing data and for separating previously conjoined records, and computer-readable medium |
BR0304234A (pt) | | Passive embedded interaction coding |
BR0108429A (pt) | | System for facilitating the processing and management of transactions |
BRPI0503781A (pt) | | Identification and computing methods for identifying related phrases in a document collection, and computer program product |
EP1347395A3 (en) | | Systems and methods for determining the topic structure of a portion of text |
BRPI0410112A (pt) | | Method, system and computer program product for mapping display data |
BR9804282A (pt) | | Method and apparatus for performing a join query in a database system |
EP1626356A3 (en) | | Method and system for summarizing a document |
BRPI0506292A (pt) | | Systems for automatically defining acceptable shape requirements for an object, for automatically checking whether an object conforms to acceptable shape requirements, and for automatically defining acceptable shape requirements for an object; method of automatically determining whether the shape of an object conforms to acceptable shape requirements; and computer-readable medium/signal |
BRPI0500724A (pt) | | Discovery of the user's goal |
BRPI0520173A2 (pt) | | Query composition using autolists |
BR9907501A (pt) | | Segmentation of image data into blocks and elimination of some before compression |
BR9909896A (pt) | | Processing of precomputed views |
BRPI0518429A2 (pt) | | Insertion tip |
EP1217533A3 (en) | | Method and computer system for part-of-speech tagging of incomplete sentences |
BRPI0413097A (pt) | | Methods and systems for determining a meaning of a document in order to compare the document to content |
BR0000068A (pt) | | System and method for starting the operation of a computer system |
BR0314070A (pt) | | Methods for verifying fluid movement |
BR0008195A (pt) | | System for managing a large number of repeatedly usable and reusable packages, with a code especially suited to that purpose |
BR9805590A (pt) | | Carbon black pellets and a process for their production |
Legal Events
Code | Title | Description |
---|---|---|
B08F | Application dismissed because of non-payment of annual fees [chapter 8.6 patent gazette] | Free format text: In accordance with Article 10 of Resolution 124/06, the application is to be dismissed for non-payment of the 7th annual fee. |
B08K | Patent lapsed as no evidence of payment of the annual fee has been furnished to INPI [chapter 8.11 patent gazette] | Free format text: Refers to dispatch 8.6 published in RPI 2143 of 31/01/2012. |