CN103176956B - Method and device for extracting document structure - Google Patents
Method and device for extracting document structure
- Publication number
- CN103176956B (application CN201110438858.2A)
- Authority
- CN
- China
- Prior art keywords
- page
- row
- list
- entry
- reference information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F40/00—Handling natural language data
        - G06F40/10—Text processing
          - G06F40/103—Formatting, i.e. changing of presentation of documents
        - G06F40/20—Natural language analysis
          - G06F40/279—Recognition of textual entities
Claims (8)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110438858.2A CN103176956B (zh) | 2011-12-21 | 2011-12-21 | Method and device for extracting document structure |
US13/725,879 US9418051B2 (en) | 2011-12-21 | 2012-12-21 | Methods and devices for extracting document structure |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110438858.2A CN103176956B (zh) | 2011-12-21 | 2011-12-21 | Method and device for extracting document structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103176956A (zh) | 2013-06-26 |
CN103176956B (zh) | 2016-08-03 |
Family
ID=48636842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110438858.2A Active CN103176956B (zh) | Method and device for extracting document structure | 2011-12-21 | 2011-12-21 |
Country Status (2)
Country | Link |
---|---|
US (1) | US9418051B2 (zh) |
CN (1) | CN103176956B (zh) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10204095B1 (en) * | 2015-02-10 | 2019-02-12 | West Corporation | Processing and delivery of private electronic documents |
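CN107633039A (zh) * | 2017-09-13 | 2018-01-26 | 张贝贝 | Method for splitting PDF files by topics involving equity transfer |
CN107633040A (zh) * | 2017-09-13 | 2018-01-26 | 张贝贝 | Method for splitting PDF files by topics involving major restructuring |
CN108446264B (zh) * | 2018-03-26 | 2022-02-15 | 阿博茨德(北京)科技有限公司 | Method and device for vector-based parsing of tables in PDF documents |
CN111767254B (zh) * | 2020-07-07 | 2021-01-05 | 江苏中威科技软件系统有限公司 | Multi-file reading device and method based on fixed-layout data stream file technology |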
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
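CN101354727A (zh) * | 2008-09-24 | 2009-01-28 | 北京大学 | Method and device for establishing links between the table of contents and the body text of a digital document |
CN101937462A (zh) * | 2010-09-03 | 2011-01-05 | 中国科学院声学研究所 | Method and system for automatic document evaluation |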
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7149957B2 (en) * | 2001-11-19 | 2006-12-12 | Ricoh Company, Ltd. | Techniques for retrieving multimedia information using a paper-based interface |
US7142728B2 (en) * | 2002-05-17 | 2006-11-28 | Science Applications International Corporation | Method and system for extracting information from a document |
US8201085B2 (en) * | 2007-06-21 | 2012-06-12 | Thomson Reuters Global Resources | Method and system for validating references |
US20060242180A1 (en) * | 2003-07-23 | 2006-10-26 | Graf James A | Extracting data from semi-structured text documents |
JP4314221B2 (ja) * | 2005-07-28 | 2009-08-12 | 株式会社東芝 | Structured document storage device, structured document search device, structured document system, method, and program |
US20070124319A1 (en) * | 2005-11-28 | 2007-05-31 | Microsoft Corporation | Metadata generation for rich media |
JP5248845B2 (ja) * | 2006-12-13 | 2013-07-31 | キヤノン株式会社 | Document processing apparatus, document processing method, program, and storage medium |
US8495042B2 (en) * | 2007-10-10 | 2013-07-23 | Iti Scotland Limited | Information extraction apparatus and methods |
US9002100B2 (en) * | 2008-04-02 | 2015-04-07 | Xerox Corporation | Model uncertainty visualization for active learning |
US8655803B2 (en) * | 2008-12-17 | 2014-02-18 | Xerox Corporation | Method of feature extraction from noisy documents |
US20130205202A1 (en) * | 2010-10-26 | 2013-08-08 | Jun Xiao | Transformation of a Document into Interactive Media Content |
2011
- 2011-12-21: CN application CN201110438858.2A, patent CN103176956B (zh), status Active
2012
- 2012-12-21: US application US13/725,879, patent US9418051B2 (en), status Active
Non-Patent Citations (3)
Title |
---|
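Semantics-based extraction of document reference metadata in SemreX; 郭志鑫 et al.; 《计算机研究与发展》; 20061231; Vol. 43 (No. 8); 1368-1374 * |
Extraction and logical description of trust patterns in information document structure; 陈路瑶 et al.; 《计算机应用研究》; 20101231; Vol. 27 (No. 12); 4624-2629 * |
Automatic extraction of paper metadata; 李朝光 et al.; 《计算机工程与应用》; 20021231 (No. 21); 189-191 * |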
Also Published As
Publication number | Publication date |
---|---|
US9418051B2 (en) | 2016-08-16 |
CN103176956A (zh) | 2013-06-26 |
US20130167018A1 (en) | 2013-06-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111325110B (zh) | | OCR-based table layout restoration method, device, and storage medium |
CN106250830B (zh) | | Structured analysis and processing method for digital books |
US9798925B2 (en) | | Method for identifying PDF document |
US8452132B2 (en) | | Automatic file name generation in OCR systems |
US11900644B2 (en) | | Document image analysis apparatus, document image analysis method and program thereof |
US20150095769A1 (en) | | Layout Analysis Method And System |
CN103176956B (zh) | | Method and device for extracting document structure |
CN101375278A (zh) | | Strategies for processing annotations |
JP4785655B2 (ja) | | Document processing apparatus and document processing method |
CN110704570A (zh) | | Method for extracting structured information from continuous-page fixed-layout documents |
WO2011136766A1 (en) | | System and method for automatically providing a graphical layout based on an example graphic layout |
TW200416583A (en) | | Definition data generation method of account book voucher and processing device of account book voucher |
CN111310426A (zh) | | OCR-based table layout restoration method, device, and storage medium |
CN104951429A (zh) | | Method and device for recognizing headers and footers in fixed-layout electronic documents |
JP5380040B2 (ja) | | Document processing device |
CN111340020A (zh) | | Formula recognition method, apparatus, device, and storage medium |
Palfray et al. | | Logical segmentation for article extraction in digitized old newspapers |
US9049400B2 (en) | | Image processing apparatus, and image processing method and program |
Nayak et al. | | Odia running text recognition using moment-based feature extraction and mean distance classification technique |
JP2008129793A (ja) | | Document processing system, apparatus, and method, and recording medium storing a program |
Berg et al. | | Towards high-quality text stream extraction from PDF. Technical background to the ACL 2012 Contributed Task |
JP2008108114A (ja) | | Document processing device and document processing method |
US20140177951A1 (en) | | Method, apparatus, and storage medium having computer executable instructions for processing of an electronic document |
JPH08320914A (ja) | | Table recognition method and device |
CN114564915A (zh) | | Text typesetting method, electronic device, and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
ASS | Succession or assignment of patent right | Owner name: FOUNDER INFORMATION INDUSTRY HOLDING CO., LTD. BEI; Free format text: FORMER OWNER: BEIJING FOUNDER APABI TECHNOLOGY CO., LTD.; Effective date: 20130912 |
C41 | Transfer of patent application or patent right or utility model | |
TA01 | Transfer of patent application right | Effective date of registration: 20130912; Address after: 100871 Beijing, Haidian District, Cheng Fu Road 298, Founder Building, 5th floor; Applicant after: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; Applicant after: FOUNDER INFORMATION INDUSTRY HOLDINGS Co.,Ltd.; Applicant after: FOUNDER APABI TECHNOLOGY Ltd.; Address before: 100871 Beijing, Haidian District, Cheng Fu Road 298, Founder Building, 5th floor; Applicant before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; Applicant before: FOUNDER APABI TECHNOLOGY Ltd. |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
CP03 | Change of name, title or address | |
CP03 | Change of name, title or address | Address after: 100871 Beijing, Haidian District, Cheng Fu Road 298, Founder Building, 9th floor; Patentee after: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; Patentee after: PKU FOUNDER INFORMATION INDUSTRY GROUP CO.,LTD.; Patentee after: FOUNDER APABI TECHNOLOGY Ltd.; Address before: 100871 Beijing, Haidian District, Cheng Fu Road 298, Founder Building, 5th floor; Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; Patentee before: FOUNDER INFORMATION INDUSTRY HOLDINGS Co.,Ltd.; Patentee before: FOUNDER APABI TECHNOLOGY Ltd. |
TR01 | Transfer of patent right | |
TR01 | Transfer of patent right | Effective date of registration: 20220919; Address after: 3007, Hengqin International Financial Center Building, No. 58, Huajin Street, Hengqin New Area, Zhuhai, Guangdong 519031; Patentee after: New founder holdings development Co.,Ltd.; Patentee after: FOUNDER APABI TECHNOLOGY Ltd.; Address before: 100871 Beijing, Haidian District, Cheng Fu Road 298, Founder Building, 9th floor; Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; Patentee before: PKU FOUNDER INFORMATION INDUSTRY GROUP CO.,LTD.; Patentee before: FOUNDER APABI TECHNOLOGY Ltd. |