CN102541929B - Method and device for extracting the catalogue of a fixed-layout document - Google Patents
- Publication number: CN102541929B (application CN201010615308.9A)
- Authority: CN (China)
- Prior art keywords: page, catalogue, page number, number piece, piece
- Prior art date
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Concepts
ID | Concept | Category | Score | Sections | Count |
---|---|---|---|---|---|
238000000034 | method | Methods | 0.000 | title, claims, abstract, description | 70 |
239000000284 | extract | Substances | 0.000 | claims, description | 28 |
238000000605 | extraction | Methods | 0.000 | claims, description | 27 |
238000007621 | cluster analysis | Methods | 0.000 | claims, description | 9 |
230000008569 | process | Effects | 0.000 | description | 31 |
238000010586 | diagram | Methods | 0.000 | description | 20 |
238000010224 | classification analysis | Methods | 0.000 | description | 5 |
230000008901 | benefit | Effects | 0.000 | description | 3 |
230000008878 | coupling | Effects | 0.000 | description | 3 |
238000010168 | coupling process | Methods | 0.000 | description | 3 |
238000005859 | coupling reaction | Methods | 0.000 | description | 3 |
238000005516 | engineering process | Methods | 0.000 | description | 3 |
230000006870 | function | Effects | 0.000 | description | 3 |
230000008520 | organization | Effects | 0.000 | description | 3 |
238000004422 | calculation algorithm | Methods | 0.000 | description | 2 |
238000007373 | indentation | Methods | 0.000 | description | 2 |
101100136092 | Drosophila melanogaster peng gene | Proteins | 0.000 | description | 1 |
230000008859 | change | Effects | 0.000 | description | 1 |
230000019771 | cognition | Effects | 0.000 | description | 1 |
238000000151 | depositing | Methods | 0.000 | description | 1 |
230000013011 | mating | Effects | 0.000 | description | 1 |
Landscapes
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201010615308.9A CN102541929B (zh) | 2010-12-22 | 2010-12-22 | Method and device for extracting the catalogue of a fixed-layout document |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201010615308.9A CN102541929B (zh) | 2010-12-22 | 2010-12-22 | Method and device for extracting the catalogue of a fixed-layout document |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102541929A (zh) | 2012-07-04 |
CN102541929B (zh) | 2014-04-02 |
Family
ID=46348845
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201010615308.9A Active CN102541929B (zh) | Method and device for extracting the catalogue of a fixed-layout document | 2010-12-22 | 2010-12-22 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102541929B (zh) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
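CN102841888B (zh) * | 2012-09-14 | 2015-10-14 | China Academic Journals (CD Edition) Electronic Publishing House Co., Ltd. | A rapid typesetting system and method |
CN103778141A (zh) * | 2012-10-23 | 2014-05-07 | Nankai University | An automatic catalogue extraction algorithm for hybrid PDF books |
CN104424214B (zh) * | 2013-08-22 | 2017-10-27 | Peking University Founder Group Co., Ltd. | A method and device for custom extraction of catalogue content |
CN105630748A (zh) * | 2014-10-31 | 2016-06-01 | Fujitsu Limited | Information processing device and information processing method |
CN104536948A (zh) * | 2014-12-10 | 2015-04-22 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and device for processing fixed-layout documents |
CN104699666B (zh) * | 2015-01-30 | 2017-09-01 | Zhejiang University | Method for learning hierarchical structure from book catalogues based on an affinity propagation model |
CN107291682B (zh) * | 2016-03-30 | 2020-12-08 | Tongfang Knowledge Network (Beijing) Technology Co., Ltd. | An algorithm for splitting multi-article electronic documents based on jump processing and double verification |
CN106951540B (zh) * | 2017-03-23 | 2018-01-12 | Zhangyue Technology Co., Ltd. | Method, device, server, and computer storage medium for generating a file catalogue |
CN107358208B (zh) * | 2017-07-14 | 2018-07-13 | Beijing Shenzhou Taiyue Software Co., Ltd. | A method and device for extracting structured information from PDF documents |
CN111144069B (zh) * | 2019-12-30 | 2021-12-03 | Peking University Founder Group Co., Ltd. | A table-based catalogue typesetting method, device, and storage medium |
CN111553366B (zh) * | 2020-04-30 | 2023-05-16 | Guangdong Genius Technology Co., Ltd. | A method and system for question matching |
CN111767254B (zh) * | 2020-07-07 | 2021-01-05 | Jiangsu Zhongwei Technology Software System Co., Ltd. | Multi-file reading device based on fixed-layout data stream file technology, and method thereof |
CN112632968B (zh) * | 2020-12-18 | 2024-02-13 | Wondershare Technology (Hunan) Co., Ltd. | PDF catalogue recognition method, electronic device, and computer-readable storage medium |
CN114997138A (zh) * | 2022-06-20 | 2022-09-02 | Yida Technology (Shanghai) Co., Ltd. | A method, apparatus, device, and readable storage medium for parsing chemical product instructions |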
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
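CN101458680A (zh) * | 2008-09-03 | 2009-06-17 | Peking University | A method and device for automatically recognizing the catalogue of a digital document |
CN101751379A (zh) * | 2008-12-02 | 2010-06-23 | Peking University Founder Group Co., Ltd. | A method and apparatus for producing electronic newspaper documents |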
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050289161A1 (en) * | 2004-06-29 | 2005-12-29 | The Boeing Company | Integrated document directory generator apparatus and methods |
2010
- 2010-12-22: CN application CN201010615308.9A, patent CN102541929B (zh), status Active
Also Published As
Publication number | Publication date |
---|---|
CN102541929A (zh) | 2012-07-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
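CN102541929B (zh) | | Method and device for extracting the catalogue of a fixed-layout document |
CN106250830B (zh) | | Structured analysis and processing method for digital books |
US6178417B1 (en) | | Method and means of matching documents based on text genre |
CN102129451B (zh) | | Data clustering method in an image retrieval system |
Purandare et al. | | Word sense discrimination by clustering contexts in vector and similarity spaces |
CN105095368B (zh) | | A method and device for ranking news information |
CN111797239B (zh) | | Application classification method, device, and terminal equipment |
CN106815263A (zh) | | Method and device for searching legal provisions |
CN102567483A (zh) | | Multi-feature fusion face image search method and system |
Harit et al. | | Table detection in document images using header and trailer patterns |
KR20070102035A (ko) | | Document classification system and method thereof |
US20050050086A1 (en) | | Apparatus and method for multimedia object retrieval |
CN103778141A (zh) | | An automatic catalogue extraction algorithm for hybrid PDF books |
CN110162632A (zh) | | A method for discovering news topic events |
CN109299235A (zh) | | Knowledge base search method, device, and computer-readable storage medium |
CN107291682A (zh) | | An algorithm for splitting multi-article electronic documents based on jump processing and double verification |
CN105630975A (zh) | | An information processing method and electronic device |
CN112035723A (zh) | | Method and device for determining a resource library, storage medium, and electronic device |
CN115238154A (zh) | | Search engine optimization system |
Liu et al. | | Improving the table boundary detection in pdfs by fixing the sequence error of the sparse lines |
CN109241315B (zh) | | A fast face retrieval method based on deep learning |
CN111191587B (zh) | | A pedestrian re-identification method and system |
Kulkarni et al. | | Knowledge discovery in text mining using association rule extraction |
TWI396990B (zh) | | Citation record extraction system, method, and program product |
US20020085755A1 (en) | | Method for region analysis of document image |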
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | ASS | Succession or assignment of patent right | Owner name: FOUNDER INFORMATION INDUSTRY HOLDING CO., LTD. BEI; Free format text: FORMER OWNER: BEIJING FOUNDER APABI TECHNOLOGY CO., LTD.; Effective date: 20131021 |
| | C41 | Transfer of patent application or patent right or utility model | |
| | TA01 | Transfer of patent application right | Effective date of registration: 20131021; Address after: 100871, Beijing, Haidian District, Cheng Fu Road, No. 298, Zhongguancun Fangzheng Building, 5th floor; Applicant after: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; Applicant after: FOUNDER INFORMATION INDUSTRY HOLDINGS Co.,Ltd.; Applicant after: FOUNDER APABI TECHNOLOGY Ltd.; Address before: 100871, Beijing, Haidian District, Cheng Fu Road, No. 298, Zhongguancun Fangzheng Building, 5th floor; Applicant before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; Applicant before: FOUNDER APABI TECHNOLOGY Ltd. |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | CP01 | Change in the name or title of a patent holder | |
| | CP01 | Change in the name or title of a patent holder | Address after: 100871, Beijing, Haidian District, Cheng Fu Road, No. 298, Zhongguancun Fangzheng Building, 5th floor; Patentee after: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; Patentee after: PKU FOUNDER INFORMATION INDUSTRY GROUP CO.,LTD.; Patentee after: FOUNDER APABI TECHNOLOGY Ltd.; Address before: 100871, Beijing, Haidian District, Cheng Fu Road, No. 298, Zhongguancun Fangzheng Building, 5th floor; Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; Patentee before: FOUNDER INFORMATION INDUSTRY HOLDINGS Co.,Ltd.; Patentee before: FOUNDER APABI TECHNOLOGY Ltd. |
| | TR01 | Transfer of patent right | Effective date of registration: 20220920; Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031; Patentee after: New founder holdings development Co.,Ltd.; Patentee after: FOUNDER APABI TECHNOLOGY Ltd.; Address before: 100871, Beijing, Haidian District, Cheng Fu Road, No. 298, Zhongguancun Fangzheng Building, 5th floor; Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; Patentee before: PKU FOUNDER INFORMATION INDUSTRY GROUP CO.,LTD.; Patentee before: FOUNDER APABI TECHNOLOGY Ltd. |
| | TR01 | Transfer of patent right | |