CN103186911A - Method and device for processing scanned book data - Google Patents
Method and device for processing scanned book data
- Publication number
- CN103186911A
- Authority
- CN
- China
- Prior art keywords
- literal
- page
- style
- page documents
- character image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N1/00—Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
- H04N1/41—Bandwidth or redundancy reduction
- H04N1/411—Bandwidth or redundancy reduction for the transmission or storage or reproduction of two-tone pictures, e.g. black and white pictures
- H04N1/4115—Bandwidth or redundancy reduction for the transmission or storage or reproduction of two-tone pictures, e.g. black and white pictures involving the recognition of specific patterns, e.g. by symbol matching
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Character Input (AREA)
- Processing Or Creating Images (AREA)
- Editing Of Facsimile Originals (AREA)
Abstract
Description
Exact rectangle position | Character for the character code | Number |
---|---|---|
… | … | … |
(100,70) | 道 | 31 |
(110,70) | g | 32 |
(118,70) | o | 33 |
(125,70) | o | 33 |
(132,70) | - | 34 |
(138,70) | g | 35 |
(145,70) | o | 33 |
(151,70) | o | 33 |
… | … | … |
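
The rows above can be read as a lookup from each character's box position to a character and a number, where the repeated "o" entries all carry number 33. As a minimal sketch, assuming the number identifies a shared stored entry for that character (this reading, and the names `ROWS` and `group_by_number`, are illustrative assumptions rather than anything this record states), the visible rows could be grouped like this:

```python
from collections import defaultdict

# Visible rows of the table: (box position, character, number).
# The elided "…" rows are deliberately not filled in.
ROWS = [
    ((100, 70), "道", 31),
    ((110, 70), "g", 32),
    ((118, 70), "o", 33),
    ((125, 70), "o", 33),
    ((132, 70), "-", 34),
    ((138, 70), "g", 35),
    ((145, 70), "o", 33),
    ((151, 70), "o", 33),
]


def group_by_number(rows):
    """Map each number to its character and every position that reuses it."""
    groups = defaultdict(lambda: {"char": None, "positions": []})
    for position, char, number in rows:
        entry = groups[number]
        # Assumption: one number always refers to the same character.
        if entry["char"] not in (None, char):
            raise ValueError(f"number {number} maps to two characters")
        entry["char"] = char
        entry["positions"].append(position)
    return dict(groups)


groups = group_by_number(ROWS)
for number in sorted(groups):
    entry = groups[number]
    print(number, entry["char"], entry["positions"])
```

Under that assumption, the eight visible placements need only five distinct numbers. The two "g" rows keep their separate numbers (32 and 35) exactly as listed in the table.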
Claims (10)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110448225.XA CN103186911B (zh) | 2011-12-28 | 2011-12-28 | Method and device for processing scanned book data |
US13/730,387 US8995768B2 (en) | 2011-12-28 | 2012-12-28 | Methods and devices for processing scanned book's data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110448225.XA CN103186911B (zh) | 2011-12-28 | 2011-12-28 | Method and device for processing scanned book data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103186911A (zh) | 2013-07-03 |
CN103186911B (zh) | 2015-07-15 |
Family
ID=48678068
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110448225.XA Active CN103186911B (zh) | Method and device for processing scanned book data | 2011-12-28 | 2011-12-28 |
Country Status (2)
Country | Link |
---|---|
US (1) | US8995768B2 (en) |
CN (1) | CN103186911B (zh) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
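CN104376317A (zh) * | 2013-08-12 | 2015-02-25 | 福建福昕软件开发股份有限公司北京分公司 | Method for converting a paper document into an electronic document |
CN104715497A (zh) * | 2014-12-30 | 2015-06-17 | 上海孩子国科教设备有限公司 | Method and system for data replacement |
CN105404683A (zh) * | 2015-11-30 | 2016-03-16 | 北大方正集团有限公司 | Method and device for processing fixed-layout documents |
CN106104570A (zh) * | 2014-03-11 | 2016-11-09 | 微软技术许可有限责任公司 | Detecting and extracting image document components to create flow documents |
CN107103597A (zh) * | 2016-02-19 | 2017-08-29 | 青岛海信电器股份有限公司 | Method and device for determining pixel position |
CN107291342A (zh) * | 2017-05-03 | 2017-10-24 | 广东小天才科技有限公司 | Method and device for outlining point-reading data |
CN107301418A (zh) * | 2017-06-28 | 2017-10-27 | 江南大学 | Layout analysis in optical character recognition |
CN109479081A (zh) * | 2017-07-03 | 2019-03-15 | 京瓷办公信息系统株式会社 | Document reading device |
CN110852326A (zh) * | 2019-11-06 | 2020-02-28 | 贵州工程应用技术学院 | Method for handwriting layout analysis and multi-style ancient book background fusion |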
US10755594B2 (en) | 2015-11-20 | 2020-08-25 | Chrysus Intellectual Properties Limited | Method and system for analyzing a piece of text |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015175824A1 (en) * | 2014-05-16 | 2015-11-19 | AppCard, Inc. | Method and system for improved optical character recognition |
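KR20160027692A (ko) * | 2014-09-02 | 2016-03-10 | 엘지전자 주식회사 | Digital device for copying digital content through screen overlap and control method thereof |
CN105373790B (zh) * | 2015-10-23 | 2019-02-05 | 北京汉王数字科技有限公司 | Layout analysis method and device |
CN110309703B (zh) * | 2019-04-25 | 2021-07-27 | 东莞市七宝树教育科技有限公司 | Method and system for intelligently and adaptively identifying and cutting answer areas of test papers |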
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
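CN101458699A (zh) * | 2007-12-12 | 2009-06-17 | 佳能株式会社 | Image processing apparatus and image processing method |
CN101558425A (zh) * | 2007-06-29 | 2009-10-14 | 佳能株式会社 | Image processing apparatus, image processing method, and computer program |
CN101689203A (zh) * | 2007-06-29 | 2010-03-31 | 佳能株式会社 | Image processing apparatus, image processing method, and computer program |
CN101782896A (zh) * | 2009-01-21 | 2010-07-21 | 汉王科技股份有限公司 | PDF text extraction method combined with OCR technology |
CN102081732A (zh) * | 2010-12-29 | 2011-06-01 | 方正国际软件有限公司 | Method and system for layout recognition templates |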
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5825919A (en) * | 1992-12-17 | 1998-10-20 | Xerox Corporation | Technique for generating bounding boxes for word spotting in bitmap images |
US5373566A (en) * | 1992-12-24 | 1994-12-13 | Motorola, Inc. | Neural network-based diacritical marker recognition system and method |
US5577135A (en) * | 1994-03-01 | 1996-11-19 | Apple Computer, Inc. | Handwriting signal processing front-end for handwriting recognizers |
US5999647A (en) * | 1995-04-21 | 1999-12-07 | Matsushita Electric Industrial Co., Ltd. | Character extraction apparatus for extracting character data from a text image |
US6188790B1 (en) * | 1996-02-29 | 2001-02-13 | Tottori Sanyo Electric Ltd. | Method and apparatus for pre-recognition character processing |
US6636631B2 (en) * | 1998-06-04 | 2003-10-21 | Matsushita Electric Industrial Co., Ltd. | Optical character reading method and system for a document with ruled lines and its application |
US6249605B1 (en) * | 1998-09-14 | 2001-06-19 | International Business Machines Corporation | Key character extraction and lexicon reduction for cursive text recognition |
US6487311B1 (en) * | 1999-05-04 | 2002-11-26 | International Business Machines Corporation | OCR-based image compression |
US6681044B1 (en) * | 2000-03-29 | 2004-01-20 | Matsushita Electric Industrial Co., Ltd. | Retrieval of cursive Chinese handwritten annotations based on radical model |
JP3425408B2 (ja) * | 2000-05-31 | 2003-07-14 | 株式会社東芝 | Document reading device |
US8065321B2 (en) * | 2007-06-20 | 2011-11-22 | Ricoh Company, Ltd. | Apparatus and method of searching document data |
US20090202151A1 (en) * | 2008-02-13 | 2009-08-13 | Kabushiki Kaisha Toshiba | Format processing apparatus for document image and format processing method for the same |
US8331680B2 (en) * | 2008-06-23 | 2012-12-11 | International Business Machines Corporation | Method of gray-level optical segmentation and isolation using incremental connected components |
KR20110091296A (ko) * | 2010-02-05 | 2011-08-11 | 삼성전자주식회사 | Apparatus and method for creating documents |
- 2011-12-28: CN application CN201110448225.XA, patent CN103186911B (zh), status Active
- 2012-12-28: US application US13/730,387, patent US8995768B2 (en), status Active
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
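CN104376317B (zh) * | 2013-08-12 | 2018-12-14 | 福建福昕软件开发股份有限公司北京分公司 | Method for converting a paper document into an electronic document |
CN104376317A (zh) * | 2013-08-12 | 2015-02-25 | 福建福昕软件开发股份有限公司北京分公司 | Method for converting a paper document into an electronic document |
CN106104570A (zh) * | 2014-03-11 | 2016-11-09 | 微软技术许可有限责任公司 | Detecting and extracting image document components to create flow documents |
CN106104570B (zh) * | 2014-03-11 | 2019-10-25 | 微软技术许可有限责任公司 | Detecting and extracting image document components to create flow documents |
CN104715497A (zh) * | 2014-12-30 | 2015-06-17 | 上海孩子国科教设备有限公司 | Method and system for data replacement |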
US10755594B2 (en) | 2015-11-20 | 2020-08-25 | Chrysus Intellectual Properties Limited | Method and system for analyzing a piece of text |
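CN105404683A (zh) * | 2015-11-30 | 2016-03-16 | 北大方正集团有限公司 | Method and device for processing fixed-layout documents |
CN107103597B (zh) * | 2016-02-19 | 2020-04-21 | 青岛海信电器股份有限公司 | Method and device for determining pixel position |
CN107103597A (zh) * | 2016-02-19 | 2017-08-29 | 青岛海信电器股份有限公司 | Method and device for determining pixel position |
CN107291342A (zh) * | 2017-05-03 | 2017-10-24 | 广东小天才科技有限公司 | Method and device for outlining point-reading data |
CN107291342B (zh) * | 2017-05-03 | 2020-01-31 | 广东小天才科技有限公司 | Method and device for copying and outlining point-reading data |
CN107301418A (zh) * | 2017-06-28 | 2017-10-27 | 江南大学 | Layout analysis in optical character recognition |
CN109479081A (zh) * | 2017-07-03 | 2019-03-15 | 京瓷办公信息系统株式会社 | Document reading device |
CN109479081B (zh) * | 2017-07-03 | 2019-12-17 | 京瓷办公信息系统株式会社 | Document reading device |
CN110852326A (zh) * | 2019-11-06 | 2020-02-28 | 贵州工程应用技术学院 | Method for handwriting layout analysis and multi-style ancient book background fusion |
CN110852326B (zh) * | 2019-11-06 | 2022-11-04 | 贵州工程应用技术学院 | Method for handwriting layout analysis and multi-style ancient book background fusion |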
Also Published As
Publication number | Publication date |
---|---|
US8995768B2 (en) | 2015-03-31 |
CN103186911B (zh) | 2015-07-15 |
US20130170751A1 (en) | 2013-07-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103186911A (zh) | Method and device for processing scanned book data | |
US20190294399A1 (en) | Method and device for parsing tables in pdf document | |
US8295590B2 (en) | Method and system for creating a form template for a form | |
US8736869B2 (en) | Layout print system, method for viewing layout document, and program product | |
TW399179B (en) | Method and apparatus for compressing slice-oriented bitmaps | |
CN100568263C (zh) | Layout analysis apparatus and layout analysis method | |
JP5455038B2 (ja) | Image processing apparatus, image processing method, and program | |
US6959121B2 (en) | Document image processing device, document image processing method, and memory medium | |
CN103914496B (zh) | Method and device for page proofreading | |
CN101443790A (zh) | Efficient processing of non-reflow content in a digital image | |
US11630621B2 (en) | Information processing apparatus and non-transitory computer readable medium | |
CN101388111A (zh) | Image processing apparatus and image processing method | |
CN105335453A (zh) | Method for dividing images into documents | |
US10586125B2 (en) | Line removal method, apparatus, and computer-readable medium | |
JPH09198511A (ja) | Symbol classification method | |
CN102915429B (zh) | Method and device for matching scanned pictures | |
US20210279459A1 (en) | System for identifying and linking entity relationships in documents | |
US9218327B2 (en) | Optimizing the layout of electronic documents by reducing presentation size of content within document sections so that when combined a plurality of document sections fit within a page | |
US20080266606A1 (en) | Optimized print layout | |
CN104376317A (zh) | Method for converting a paper document into an electronic document | |
CN103095964A (zh) | Method and device for page dot-matrix compression | |
US20190005038A1 (en) | Method and apparatus for grouping documents based on high-level features clustering | |
EP3299949A1 (en) | Method of storing record information | |
JP2009011874A (ja) | Form sorting method and optical character reading system using the form sorting method | |
US8634094B2 (en) | Image processing apparatus, image processing method and non-transitory computer readable medium storing program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
ASS | Succession or assignment of patent right | Owner name: FOUNDER INFORMATION INDUSTRY HOLDING CO., LTD. BEI; Free format text: FORMER OWNER: BEIJING FOUNDER APABI TECHNOLOGY CO., LTD.; Effective date: 20130902 |
C41 | Transfer of patent application or patent right or utility model | ||
TA01 | Transfer of patent application right | Effective date of registration: 20130902. Address after: 9th Floor, Founder Building, 298 Cheng Fu Road, Haidian District, Beijing 100871. Applicant after: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; FOUNDER INFORMATION INDUSTRY HOLDINGS Co.,Ltd.; FOUNDER APABI TECHNOLOGY Ltd. Address before: 9th Floor, Founder Building, 298 Cheng Fu Road, Haidian District, Beijing 100871. Applicant before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; FOUNDER APABI TECHNOLOGY Ltd. |
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CP01 | Change in the name or title of a patent holder | Address after: 9th Floor, Founder Building, 298 Cheng Fu Road, Haidian District, Beijing 100871. Patentee after: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; PKU FOUNDER INFORMATION INDUSTRY GROUP CO.,LTD.; FOUNDER APABI TECHNOLOGY Ltd. Address before: 9th Floor, Founder Building, 298 Cheng Fu Road, Haidian District, Beijing 100871. Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; FOUNDER INFORMATION INDUSTRY HOLDINGS Co.,Ltd.; FOUNDER APABI TECHNOLOGY Ltd. |
TR01 | Transfer of patent right | Effective date of registration: 20220908. Address after: 3007, Hengqin International Financial Center Building, No. 58 Huajin Street, Hengqin New Area, Zhuhai, Guangdong 519031. Patentee after: New founder holdings development Co.,Ltd.; FOUNDER APABI TECHNOLOGY Ltd. Address before: 9th Floor, Founder Building, 298 Cheng Fu Road, Haidian District, Beijing 100871. Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; PKU FOUNDER INFORMATION INDUSTRY GROUP CO.,LTD.; FOUNDER APABI TECHNOLOGY Ltd. |