JP7132050B2 - Text line segmentation method - Google Patents
Text line segmentation method
- Publication number: JP7132050B2 (application JP2018172774A)
- Authority: JP (Japan)
- Prior art keywords: connected components, subset, text, row, height
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/412—Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—Two-dimensional [2D] image generation
- G06T11/20—Drawing from basic elements
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/48—Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
- G06V10/763—Non-hierarchical techniques, e.g. based on statistics of modelling distributions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/12—Bounding box
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Graphics (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Geometry (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- General Engineering & Computer Science (AREA)
- Character Input (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/828,110 (US10318803B1, en) | 2017-11-30 | 2017-11-30 | Text line segmentation method |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| JP2019102061A (ja) | 2019-06-24 |
| JP2019102061A5 | 2019-07-25 |
| JP7132050B2 (ja) | 2022-09-06 |
Family
ID=66634070
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2018172774A (Active; JP7132050B2, ja) | 2017-11-30 | 2018-09-14 | Text line segmentation method |
Country Status (2)
| Country | Link |
|---|---|
| US (1) | US10318803B1 |
| JP (1) | JP7132050B2 |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107471648B (zh) * | 2017-05-23 | 2018-10-12 | 珠海赛纳打印科技股份有限公司 | Image data processing method for printing technology, and printing system |
| US10956730B2 (en) * | 2019-02-15 | 2021-03-23 | Wipro Limited | Method and system for identifying bold text in a digital document |
| CN110619333B (zh) * | 2019-08-15 | 2022-06-14 | 平安国际智慧城市科技股份有限公司 | Text line segmentation method, text line segmentation apparatus, and electronic device |
| CN111695540B (zh) * | 2020-06-17 | 2023-05-30 | 北京字节跳动网络技术有限公司 | Video border recognition method and cropping method, apparatus, electronic device, and medium |
| CN112561928B (zh) * | 2020-12-10 | 2024-03-08 | 西藏大学 | Layout analysis method and system for ancient Tibetan books |
| CN112926590B (zh) * | 2021-03-18 | 2023-12-01 | 上海晨兴希姆通电子科技有限公司 | Method and system for segmenting and recognizing characters on cables |
| CN115290661B (zh) * | 2022-09-28 | 2022-12-16 | 江苏浚荣升新材料科技有限公司 | Computer-vision-based rubber ring defect recognition method |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2003281468A (ja) | 2002-03-20 | 2003-10-03 | Toshiba Corp | Character recognition apparatus and character recognition method |
Family Cites Families (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5513304A (en) * | 1993-04-19 | 1996-04-30 | Xerox Corporation | Method and apparatus for enhanced automatic determination of text line dependent parameters |
| US5588072A (en) * | 1993-12-22 | 1996-12-24 | Canon Kabushiki Kaisha | Method and apparatus for selecting blocks of image data from image data having both horizontally- and vertically-oriented blocks |
| JP3837193B2 (ja) * | 1996-05-13 | 2006-10-25 | 松下電器産業株式会社 | Character line extraction method and apparatus |
| US5953451A (en) * | 1997-06-19 | 1999-09-14 | Xerox Corporation | Method of indexing words in handwritten document images using image hash tables |
| US20020037097A1 (en) * | 2000-05-15 | 2002-03-28 | Hector Hoyos | Coupon recognition system |
| US7130445B2 (en) * | 2002-01-07 | 2006-10-31 | Xerox Corporation | Systems and methods for authenticating and verifying documents |
| US8649600B2 (en) * | 2009-07-10 | 2014-02-11 | Palo Alto Research Center Incorporated | System and method for segmenting text lines in documents |
| US20110052094A1 (en) * | 2009-08-28 | 2011-03-03 | Chunyu Gao | Skew Correction for Scanned Japanese/English Document Images |
| US8606011B1 (en) * | 2012-06-07 | 2013-12-10 | Amazon Technologies, Inc. | Adaptive thresholding for image recognition |
| US8965127B2 (en) * | 2013-03-14 | 2015-02-24 | Konica Minolta Laboratory U.S.A., Inc. | Method for segmenting text words in document images |
| US9235755B2 (en) * | 2013-08-15 | 2016-01-12 | Konica Minolta Laboratory U.S.A., Inc. | Removal of underlines and table lines in document images while preserving intersecting character strokes |
| US9104940B2 (en) | 2013-08-30 | 2015-08-11 | Konica Minolta Laboratory U.S.A., Inc. | Line segmentation method applicable to document images containing handwriting and printed text characters or skewed text lines |
| US9430703B2 (en) * | 2014-12-19 | 2016-08-30 | Konica Minolta Laboratory U.S.A., Inc. | Method for segmenting text words in document images using vertical projections of center zones of characters |
| US9852348B2 (en) * | 2015-04-17 | 2017-12-26 | Google Llc | Document scanner |
| US20170091948A1 (en) * | 2015-09-30 | 2017-03-30 | Konica Minolta Laboratory U.S.A., Inc. | Method and system for automated analysis of cell images |
| US10127673B1 (en) * | 2016-12-16 | 2018-11-13 | Workday, Inc. | Word bounding box detection |
2017
- 2017-11-30: US application US15/828,110, patent US10318803B1 (en), status Active

2018
- 2018-09-14: JP application JP2018172774A, patent JP7132050B2 (ja), status Active
Also Published As
| Publication number | Publication date |
|---|---|
| JP2019102061A (ja) | 2019-06-24 |
| US20190163971A1 (en) | 2019-05-30 |
| US10318803B1 (en) | 2019-06-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7132050B2 (ja) | | Text line segmentation method |
| KR101690981B1 (ko) | | Shape recognition method and device |
| JP2019102061A5 | | |
| CN106446896B (zh) | | Character segmentation method, apparatus, and electronic device |
| USRE47889E1 (en) | | System and method for segmenting text lines in documents |
| US8675974B2 (en) | | Image processing apparatus and image processing method |
| US9104940B2 (en) | | Line segmentation method applicable to document images containing handwriting and printed text characters or skewed text lines |
| US8442319B2 (en) | | System and method for classifying connected groups of foreground pixels in scanned document images according to the type of marking |
| WO2021017260A1 (zh) | | Multilingual text recognition method and apparatus, computer device, and storage medium |
| Kumar et al. | | Handwritten Arabic text line segmentation using affinity propagation |
| CN109740606B (zh) | | Image recognition method and apparatus |
| CN107209942B (zh) | | Object detection method and image retrieval system |
| CN110503054B (zh) | | Text image processing method and apparatus |
| US5359671A (en) | | Character-recognition systems and methods with means to measure endpoint features in character bit-maps |
| CN102831416A (zh) | | Character recognition method and related apparatus |
| Salvi et al. | | Handwritten text segmentation using average longest path algorithm |
| TW200529093A (en) | | Face image detection method, face image detection system, and face image detection program |
| CN112381458A (zh) | | Project review method, project review apparatus, device, and storage medium |
| CN117612179A (zh) | | Method and apparatus for recognizing characters in an image, electronic device, and storage medium |
| S Deshmukh et al. | | A hybrid character segmentation approach for cursive unconstrained handwritten historical Modi script documents |
| CN114782973B (zh) | | Document classification method, training method, device, and storage medium |
| Abdoli et al. | | Offline signature verification using geodesic derivative pattern |
| US9104450B2 (en) | | Graphical user interface component classification |
| Suresha et al. | | Segmentation of handwritten text lines with touching of line |
| CN111488870A (zh) | | Character recognition method and character recognition apparatus |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | A524 | Written submission of copy of amendment under Article 19 PCT | Free format text: JAPANESE INTERMEDIATE CODE: A524; Effective date: 2019-04-17 |
| | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 2021-08-23 |
| | TRDD | Decision of grant or rejection written | |
| | A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 2022-08-02 |
| | A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 2022-08-25 |
| | R150 | Certificate of patent or registration of utility model | Ref document number: 7132050; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |