CN111539312A - Method for extracting table from image - Google Patents
Method for extracting table from image
- Publication number
- CN111539312A (application CN202010318730.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- vertical
- extracting
- horizontal
- directions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for extracting a table from an image, comprising the following steps: S1, converting the image to grayscale; S2, computing the pixel intensity gradient of the image in the vertical and horizontal directions and performing edge detection; S3, enhancing the edge pixels of the resulting image and binarizing it; S4, applying morphological opening with different structuring elements in the vertical and horizontal directions to find the strip-shaped objects in the image; S5, performing the operation of step S4 in the horizontal and vertical directions respectively and superimposing the two results as the output; S6, finding the closed rectangular frames; S7, extracting and marking the position and content of each table in the image; S8, correcting the table borders to obtain complete information for all tables in the image. The method automatically performs image cropping and table extraction, and can improve the efficiency of preprocessing and the accuracy of subsequent processing in image text recognition tasks such as OCR (optical character recognition).
Description
Technical Field
The invention relates to the technical field of image processing methods, in particular to a method for extracting a table from an image.
Background
Domestic OCR technology for Chinese character recognition has produced good research results in recent years, and general recognition can reach an accuracy above 95%. However, general-purpose models for the layout analysis of pictures do not yet perform or generalize well, and most such models are developed and customized for a specific purpose.
To improve the accuracy and flexibility of Chinese character recognition, layout analysis is a very important link. In particular, financial statements, business documents, engineering design drawings and the like mix text and tables, so accurately extracting the tables and processing them with separate models is a necessary task.
Tables in general have no absolute format: the number of rows and columns is not fixed, the orientation is not fixed, and the border style is not fixed, all of which increase the complexity and difficulty of table extraction.
Disclosure of Invention
In view of the above technical shortcomings, the present invention provides a method for extracting a table from an image, which aims to solve the problems in the background art.
In order to solve the technical problems, the invention adopts the following technical scheme:
The invention provides a method for extracting a table from an image, which comprises the following steps:
S1, converting the original color image into a grayscale image;
S2, computing the image pixel intensity gradient in the vertical and horizontal directions by convolution, and performing edge detection on the processed grayscale image;
S3, using morphological dilation to enhance the edge pixels of the image, and binarizing the image according to a specific threshold;
S4, applying morphological opening to the processed image in the vertical and horizontal directions respectively, using a different structuring element for each direction;
S5, superimposing the results of the vertical and horizontal directions as the output;
S6, through analysis of the topological structure, making a secondary judgment on the area enclosed by each frame obtained in step S5 to decide whether the frame is kept as a table, and finding the closed rectangular frames, namely the complete borders of the tables;
S7, extracting and marking the position and content of each table in the picture according to the result obtained in step S6, namely extracting the table from the picture;
S8, correcting the table borders according to the contour features of the table to obtain complete information for all tables in the image.
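Steps S1 and S3 can be illustrated with a minimal sketch. It is not the patented implementation: the BT.601 luma weights, the 3x3 square structuring element, the threshold value and the function names `to_gray`, `dilate3x3` and `binarize` are all assumptions made for illustration, and plain Python lists are used only to keep the example dependency-free.

```python
def to_gray(rgb):
    """S1: convert rows of (R, G, B) tuples into grayscale intensities.

    Assumption: ITU-R BT.601 luma weights; the source does not name the weights.
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def dilate3x3(img):
    """S3, first half: grayscale dilation with an assumed 3x3 square element.

    Each output pixel is the maximum of its 3x3 neighbourhood, which thickens
    bright edge pixels. Neighbourhoods are clipped at the image border.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = max(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
    return out


def binarize(img, threshold):
    """S3, second half: pixels above the threshold become 1, all others 0."""
    return [[1 if v > threshold else 0 for v in row] for row in img]


# A single bright edge pixel grows into a 3x3 block before thresholding.
edges = [
    [0, 0, 0, 0, 0],
    [0, 0, 200, 0, 0],
    [0, 0, 0, 0, 0],
]
binary = binarize(dilate3x3(edges), threshold=128)  # threshold is illustrative
```

In this sketch the dilation runs on the edge map produced by step S2, so thin or broken table lines are thickened before the threshold decides which pixels count as edges.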
Preferably, step S2 is specifically as follows: first, the image is convolved with a 5x5 Gaussian filter for noise reduction, and then the gradient magnitude is computed with 3x3 kernels. The gradient calculation is divided into horizontal and vertical difference equations, where the horizontal equation is:
$$G_x(i,j) = I_{i+1,j-1} - I_{i-1,j-1} + 2I_{i+1,j} - 2I_{i-1,j} + I_{i+1,j+1} - I_{i-1,j+1}$$
and the vertical equation is:
$$G_y(i,j) = I_{i-1,j+1} - I_{i-1,j-1} + 2I_{i,j+1} - 2I_{i,j-1} + I_{i+1,j+1} - I_{i+1,j-1}$$
Finally, the gradient value is obtained as the L2 norm of the two components, $G = \sqrt{G_x^2 + G_y^2}$.
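The gradient computation of step S2 can be sketched as follows. This is an illustration under stated assumptions rather than the applicant's code: the 5x5 Gaussian is approximated by the binomial kernel `[1, 4, 6, 4, 1]`, image borders are handled by replicating edge pixels, and the names `gaussian_blur5` and `sobel_magnitude` are hypothetical. The first index `i` is taken as the horizontal coordinate, matching the horizontal equation.

```python
import math


def gaussian_blur5(img):
    """Noise reduction with a 5x5 kernel (assumed binomial approximation)."""
    k = [1, 4, 6, 4, 1]  # outer product of k with itself sums to 256
    h, w = len(img), len(img[0])

    def px(x, y):
        # Replicate border pixels (an assumption; the source is silent on borders).
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[sum(k[dy + 2] * k[dx + 2] * px(x + dx, y + dy)
                 for dy in range(-2, 3) for dx in range(-2, 3)) / 256.0
             for x in range(w)]
            for y in range(h)]


def sobel_magnitude(img):
    """Apply the two 3x3 difference equations and combine them with the L2 norm."""
    h, w = len(img), len(img[0])

    def I(i, j):
        # I(i, j): i is the column (horizontal), j is the row (vertical).
        return img[min(max(j, 0), h - 1)][min(max(i, 0), w - 1)]

    out = []
    for j in range(h):
        row = []
        for i in range(w):
            gx = (I(i + 1, j - 1) - I(i - 1, j - 1)
                  + 2 * I(i + 1, j) - 2 * I(i - 1, j)
                  + I(i + 1, j + 1) - I(i - 1, j + 1))
            gy = (I(i - 1, j + 1) - I(i - 1, j - 1)
                  + 2 * I(i, j + 1) - 2 * I(i, j - 1)
                  + I(i + 1, j + 1) - I(i + 1, j - 1))
            row.append(math.hypot(gx, gy))  # sqrt(gx**2 + gy**2)
        out.append(row)
    return out


# A vertical step edge: the response is concentrated at the two columns
# on either side of the step, and is zero in the flat regions.
step = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
magnitude = sobel_magnitude(step)
```

A production pipeline would more likely rely on an image library for both the blur and the gradient; the explicit loops here are only meant to make each term of the two equations visible.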
The beneficial effects of the invention are as follows: the method automatically performs image cropping and table extraction, and can improve the efficiency of preprocessing and the accuracy of subsequent processing in image text recognition tasks such as OCR (optical character recognition).
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of a method for extracting a table from an image according to the present invention;
FIG. 2 is an original image in example 1;
FIG. 3 is a transformed grayscale image of example 1;
FIG. 4 is the edge-detection result in example 1;
FIG. 5 is the image after binarization in example 1;
FIG. 6 is the image after morphological opening with structuring elements in example 1;
FIG. 7 is the output image of the result in the horizontal direction in example 1;
FIG. 8 is the output image of the result in the vertical direction in example 1;
FIG. 9 is the output image in which the vertical and horizontal results are superimposed in example 1;
FIG. 10 is the table image extracted in example 1;
FIG. 11 is the border-corrected table image in example 1.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in FIG. 1, a method of extracting a table from an image includes the following steps:
S1, converting the original color image into a grayscale image, as shown in FIGS. 2-3;
S2, computing the image pixel intensity gradient in the vertical and horizontal directions by convolution, and performing edge detection on the processed grayscale image, as shown in FIG. 4:
first, the image is convolved with a 5x5 Gaussian filter for noise reduction, and then the gradient magnitude is computed with 3x3 kernels; the gradient calculation is divided into horizontal and vertical difference equations, where the horizontal equation is:
$$G_x(i,j) = I_{i+1,j-1} - I_{i-1,j-1} + 2I_{i+1,j} - 2I_{i-1,j} + I_{i+1,j+1} - I_{i-1,j+1}$$
and the vertical equation is:
$$G_y(i,j) = I_{i-1,j+1} - I_{i-1,j-1} + 2I_{i,j+1} - 2I_{i,j-1} + I_{i+1,j+1} - I_{i+1,j-1}$$
finally, the gradient value is obtained as the L2 norm of the two components, $G = \sqrt{G_x^2 + G_y^2}$;
S3, using morphological dilation to enhance the edge pixels of the resulting image, and binarizing the image according to a specific threshold, as shown in FIG. 5;
S4, applying morphological opening to the processed image in the vertical and horizontal directions respectively, using a different structuring element for each direction, as shown in FIG. 6;
S5, superimposing the results of the vertical and horizontal directions as the output, as shown in FIGS. 7-9;
S6, through analysis of the topological structure, making a secondary judgment on the area enclosed by each frame obtained in step S5 to decide whether the frame is kept as a table, and finding the closed rectangular frames, namely the complete borders of the tables;
S7, extracting and marking the position and content of each table in the picture according to the result obtained in step S6, namely extracting the table from the picture, as shown in FIG. 10;
S8, correcting the table borders according to the contour features of the table to obtain complete information for all tables in the image, as shown in FIG. 11.
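Steps S4 and S5 of the embodiment can be sketched as follows. The sketch assumes straight line segments as the two structuring elements (one horizontal, one vertical) and an illustrative length of 4 pixels; neither choice is stated in the source. The functions `open_with_line`, `superimpose` and `bounding_box` are hypothetical names, and the bounding box is only a rough stand-in for the topological analysis of steps S6 and S7, which is not reproduced here.

```python
def open_with_line(img, length, horizontal):
    """Morphological opening of a binary image with a straight line element.

    Opening keeps exactly those foreground pixels covered by some placement
    of the element that fits entirely inside the foreground, so short runs
    (text strokes, specks) vanish while long rulings survive.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    dx, dy = (1, 0) if horizontal else (0, 1)
    for y in range(h - dy * (length - 1)):
        for x in range(w - dx * (length - 1)):
            cells = [(y + k * dy, x + k * dx) for k in range(length)]
            if all(img[r][c] for r, c in cells):
                for r, c in cells:
                    out[r][c] = 1
    return out


def superimpose(a, b):
    """S5: pixel-wise OR of the horizontal and vertical results."""
    return [[p | q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def bounding_box(img):
    """Return (x0, y0, x1, y1) of all foreground pixels, or None if empty."""
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    if not pts:
        return None
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)


# A 5x5 table frame, one "text" pixel inside it and one noise pixel outside.
binary = [[int(c) for c in row] for row in [
    "0000001",
    "0111110",
    "0100010",
    "0101010",
    "0100010",
    "0111110",
    "0000000",
]]
horizontal = open_with_line(binary, length=4, horizontal=True)
vertical = open_with_line(binary, length=4, horizontal=False)
lines = superimpose(horizontal, vertical)
box = bounding_box(lines)  # (1, 1, 5, 5): the frame only
```

The element length acts as the filter: it should be longer than typical character strokes and shorter than the shortest table ruling, which in practice depends on image resolution.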
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (2)
1. A method of extracting a table from an image, comprising the following steps:
S1, converting the original color image into a grayscale image;
S2, computing the image pixel intensity gradient in the vertical and horizontal directions by convolution, performing edge detection on the processed grayscale image, and applying non-maximum suppression according to the computed gradient magnitude and direction to discard false edges;
S3, using morphological dilation to enhance the edge pixels of the resulting image, and binarizing the image according to a specific threshold to remove noise and strengthen the edge features, wherein pixels whose value is greater than the threshold are set to 1 and the remaining pixels are set to 0;
S4, applying morphological opening to the processed image in the vertical and horizontal directions respectively, using a different structuring element for each direction, to find the strip-shaped objects in the image and remove characters and other symbol objects;
S5, performing the operation of step S4 in the horizontal and vertical directions respectively, combining the results of the two directions to locate the position and size of the table in the image, and superimposing the vertical and horizontal results as the output;
S6, finding the closed rectangular frames, namely the complete borders of the tables, through analysis of the topological structure;
S7, extracting and marking the position and content of each table in the image according to the result obtained in step S6, namely extracting the table from the image;
S8, correcting the table borders according to the contour features of the table to obtain complete information for all tables in the image.
2. The method for extracting a table from an image as claimed in claim 1, wherein step S2 is specifically as follows: first, the image is convolved with a 5x5 Gaussian filter for noise reduction, and then the gradient magnitude is computed with 3x3 kernels; the gradient calculation is divided into horizontal and vertical difference equations, where the horizontal equation is:
$$G_x(i,j) = I_{i+1,j-1} - I_{i-1,j-1} + 2I_{i+1,j} - 2I_{i-1,j} + I_{i+1,j+1} - I_{i-1,j+1}$$
and the vertical equation is:
$$G_y(i,j) = I_{i-1,j+1} - I_{i-1,j-1} + 2I_{i,j+1} - 2I_{i,j-1} + I_{i+1,j+1} - I_{i+1,j-1}$$
finally, the gradient value is obtained as the L2 norm of the two components, $G = \sqrt{G_x^2 + G_y^2}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010318730.1A CN111539312A (en) | 2020-04-21 | 2020-04-21 | Method for extracting table from image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010318730.1A CN111539312A (en) | 2020-04-21 | 2020-04-21 | Method for extracting table from image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111539312A (en) | 2020-08-14 |
Family
ID=71979426
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010318730.1A Pending CN111539312A (en) | 2020-04-21 | 2020-04-21 | Method for extracting table from image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111539312A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113284096A (en) * | 2021-05-08 | 2021-08-20 | 北京印刷学院 | Counting method of medicine box inner medicine plates based on high-frequency information and contour information |
TWI824757B (en) * | 2022-10-06 | 2023-12-01 | 普匯金融科技股份有限公司 | Electronic computing device, method for identifying grid lines position in a table, and computer program product thereof |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106611424A (en) * | 2016-03-23 | 2017-05-03 | 四川用联信息技术有限公司 | Image edge extraction method |
CN106815851A (en) * | 2017-01-24 | 2017-06-09 | 电子科技大学 | A kind of grid circle oil level indicator automatic reading method of view-based access control model measurement |
CN108090929A (en) * | 2017-12-04 | 2018-05-29 | 国家海洋局第海洋研究所 | The linear anomaly analysis extraction novel method in mining area |
CN108280823A (en) * | 2017-12-29 | 2018-07-13 | 南京邮电大学 | The detection method and system of the weak edge faults of cable surface in a kind of industrial production |
CN109543525A (en) * | 2018-10-18 | 2019-03-29 | 成都中科信息技术有限公司 | A kind of table extracting method of form of general use image |
CN110033471A (en) * | 2019-04-19 | 2019-07-19 | 福州大学 | A kind of wire detection method based on connected domain analysis and morphological operation |
CN110032989A (en) * | 2019-04-23 | 2019-07-19 | 福州大学 | A kind of form document image classification method based on wire feature and pixel distribution |
US20200089946A1 (en) * | 2018-06-11 | 2020-03-19 | Innoplexus Ag | System and method for extracting tabular data from electronic document |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106611424A (en) * | 2016-03-23 | 2017-05-03 | 四川用联信息技术有限公司 | Image edge extraction method |
CN106815851A (en) * | 2017-01-24 | 2017-06-09 | 电子科技大学 | A kind of grid circle oil level indicator automatic reading method of view-based access control model measurement |
CN108090929A (en) * | 2017-12-04 | 2018-05-29 | 国家海洋局第海洋研究所 | The linear anomaly analysis extraction novel method in mining area |
CN108280823A (en) * | 2017-12-29 | 2018-07-13 | 南京邮电大学 | The detection method and system of the weak edge faults of cable surface in a kind of industrial production |
US20200089946A1 (en) * | 2018-06-11 | 2020-03-19 | Innoplexus Ag | System and method for extracting tabular data from electronic document |
CN109543525A (en) * | 2018-10-18 | 2019-03-29 | 成都中科信息技术有限公司 | A kind of table extracting method of form of general use image |
CN110033471A (en) * | 2019-04-19 | 2019-07-19 | 福州大学 | A kind of wire detection method based on connected domain analysis and morphological operation |
CN110032989A (en) * | 2019-04-23 | 2019-07-19 | 福州大学 | A kind of form document image classification method based on wire feature and pixel distribution |
Non-Patent Citations (1)
Title |
---|
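Wang Xu et al.: "Table image recognition based on projection features and structural features" (基于投影特征与结构特征的表格图像识别), Computer Engineering (《计算机工程》) * |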
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113284096A (en) * | 2021-05-08 | 2021-08-20 | 北京印刷学院 | Counting method of medicine box inner medicine plates based on high-frequency information and contour information |
CN113284096B (en) * | 2021-05-08 | 2023-08-25 | 北京印刷学院 | Counting method for medicine plates in medicine box based on high-frequency information and contour information |
TWI824757B (en) * | 2022-10-06 | 2023-12-01 | 普匯金融科技股份有限公司 | Electronic computing device, method for identifying grid lines position in a table, and computer program product thereof |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109241894B (en) | Bill content identification system and method based on form positioning and deep learning | |
TWI536277B (en) | Form identification method and device | |
CN104751142B (en) | A kind of natural scene Method for text detection based on stroke feature | |
CN105760842A (en) | Station caption identification method based on combination of edge and texture features | |
CN102930277A (en) | Character picture verification code identifying method based on identification feedback | |
CN112528997B (en) | Tibetan-Chinese bilingual scene text detection method based on text center region amplification | |
CN112712273B (en) | Handwriting Chinese character aesthetic degree judging method based on skeleton similarity | |
CN110751154B (en) | Complex environment multi-shape text detection method based on pixel-level segmentation | |
CN111507351B (en) | Ancient book document digitizing method | |
CN103258201A (en) | Form line extraction method integrating global information and local information | |
CN103218605A (en) | Quick eye locating method based on integral projection and edge detection | |
CN110738030A (en) | Table reconstruction method and device, electronic equipment and storage medium | |
CN108108731A (en) | Method for text detection and device based on generated data | |
CN111539312A (en) | Method for extracting table from image | |
CN105447508A (en) | Identification method and system for character image verification codes | |
CN112560850A (en) | Automatic identity card information extraction and authenticity verification method based on custom template | |
CN106980857A (en) | A kind of Brush calligraphy segmentation recognition method based on rubbings | |
CN107977648B (en) | Identification card definition distinguishing method and system based on face recognition | |
CN115273115A (en) | Document element labeling method and device, electronic equipment and storage medium | |
CN112686265A (en) | Hierarchic contour extraction-based pictograph segmentation method | |
CN109271882B (en) | Method for extracting color-distinguished handwritten Chinese characters | |
CN106709437A (en) | Improved intelligent processing method for image-text information of scanning copy of early patent documents | |
CN108647713B (en) | Embryo boundary identification and laser track fitting method | |
CN110473222A (en) | Image-element extracting method and device | |
TWI430187B (en) | License plate number identification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200814 |