CN111539312A - Method for extracting table from image - Google Patents

Info

Publication number
CN111539312A
Authority
CN
China
Prior art keywords
image
vertical
extracting
horizontal
directions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010318730.1A
Other languages
Chinese (zh)
Inventor
罗嘉杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202010318730.1A
Publication of CN111539312A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V30/41: Analysis of document content
    • G06V30/413: Classification of content, e.g. text, photographs or tables
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/153: Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for extracting a table from an image, comprising the following steps: S1, converting the image to grayscale; S2, calculating the pixel intensity gradient of the image in the vertical and horizontal directions and performing edge detection; S3, enhancing the edge pixels of the resulting image and binarizing it; S4, applying morphological opening with different structuring elements in the vertical and horizontal directions to find the strip-shaped (line-like) objects in the image; S5, performing the operation of step S4 in the horizontal and vertical directions respectively and superimposing the two results as the output; S6, finding the closed rectangular frames; S7, extracting and marking the position and content of each table in the image; S8, correcting the table borders to obtain complete information on all tables in the image. The method automatically crops the image and extracts the tables, which can improve the efficiency of preprocessing and the accuracy of subsequent processing in image text recognition tasks such as OCR (optical character recognition).

Description

Method for extracting table from image
Technical Field
The invention relates to the technical field of image processing methods, in particular to a method for extracting a table from an image.
Background
Domestic OCR technology for Chinese character recognition has produced good research results in recent years, and general-purpose recognition can reach an accuracy above 95%. However, general-purpose models for the layout analysis of images still lack good performance and generality, and most such models are developed and customized for a specific purpose.
To improve the accuracy and flexibility of Chinese character recognition, layout analysis is a very important link. In particular, financial statements, business documents, engineering design drawings and the like mix text with tables, so accurately extracting the tables and processing them with a separate model is a necessary task.
Tables in general have no absolute format: the number of rows and columns is not fixed, the orientation is not fixed, and the border style is not fixed, all of which increase the complexity and difficulty of table extraction.
Disclosure of Invention
In view of the above technical shortcomings, the present invention provides a method for extracting a table from an image, which aims to solve the problems in the background art.
In order to solve the technical problems, the invention adopts the following technical scheme:
The invention provides a method for extracting a table from an image, comprising the following steps:
S1, converting the original color image into a grayscale image;
S2, calculating the pixel intensity gradient of the image in the vertical and horizontal directions by convolution, and performing edge detection on the processed grayscale image;
S3, using morphological dilation to enhance the edge pixels of the image, and binarizing the image according to a specific threshold;
S4, applying morphological opening to the processed image in the vertical and horizontal directions respectively, using a different structuring element for each direction;
S5, superimposing the results of the vertical and horizontal directions as the output;
S6, through analysis of the topological structure, making a secondary judgment on the area occupied by each frame obtained in step S5 to decide whether it is kept as a table, and finding the closed rectangular frame, namely the complete border of the table;
S7, extracting and marking the position and content of the table in the image according to the result of step S6, that is, extracting the table from the image;
S8, performing border correction based on the contour features of the table to obtain complete information on all tables in the image.
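Steps S4 and S5 can be illustrated with a minimal pure-Python sketch. It assumes the binarized image is a list of rows of 0/1 values and that the "different structuring elements" are a 1 x k horizontal line and a k x 1 vertical line; the length `k` and all function names are illustrative assumptions, not values taken from the patent.

```python
from typing import List

Image = List[List[int]]  # rows of 0/1 pixels (assumed representation)


def open_rows(img: Image, k: int) -> Image:
    """Morphological opening of every row with a 1 x k line of ones.

    For a flat line element, opening keeps exactly those pixels that lie in
    a horizontal run of at least k consecutive foreground pixels.
    """
    out = [[0] * len(row) for row in img]
    for y, row in enumerate(img):
        x = 0
        while x < len(row):
            if not row[x]:
                x += 1
                continue
            start = x
            while x < len(row) and row[x]:
                x += 1
            if x - start >= k:  # run is long enough to count as a line
                for c in range(start, x):
                    out[y][c] = 1
    return out


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def open_cols(img: Image, k: int) -> Image:
    """Opening with a k x 1 vertical line, implemented by transposing."""
    return transpose(open_rows(transpose(img), k))


def table_lines(img: Image, k: int) -> Image:
    """S4 + S5: open in both directions, then superimpose (pixel-wise OR)."""
    horizontal, vertical = open_rows(img, k), open_cols(img, k)
    return [[a | b for a, b in zip(hr, vr)]
            for hr, vr in zip(horizontal, vertical)]
```

Short strokes such as characters do not contain a run of k pixels in either direction, so they vanish from both opened images, while long ruling lines survive and reappear in the superimposed output.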
Preferably, step S2 is specifically: first, the image is convolved with a 5x5 Gaussian filter to reduce noise, and then the gradient magnitude is calculated with a 3x3 kernel. The gradient calculation is split into horizontal and vertical difference equations, where the horizontal equation is:
$$G_x(i,j) = I_{i+1,j-1} - I_{i-1,j-1} + 2I_{i+1,j} - 2I_{i-1,j} + I_{i+1,j+1} - I_{i-1,j+1}$$
and the vertical equation is:
$$G_y(i,j) = I_{i-1,j+1} - I_{i-1,j-1} + 2I_{i,j+1} - 2I_{i,j-1} + I_{i+1,j+1} - I_{i+1,j-1}$$
Finally, the desired gradient value is obtained as the L2 norm of the two components, $G = \sqrt{G_x^2 + G_y^2}$.
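A minimal sketch of this preferred form of step S2, in plain Python on a grayscale image stored as a list of rows. The 5x5 binomial kernel is an assumed approximation of the Gaussian filter (the patent gives only its size), edge replication at the borders is an assumption, and the names `gaussian_blur` and `sobel_magnitude` are illustrative. The sketch reads the first index i as the column and the second index j as the row, which is also an assumption.

```python
import math
from typing import List

Gray = List[List[float]]

# 5x5 binomial approximation of a Gaussian (assumed; only the size is given).
_B = [1, 4, 6, 4, 1]
GAUSS_5X5 = [[a * b / 256.0 for b in _B] for a in _B]


def _pixel(img: Gray, x: int, y: int) -> float:
    """Read a pixel, clamping coordinates to the image (edge replication)."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def gaussian_blur(img: Gray) -> Gray:
    """Noise reduction: convolve with the 5x5 kernel above."""
    h, w = len(img), len(img[0])
    return [[sum(GAUSS_5X5[dy + 2][dx + 2] * _pixel(img, x + dx, y + dy)
                 for dy in range(-2, 3) for dx in range(-2, 3))
             for x in range(w)] for y in range(h)]


def sobel_magnitude(img: Gray) -> Gray:
    """Evaluate the two 3x3 difference equations and take their L2 norm."""
    h, w = len(img), len(img[0])

    def p(i: int, j: int) -> float:  # i: column, j: row
        return _pixel(img, i, j)

    out = [[0.0] * w for _ in range(h)]
    for j in range(h):
        for i in range(w):
            gx = (p(i + 1, j - 1) - p(i - 1, j - 1)
                  + 2 * p(i + 1, j) - 2 * p(i - 1, j)
                  + p(i + 1, j + 1) - p(i - 1, j + 1))
            gy = (p(i - 1, j + 1) - p(i - 1, j - 1)
                  + 2 * p(i, j + 1) - 2 * p(i, j - 1)
                  + p(i + 1, j + 1) - p(i + 1, j - 1))
            out[j][i] = math.hypot(gx, gy)  # sqrt(gx^2 + gy^2)
    return out
```

Calling `sobel_magnitude(gaussian_blur(gray))` yields an edge-strength map that the later dilation and binarization step can consume.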
The beneficial effects of the invention are as follows: the method automatically crops the image and extracts the tables, which can improve the efficiency of preprocessing and the accuracy of subsequent processing in image text recognition tasks such as OCR (optical character recognition).
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the method for extracting a table from an image according to the present invention;
FIG. 2 is the original image in Example 1;
FIG. 3 is the converted grayscale image in Example 1;
FIG. 4 is the edge detection result in Example 1;
FIG. 5 is the image after binarization in Example 1;
FIG. 6 is the image after morphological opening with structuring elements in Example 1;
FIG. 7 is the output image of the horizontal-direction result in Example 1;
FIG. 8 is the output image of the vertical-direction result in Example 1;
FIG. 9 is the output image with the vertical and horizontal results superimposed in Example 1;
FIG. 10 is the table image extracted in Example 1;
FIG. 11 is the border-corrected table image in Example 1.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in FIG. 1, a method for extracting a table from an image includes the following steps:
S1, converting the original color image into a grayscale image, as shown in FIGS. 2-3;
S2, calculating the pixel intensity gradient of the image in the vertical and horizontal directions by convolution, and performing edge detection on the processed grayscale image, as shown in FIG. 4:
first, the image is convolved with a 5x5 Gaussian filter to reduce noise, and then the gradient magnitude is calculated with a 3x3 kernel; the gradient calculation is split into horizontal and vertical difference equations, where the horizontal equation is:
$$G_x(i,j) = I_{i+1,j-1} - I_{i-1,j-1} + 2I_{i+1,j} - 2I_{i-1,j} + I_{i+1,j+1} - I_{i-1,j+1}$$
and the vertical equation is:
$$G_y(i,j) = I_{i-1,j+1} - I_{i-1,j-1} + 2I_{i,j+1} - 2I_{i,j-1} + I_{i+1,j+1} - I_{i+1,j-1}$$
finally, the desired gradient value is obtained as the L2 norm of the two components;
S3, using morphological dilation to enhance the edge pixels of the resulting image, and binarizing the image according to a specific threshold, see FIG. 5;
S4, applying morphological opening to the processed image in the vertical and horizontal directions respectively, using a different structuring element for each direction, as shown in FIG. 6;
S5, superimposing the results of the vertical and horizontal directions as the output, see FIGS. 7-9;
S6, through analysis of the topological structure, making a secondary judgment on the area occupied by each frame obtained in step S5 to decide whether it is kept as a table, and finding the closed rectangular frame, namely the complete border of the table;
S7, extracting and marking the position and content of the table in the image according to the result of step S6, that is, extracting the table from the image, as shown in FIG. 10;
S8, performing border correction based on the contour features of the table to obtain complete information on all tables in the image, as shown in FIG. 11.
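Steps S6 and S7 can be sketched as follows. The patent does not spell out its topological analysis, so this sketch swaps in a simpler stand-in: it labels 4-connected components of the line mask from step S5 and keeps a component only when all four sides of its bounding box are fully drawn. The function names, the `min_size` parameter and the bounding-box test are assumptions made for illustration.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def find_closed_frames(mask: Image, min_size: int = 3) -> List[Box]:
    """Return bounding boxes of components that form a closed rectangle."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill of one 4-connected component.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            left, top, right, bottom = x0, y0, x0, y0
            while queue:
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Secondary judgment: discard regions too small to be a table.
            if right - left + 1 < min_size or bottom - top + 1 < min_size:
                continue
            # Closed frame: every pixel on the four box sides is foreground.
            closed = (
                all(mask[top][x] and mask[bottom][x]
                    for x in range(left, right + 1))
                and all(mask[y][left] and mask[y][right]
                        for y in range(top, bottom + 1))
            )
            if closed:
                boxes.append((left, top, right, bottom))
    return boxes


def crop(img: Image, box: Box) -> Image:
    """S7: cut the table region out of an image of the same size."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in img[top:bottom + 1]]
```

Each returned box marks a table position, and `crop` applied to the original image extracts the table content for a downstream recognizer.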
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (2)

1. A method for extracting a table from an image, comprising the following steps:
S1, converting the original color image into a grayscale image;
S2, calculating the pixel intensity gradient of the image in the vertical and horizontal directions by convolution, performing edge detection on the processed grayscale image, and using non-maximum suppression (Non-Max Suppression) as the judging condition, according to the calculated gradient value and direction, to find edge errors;
S3, using morphological dilation to enhance the edge pixels of the resulting image, and binarizing the image according to a specific threshold to remove noise and strengthen the edge features, wherein pixels whose edge value is greater than the threshold are set to 1 and the remaining pixels are set to 0;
S4, applying morphological opening to the processed image in the vertical and horizontal directions respectively, using different structuring elements, to find the strip-shaped objects in the image and remove characters and other symbol objects;
S5, performing the operation of step S4 in the horizontal and vertical directions respectively, combining the results of the two directions to locate the position and size of the table in the image, and superimposing the vertical and horizontal results as the output;
S6, finding the closed rectangular frame, namely the complete border of the table, through analysis of the topological structure;
S7, extracting and marking the position and content of the table in the image according to the result of step S6, that is, extracting the table from the image;
S8, performing border correction based on the contour features of the table to obtain complete information on all tables in the image.
2. The method for extracting a table from an image as claimed in claim 1, wherein step S2 is specifically: first, the image is convolved with a 5x5 Gaussian filter to reduce noise, and then the gradient magnitude is calculated with a 3x3 kernel; the gradient calculation is split into horizontal and vertical difference equations, where the horizontal equation is:
$$G_x(i,j) = I_{i+1,j-1} - I_{i-1,j-1} + 2I_{i+1,j} - 2I_{i-1,j} + I_{i+1,j+1} - I_{i-1,j+1}$$
and the vertical equation is:
$$G_y(i,j) = I_{i-1,j+1} - I_{i-1,j-1} + 2I_{i,j+1} - 2I_{i,j-1} + I_{i+1,j+1} - I_{i+1,j-1}$$
finally, the desired gradient value is obtained as the L2 norm of the two components.
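The dilation and binarization rule of step S3 in claim 1 can be sketched in a few lines of plain Python. The 3x3 neighbourhood, the function names and the example threshold are assumptions for illustration; the claim only states that a specific threshold is used.

```python
from typing import List

Gray = List[List[float]]


def dilate3x3(img: Gray) -> Gray:
    """Grayscale dilation: each pixel becomes the maximum of its 3x3
    neighbourhood, which thickens and strengthens edge responses."""
    h, w = len(img), len(img[0])
    return [[max(img[ny][nx]
                 for ny in range(max(y - 1, 0), min(y + 2, h))
                 for nx in range(max(x - 1, 0), min(x + 2, w)))
             for x in range(w)] for y in range(h)]


def binarize(img: Gray, threshold: float) -> List[List[int]]:
    """Pixels above the threshold become 1, all others become 0."""
    return [[1 if v > threshold else 0 for v in row] for row in img]
```

For example, `binarize(dilate3x3(edges), 20)` keeps a strong edge response together with its immediate neighbours and discards weak, isolated responses as noise.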
CN202010318730.1A 2020-04-21 2020-04-21 Method for extracting table from image Pending CN111539312A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010318730.1A CN111539312A (en) 2020-04-21 2020-04-21 Method for extracting table from image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010318730.1A CN111539312A (en) 2020-04-21 2020-04-21 Method for extracting table from image

Publications (1)

Publication Number Publication Date
CN111539312A true CN111539312A (en) 2020-08-14

Family

ID=71979426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010318730.1A Pending CN111539312A (en) 2020-04-21 2020-04-21 Method for extracting table from image

Country Status (1)

Country Link
CN (1) CN111539312A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113284096A (en) * 2021-05-08 2021-08-20 北京印刷学院 Counting method of medicine box inner medicine plates based on high-frequency information and contour information
TWI824757B (en) * 2022-10-06 2023-12-01 普匯金融科技股份有限公司 Electronic computing device, method for identifying grid lines position in a table, and computer program product thereof

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106611424A (en) * 2016-03-23 2017-05-03 四川用联信息技术有限公司 Image edge extraction method
CN106815851A (en) * 2017-01-24 2017-06-09 电子科技大学 A kind of grid circle oil level indicator automatic reading method of view-based access control model measurement
CN108090929A (en) * 2017-12-04 2018-05-29 国家海洋局第海洋研究所 The linear anomaly analysis extraction novel method in mining area
CN108280823A (en) * 2017-12-29 2018-07-13 南京邮电大学 The detection method and system of the weak edge faults of cable surface in a kind of industrial production
CN109543525A (en) * 2018-10-18 2019-03-29 成都中科信息技术有限公司 A kind of table extracting method of form of general use image
CN110033471A (en) * 2019-04-19 2019-07-19 福州大学 A kind of wire detection method based on connected domain analysis and morphological operation
CN110032989A (en) * 2019-04-23 2019-07-19 福州大学 A kind of form document image classification method based on wire feature and pixel distribution
US20200089946A1 (en) * 2018-06-11 2020-03-19 Innoplexus Ag System and method for extracting tabular data from electronic document

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
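Wang Xu et al. (王绪等): "Table image recognition based on projection features and structural features" (基于投影特征与结构特征的表格图像识别), Computer Engineering (《计算机工程》) *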

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113284096A (en) * 2021-05-08 2021-08-20 北京印刷学院 Counting method of medicine box inner medicine plates based on high-frequency information and contour information
CN113284096B (en) * 2021-05-08 2023-08-25 北京印刷学院 Counting method for medicine plates in medicine box based on high-frequency information and contour information
TWI824757B (en) * 2022-10-06 2023-12-01 普匯金融科技股份有限公司 Electronic computing device, method for identifying grid lines position in a table, and computer program product thereof

Similar Documents

Publication Publication Date Title
CN109241894B (en) Bill content identification system and method based on form positioning and deep learning
TWI536277B (en) Form identification method and device
CN104751142B (en) A kind of natural scene Method for text detection based on stroke feature
CN105760842A (en) Station caption identification method based on combination of edge and texture features
CN102930277A (en) Character picture verification code identifying method based on identification feedback
CN112528997B (en) Tibetan-Chinese bilingual scene text detection method based on text center region amplification
CN112712273B (en) Handwriting Chinese character aesthetic degree judging method based on skeleton similarity
CN110751154B (en) Complex environment multi-shape text detection method based on pixel-level segmentation
CN111507351B (en) Ancient book document digitizing method
CN103258201A (en) Form line extraction method integrating global information and local information
CN103218605A (en) Quick eye locating method based on integral projection and edge detection
CN110738030A (en) Table reconstruction method and device, electronic equipment and storage medium
CN108108731A (en) Method for text detection and device based on generated data
CN111539312A (en) Method for extracting table from image
CN105447508A (en) Identification method and system for character image verification codes
CN112560850A (en) Automatic identity card information extraction and authenticity verification method based on custom template
CN106980857A (en) A kind of Brush calligraphy segmentation recognition method based on rubbings
CN107977648B (en) Identification card definition distinguishing method and system based on face recognition
CN115273115A (en) Document element labeling method and device, electronic equipment and storage medium
CN112686265A (en) Hierarchic contour extraction-based pictograph segmentation method
CN109271882B (en) Method for extracting color-distinguished handwritten Chinese characters
CN106709437A (en) Improved intelligent processing method for image-text information of scanning copy of early patent documents
CN108647713B (en) Embryo boundary identification and laser track fitting method
CN110473222A (en) Image-element extracting method and device
TWI430187B (en) License plate number identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200814