CN111985506A - Chart information extraction method and device and storage medium - Google Patents

Chart information extraction method and device and storage medium

Publication number
CN111985506A
Authority
CN
China
Prior art keywords
extracted
information
chart
intersection point
frame line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010851106.8A
Other languages
Chinese (zh)
Inventor
陈松波
李聪
谭伟
王文博
胡金磊
徐刚
汪密
李文航
欧阳业
陈俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingyuan Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Qingyuan Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingyuan Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority to CN202010851106.8A
Publication of CN111985506A
Priority to PCT/CN2021/070082 (WO2022036997A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a chart information extraction method, a device and a storage medium. The method comprises: acquiring an information chart to be extracted and preprocessing it; extracting the transverse frame lines and longitudinal frame lines of the preprocessed chart based on the opening and closing operation principle; performing an intersection operation on the extracted transverse and longitudinal frame lines to obtain the intersection points of the chart; determining the minimum identification units of the chart based on its intersection points, transverse frame lines and longitudinal frame lines; and performing text recognition on each minimum identification unit to obtain the chart information. The method and the device can extract key information from image tables efficiently, flexibly and accurately, and improve the efficiency with which power grid safety supervision personnel review the related image tables.

Description

Chart information extraction method and device and storage medium
Technical Field
The embodiment of the invention relates to the technical field of image recognition, in particular to a chart information extraction method, a chart information extraction device and a storage medium.
Background
In daily power production, operation and maintenance, a large number of image tables are generated. Some of these tables record various data, while others guide standardized operation.
However, these image tables are numerous, fixed in format and strongly correlated with one another, and the key information that must be checked is scattered across them, which greatly reduces the efficiency with which power grid safety supervision personnel review them.
Disclosure of Invention
The invention provides a chart information extraction method, a chart information extraction device and a storage medium, which can extract key information from image tables efficiently, flexibly and accurately, thereby improving the efficiency with which power grid safety supervision personnel review the related image tables.
The embodiment of the invention provides a chart information extraction method, which comprises the following steps:
acquiring an information chart to be extracted, and preprocessing the information chart to be extracted;
extracting the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted based on the open-close operation principle;
performing intersection operation on the extracted transverse frame line and the extracted longitudinal frame line to obtain an intersection point of the information chart to be extracted;
determining a minimum identification unit of the information chart to be extracted based on the intersection point of the information chart to be extracted, the transverse frame line and the longitudinal frame line;
and performing text recognition on each minimum recognition unit in the information chart to be extracted to obtain chart information of the information chart to be extracted.
Further, determining the minimum identification unit of the information chart to be extracted based on the intersection point of the information chart to be extracted, the transverse frame line and the longitudinal frame line includes:
detecting whether a false intersection point exists in the transverse direction of the intersection point of the information chart to be extracted, if so, removing the detected false intersection point to obtain a target transverse intersection point;
detecting whether a false intersection point exists longitudinally of the target transverse intersection point, and if so, filtering the detected false intersection point to obtain a target intersection point;
and determining the minimum identification unit of the information chart to be extracted based on the target intersection point, the transverse frame line and the longitudinal frame line.
Further, detecting whether a false intersection point exists in the transverse direction of the intersection point of the information chart to be extracted and, if so, removing the detected false intersection point to obtain a target transverse intersection point includes:
sequentially detecting, from left to right along the transverse direction, whether a straight line exists in the longitudinal direction of each intersection point $m_{i,j}$ of the information chart to be extracted, where $m_{i,j}$ is the j-th intersection point of the i-th row, i = 1, 2, 3, ..., and j = 1, 2, 3, ...;
if no such straight line exists, the intersection point $m_{i,j}$ is a false intersection point, and the false intersection point is removed to obtain the target transverse intersection point.
Further, detecting whether a false intersection point exists in the longitudinal direction of the target transverse intersection point and, if so, filtering out the detected false intersection point to obtain the target intersection point includes:
sequentially acquiring, for each target transverse intersection point $m_{s,g}$, the target transverse intersection points $m_{s+1,g+1}, m_{s+1,g+2}, \ldots, m_{s+1,g+n}$ located longitudinally below its horizontally adjacent point $m_{s+1,g}$, to obtain a plurality of diagonal points $m_{s+1,g+1}, m_{s+1,g+2}, \ldots, m_{s+1,g+n}$ of the target transverse intersection point $m_{s,g}$, where $m_{s,g}$ is the g-th target transverse intersection point in the s-th column, s = 1, 2, 3, ..., g = 1, 2, 3, ..., and n is the number of target transverse intersection points in the (s+1)-th column;
detecting whether a corresponding transverse frame line exists at the transverse projection of the line connecting the target transverse intersection point $m_{s,g}$ and each diagonal point;
and if the transverse frame line does not exist, the diagonal point is a false intersection point, and the false intersection point is filtered out to obtain the target intersection point.
Further, before performing text recognition on each minimum recognition unit in the information chart to be extracted, the method further includes:
carrying out area positioning on the information chart to be extracted by utilizing the determined minimum identification unit;
and performing text recognition on each minimum recognition unit in the information chart to be extracted after the area is positioned.
Further, performing region positioning on the information chart to be extracted by using the determined minimum identification unit includes:
cutting the information chart to be extracted according to the determined minimum identification unit;
and coding each cut minimum identification unit to obtain the information chart to be extracted after area positioning.
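The cutting and coding steps above can be pictured with a minimal Python sketch. It assumes each minimum identification unit is already known as a pair of corner coordinates, and it uses a hypothetical key format `N(row,column)` borrowed from the cell label used later in the description; the source does not specify the coding scheme, and the function name `cut_and_code` is illustrative only.

```python
def cut_and_code(image, cells):
    """Cut each cell out of the image and store it under a positional code.

    image: list of pixel rows.
    cells: list of (row_index, col_index, top, left, bottom, right), where the
           last four values are the pixel coordinates of the cell's frame lines.
    """
    coded = {}
    for row_index, col_index, top, left, bottom, right in cells:
        # Keep only the interior of the cell, excluding the frame-line pixels.
        crop = [row[left + 1:right] for row in image[top + 1:bottom]]
        coded[f"N({row_index},{col_index})"] = crop  # hypothetical code format
    return coded


# Toy 5x7 "image" whose pixel value encodes its own position (10 * row + column).
image = [[10 * r + c for c in range(7)] for r in range(5)]
cells = [(1, 1, 0, 0, 4, 3), (1, 2, 0, 3, 4, 6)]
coded_cells = cut_and_code(image, cells)
```

Keying the crops by position keeps the recognized text traceable to its row and column after the chart has been cut apart.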
Further, acquiring the information chart to be extracted and preprocessing the information chart to be extracted includes:
acquiring an image of the information chart to be extracted;
judging whether the image is inclined or not;
if the image is inclined, correcting the image by using an image inclination angle detection algorithm based on direction projection;
and carrying out binarization processing on the corrected image to obtain the preprocessed information chart to be extracted.
Further, extracting the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted based on the opening and closing operation principle includes:
using the formula
[Formula image BDA0002644758970000041, not reproduced in this text]
to extract the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted, where F is the information chart to be extracted, Y is the frame line extraction result, and G represents a structuring element.
The embodiment of the invention also provides a chart information extraction device, which comprises:
an acquisition unit, configured to acquire an information chart to be extracted and preprocess the information chart to be extracted;
a frame line extraction unit, configured to extract the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted based on the opening and closing operation principle;
an intersection point determining unit, configured to perform intersection operation on the extracted transverse frame line and the extracted longitudinal frame line to obtain an intersection point of the information chart to be extracted;
a determining unit, configured to determine a minimum identification unit of the information chart to be extracted based on the intersection point of the information chart to be extracted, the transverse frame line and the longitudinal frame line;
and a text recognition unit, configured to perform text recognition on each minimum identification unit in the information chart to be extracted to obtain the chart information of the information chart to be extracted.
The embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the chart information extraction method according to any embodiment of the present invention.
The invention discloses a chart information extraction method, a device and a storage medium. The method comprises: acquiring an information chart to be extracted and preprocessing it; extracting the transverse frame lines and longitudinal frame lines of the preprocessed chart based on the opening and closing operation principle; performing an intersection operation on the extracted transverse and longitudinal frame lines to obtain the intersection points of the chart; determining the minimum identification units of the chart based on its intersection points, transverse frame lines and longitudinal frame lines; and performing text recognition on each minimum identification unit to obtain the chart information. The intersection points of the information chart to be extracted are determined from the extracted transverse and longitudinal frame lines, the minimum identification units of the chart are then determined, and the chart information is obtained by extracting the text of each minimum identification unit. Key information in image tables can thus be extracted efficiently, flexibly and accurately, which improves the efficiency with which power grid safety supervision personnel review the related image tables.
Drawings
FIG. 1 is a flowchart of a chart information extraction method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a minimum identification unit according to an embodiment of the present invention;
FIG. 3 is a flowchart of another chart information extraction method according to an embodiment of the present invention;
FIG. 4 is a flowchart of another chart information extraction method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another minimum identification unit provided by an embodiment of the present invention;
FIG. 6 is a flowchart of another chart information extraction method according to an embodiment of the present invention;
FIG. 7 is a flowchart of another chart information extraction method according to an embodiment of the present invention;
FIG. 8 is a block diagram of a chart information extraction apparatus according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
It should be noted that the terms first, second and the like in the description and claims of the present invention and in the drawings are used for distinguishing different objects and are not intended to limit a specific order. The following embodiments of the present invention may be implemented individually, or in combination with each other, and the embodiments of the present invention are not limited in this respect.
Fig. 1 is a flowchart of a chart information extraction method according to an embodiment of the present invention.
As shown in FIG. 1, the chart information extraction method includes the following steps:
Step S101, acquiring an information chart to be extracted, and preprocessing the information chart to be extracted.
Specifically, to extract key information from a paper table, the table must first be converted into image data. The paper table is therefore scanned to obtain its image data, and the image data is then preprocessed to obtain the preprocessed information chart to be extracted.
Step S102, extracting the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted based on the opening and closing operation principle.
Specifically, since image tables in the power industry are composed of transverse and longitudinal frame lines, the transverse frame lines and longitudinal frame lines of the preprocessed information chart to be extracted can be extracted, and the position of the text information in the chart can then be determined from them. The transverse frame lines are the frame lines in the row direction, and the longitudinal frame lines are the frame lines in the column direction.
Optionally, in step S102, extracting the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted based on the opening and closing operation principle includes: using the formula
[Formula image BDA0002644758970000071, not reproduced in this text]
to extract the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted, where F is the information chart to be extracted, Y is the frame line extraction result, and G represents a structuring element.
Specifically, the formula
[Formula image BDA0002644758970000072, not reproduced in this text]
applies the opening and closing operations of mathematical morphology to extract the transverse and longitudinal frame lines of the table.
The frame line extraction of the information chart to be extracted is divided into longitudinal frame line extraction and transverse frame line extraction. The two kinds of frame lines are extracted by defining two structuring elements of different shapes, $G_h$ and $G_w$. $G_h$ is a rectangular region 1 pixel wide and h pixels high, denoted $G_h(1, h)$; $G_w$ is a rectangular region h pixels wide and 1 pixel high, denoted $G_w(h, 1)$. Here h is the maximum number of pixels required to display one character in the horizontal or vertical direction in the table.
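As a rough illustration of how line-shaped structuring elements isolate frame lines, the following Python sketch applies a morphological opening (erosion followed by dilation) to a small binary image. The formula image is not reproduced in this text, so treating the operation as a plain opening is an assumption, and the helper names `open_rows`, `transpose` and `extract_frame_lines` are hypothetical.

```python
def open_rows(img, k):
    """Morphological opening of a 0/1 image with a k-pixel-wide, 1-pixel-high element."""
    height, width = len(img), len(img[0])
    # Erosion: a pixel survives only if the k pixels starting at it are all foreground.
    eroded = [[int(x + k <= width and all(row[x:x + k])) for x in range(width)]
              for row in img]
    # Dilation: paint the k-pixel element back at every surviving position.
    opened = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if eroded[y][x]:
                for i in range(k):
                    opened[y][x + i] = 1
    return opened


def transpose(img):
    return [list(col) for col in zip(*img)]


def extract_frame_lines(F, h):
    """Return (Y_w, Y_h): transverse lines via G_w(h, 1), longitudinal lines via G_h(1, h)."""
    Y_w = open_rows(F, h)
    Y_h = transpose(open_rows(transpose(F), h))
    return Y_w, Y_h


# Toy chart: one transverse line (row 1), one longitudinal line (column 2),
# and a 2-pixel "character" stroke that is shorter than h and should vanish.
F = [[0] * 7 for _ in range(7)]
for x in range(7):
    F[1][x] = 1
for y in range(7):
    F[y][2] = 1
F[4][4] = F[4][5] = 1
Y_w, Y_h = extract_frame_lines(F, 4)
```

Because h is chosen as the largest extent of a single character, strokes of text are shorter than the structuring element and are erased, while frame lines, which are longer, survive.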
Step S103, performing intersection operation on the extracted transverse frame line and the extracted longitudinal frame line to obtain an intersection point of the information chart to be extracted.
The extraction results of the transverse frame lines and longitudinal frame lines of the table are denoted $Y_w$ and $Y_h$, respectively. After the transverse and longitudinal frame lines of the information chart to be extracted have been extracted, its intersection points can be obtained through an intersection operation. Specifically, the formula $M_1 = Y_w \cap Y_h$ is applied to the extracted transverse and longitudinal frame lines, where $M_1$ is the resulting set of table intersection points.
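The intersection operation $M_1 = Y_w \cap Y_h$ amounts to a pixel-wise AND of the two line masks. A minimal Python sketch follows, assuming both masks are 0/1 images of equal size with 1-pixel-thick lines; the function names are illustrative, and with thicker lines each crossing would yield a small blob that still has to be reduced to a single point.

```python
def intersect(Y_w, Y_h):
    """Pixel-wise AND of the transverse and longitudinal line masks."""
    return [[a & b for a, b in zip(row_w, row_h)] for row_w, row_h in zip(Y_w, Y_h)]


def intersection_points(M1):
    """List the (row, column) coordinates of all foreground pixels in M1."""
    return [(r, c) for r, row in enumerate(M1) for c, v in enumerate(row) if v]


# Toy masks: transverse lines on rows 0 and 4, longitudinal lines on columns 0 and 3.
Y_w = [[1 if r in (0, 4) else 0 for _ in range(5)] for r in range(5)]
Y_h = [[1 if c in (0, 3) else 0 for c in range(5)] for _ in range(5)]
M1 = intersect(Y_w, Y_h)
points = intersection_points(M1)
```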
Step S104, determining the minimum identification unit of the information chart to be extracted based on the intersection point, the transverse frame line and the longitudinal frame line of the information chart to be extracted.
Specifically, the minimum identification unit is the smallest graphic unit constituting an image table. FIG. 2 is a schematic diagram of a minimum identification unit provided in an embodiment of the present invention. Referring to FIG. 2, the graphic units A, B, C and so on are all minimum identification units.
Step S105, performing text recognition on each minimum recognition unit in the information chart to be extracted to obtain chart information of the information chart to be extracted.
Specifically, after the minimum recognition unit in the information chart to be extracted is determined, text recognition is performed on each minimum recognition unit in the information chart to be extracted, and finally chart information of the information chart to be extracted is obtained. The online or offline text recognition service provided by a third party can be flexibly adopted to complete the text recognition of each minimum recognition unit so as to meet the requirements of different scenes.
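One way to keep the recognition backend interchangeable, as the paragraph above suggests, is to pass the recognizer in as a function. The Python sketch below is only an illustration of that design: `extract_chart_info` and `fake_recognizer` are hypothetical names, and the stand-in recognizer reports the crop size rather than reading any text.

```python
from typing import Callable, Dict, List

Cell = List[List[int]]  # one cropped minimum identification unit, as pixel rows


def extract_chart_info(cells: Dict[str, Cell],
                       recognize: Callable[[Cell], str]) -> Dict[str, str]:
    """Run the supplied recognizer on every cell and collect the results by cell code."""
    return {code: recognize(crop) for code, crop in cells.items()}


def fake_recognizer(crop: Cell) -> str:
    """Stand-in for an online or offline OCR service; it only reports the crop size."""
    width = len(crop[0]) if crop else 0
    return f"{len(crop)}x{width}"


chart_info = extract_chart_info({"N(1,1)": [[0, 1], [1, 0]]}, fake_recognizer)
```

Swapping `fake_recognizer` for a wrapper around a real service would change nothing else in the pipeline.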
In the embodiment of the invention, the intersection point of the information chart to be extracted is determined through the extracted transverse frame line and the extracted longitudinal frame line, the minimum identification unit of the information chart to be extracted is further determined, and the chart information of the information chart to be extracted is obtained through extracting the text information of the minimum identification unit, so that the key information in the image table can be efficiently, flexibly and accurately extracted, and the technical effect of improving the work efficiency of the power grid safety supervision personnel for examining the related image table is realized.
On the basis of the above technical solution, this embodiment refines the acquisition and preprocessing of the information chart to be extracted. FIG. 3 is a flowchart of another chart information extraction method according to an embodiment of the present invention. As shown in FIG. 3, the chart information extraction method according to this embodiment includes the following steps:
Step S301, acquiring an image of the information chart to be extracted.
Step S302, judging whether the image is tilted.
Step S303, if the image is tilted, correcting the image by using an image tilt angle detection algorithm based on directional projection.
Step S304, carrying out binarization processing on the corrected image to obtain the preprocessed information chart to be extracted.
Specifically, the paper table may not be placed squarely during scanning, so the scanned image table may be tilted. After the image data of the information chart to be extracted is obtained, it is therefore necessary to judge whether the image is tilted and, if so, to correct it. To keep the computational cost of the correction low and its robustness high, an image tilt angle detection algorithm based on directional projection is selected to complete the correction. After the correction is finished, the corrected image is binarized to obtain the preprocessed information chart to be extracted.
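The source does not spell out the directional projection algorithm, so the following Python sketch shows one common variant as an assumption: foreground pixels are projected along each candidate angle, and the angle whose projection profile is most concentrated is taken as the tilt. The function names, the fixed threshold of 128 and the candidate range of -10 to 10 degrees are all illustrative choices; the rotation that undoes the tilt is not shown, and binarization is applied before the estimate here purely to simplify the sketch.

```python
import math


def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 values) to 0/1, dark pixels as foreground."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def estimate_tilt(binary, candidates=range(-10, 11)):
    """Return the candidate angle in degrees with the most concentrated projection."""
    points = [(x, y) for y, row in enumerate(binary) for x, v in enumerate(row) if v]
    best_angle, best_score = 0, -1
    for angle in candidates:
        t = math.radians(angle)
        profile = {}
        for x, y in points:
            # Offset of the pixel from a line through the origin tilted by this angle.
            offset = round(y * math.cos(t) - x * math.sin(t))
            profile[offset] = profile.get(offset, 0) + 1
        # A sharp profile (few, tall bins) gives a large sum of squares.
        score = sum(count * count for count in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Synthetic page: three ruled lines tilted by 5 degrees on a white background.
WIDTH, HEIGHT = 200, 80
gray = [[255] * WIDTH for _ in range(HEIGHT)]
for x in range(WIDTH):
    for y0 in (10, 30, 50):
        gray[round(y0 + x * math.tan(math.radians(5)))][x] = 0
tilt = estimate_tilt(binarize(gray))
```

On this synthetic page the estimate should land at or near 5 degrees; a real scan would usually need a finer angle grid.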
Step S305, extracting the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted based on the opening and closing operation principle.
Step S306, performing intersection operation on the extracted transverse frame line and the extracted longitudinal frame line to obtain an intersection point of the information chart to be extracted.
Step S307, determining the minimum identification unit of the information chart to be extracted based on the intersection point, the transverse frame line and the longitudinal frame line of the information chart to be extracted.
Step S308, performing text recognition on each minimum recognition unit in the information chart to be extracted to obtain chart information of the information chart to be extracted.
By using the chart information extraction method provided by the embodiment, the key information in the image table can be efficiently, flexibly and accurately extracted, and the technical effect of improving the work efficiency of the power grid safety supervision personnel for examining the related image table is achieved.
On the basis of the above technical solution, this embodiment refines the step of determining the minimum identification unit of the information chart to be extracted based on its intersection point, transverse frame line and longitudinal frame line. FIG. 4 is a flowchart of another chart information extraction method according to an embodiment of the present invention. As shown in FIG. 4, the chart information extraction method according to this embodiment includes the following steps:
Step S401, acquiring an information chart to be extracted, and preprocessing the information chart to be extracted.
Step S402, extracting the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted based on the opening and closing operation principle.
Step S403, performing intersection operation on the extracted transverse frame line and the extracted longitudinal frame line to obtain an intersection point of the information chart to be extracted.
Step S404, detecting whether a false intersection point exists in the transverse direction of the intersection point of the information chart to be extracted, and if so, removing the detected false intersection point to obtain a target transverse intersection point.
Step S405, detecting whether a false intersection point exists in the longitudinal direction of the target transverse intersection point, and if so, filtering out the detected false intersection point to obtain the target intersection point.
Step S406, determining the minimum identification unit of the information chart to be extracted based on the target intersection point, the transverse frame line and the longitudinal frame line.
Specifically, referring to FIG. 2, the intersection points inside the two dashed circles contribute nothing to determining the minimum identification unit A; in other words, the minimum identification unit A cannot be determined from these two intersection points, so they are false intersection points. To obtain the minimum identification units accurately, it is first necessary to detect whether false intersection points exist among the obtained intersection points of the information chart to be extracted.
To determine the false intersection points, it is first detected whether a false intersection point exists in the transverse direction of each intersection point of the information chart to be extracted. For example, the two intersection points in the dashed circles shown in FIG. 2 are two false intersection points with respect to the intersection point at the upper left corner of the minimum identification unit A. After the transverse false intersection points are removed, the remaining intersection points are the target transverse intersection points. The target transverse intersection points are then examined further to determine whether a false intersection point exists in the longitudinal direction of each of them; after the longitudinal false intersection points are removed, the remaining intersection points are the target intersection points. FIG. 5 is a schematic diagram of another minimum identification unit provided by an embodiment of the present invention. Referring to FIG. 5, the two intersection points in the dashed circles are two longitudinal false intersection points with respect to the intersection point at the upper left corner of the minimum identification unit A, and need to be removed.
The minimum identification unit of the information chart to be extracted can then be accurately determined from the remaining target intersection points and the extracted transverse and longitudinal frame lines.
Step S407, performing text recognition on each minimum recognition unit in the information chart to be extracted to obtain chart information of the information chart to be extracted.
By using the chart information extraction method provided by the embodiment, the key information in the image table can be efficiently, flexibly and accurately extracted, and the technical effect of improving the work efficiency of the power grid safety supervision personnel for examining the related image table is achieved.
On the basis of the above technical solution, this embodiment refines two steps: detecting whether a false intersection point exists in the transverse direction of the intersection point of the information chart to be extracted and, if so, removing it to obtain a target transverse intersection point; and detecting whether a false intersection point exists in the longitudinal direction of the target transverse intersection point and, if so, filtering it out to obtain a target intersection point. As shown in FIG. 6, the chart information extraction method provided by this embodiment includes the following steps:
Step S601, acquiring an information chart to be extracted, and preprocessing the information chart to be extracted.
Step S602, extracting the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted based on the opening and closing operation principle.
Step S603, performing intersection operation on the extracted transverse frame line and the extracted longitudinal frame line to obtain an intersection point of the information chart to be extracted.
Step S604, sequentially detecting, from left to right along the transverse direction, whether a straight line exists in the longitudinal direction of each intersection point $m_{i,j}$ of the information chart to be extracted, where $m_{i,j}$ is the j-th intersection point of the i-th row, i = 1, 2, 3, ..., and j = 1, 2, 3, ....
Step S605, if no such straight line exists, the intersection point $m_{i,j}$ is a false intersection point, and the false intersection point is removed to obtain the target transverse intersection point.
Specifically, the longitudinal direction is the direction pointing from the first row of the chart toward the last row. To identify a transverse false intersection point, it is only necessary to determine whether a straight line exists in the longitudinal direction of each intersection point of the information chart to be extracted. Taking the minimum identification unit N(2, 2) in Fig. 2 as an example, the first intersection point m_{2,2}, at the upper left corner of N(2, 2), has a straight line in its longitudinal direction, namely the left longitudinal frame line of N(2, 2); the intersection point m_{2,2} is therefore not a false intersection point. The intersection point m_{2,3} has no straight line in its longitudinal direction, so it is a false intersection point and needs to be removed. Likewise, the intersection point m_{2,4} is a false intersection point and is removed, whereas the intersection point m_{2,5} is not a false intersection point and needs to be retained.
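The transverse check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the longitudinal frame lines are assumed to be available as a binary mask (1 = line pixel), intersection points are assumed to be (row, column) pixel coordinates, and the names has_line_below, remove_transverse_false_points and the tolerance min_run are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]    # 1 = frame-line pixel, 0 = background
Point = Tuple[int, int]   # (row, col) pixel position of an intersection point


def has_line_below(vertical_mask: Mask, point: Point, min_run: int = 3) -> bool:
    """Return True if a longitudinal frame line continues downward from `point`.

    `min_run` is an assumed tolerance: the number of consecutive line pixels
    directly below the point that are required before we call it a line.
    """
    row, col = point
    if row + min_run >= len(vertical_mask):
        return False  # not enough pixels below the point to hold a line
    return all(vertical_mask[r][col] == 1
               for r in range(row + 1, row + 1 + min_run))


def remove_transverse_false_points(vertical_mask: Mask,
                                   points: List[Point],
                                   min_run: int = 3) -> List[Point]:
    """Keep only the intersection points that have a line in the longitudinal direction."""
    return [p for p in points if has_line_below(vertical_mask, p, min_run)]
```

In this sketch, points on the bottom border of the table have no line below them and would be discarded; how such points are treated is not stated in the description, so they would need separate handling.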
Step S606, sequentially acquiring, for each target transverse intersection point m_{s,g}, the target transverse intersection points m_{s+1,g+1}, m_{s+1,g+2}, ..., m_{s+1,g+n} located longitudinally below its horizontally adjacent point m_{s+1,g}, to obtain a plurality of diagonal points m_{s+1,g+1}, m_{s+1,g+2}, ..., m_{s+1,g+n} of the target transverse intersection point m_{s,g}, wherein m_{s,g} is the g-th target transverse intersection point in the s-th column, s = 1, 2, 3, ..., g = 1, 2, 3, ..., and n is the number of target transverse intersection points in the (s+1)-th column.
Step S607, detecting whether a corresponding transverse frame line exists at the transverse projection of the connecting line between the target transverse intersection point m_{s,g} and each diagonal point.
Step S608, if no transverse frame line exists, the diagonal point is a false intersection point, and the false intersection point is filtered out to obtain the target intersection points.
Specifically, to identify longitudinal false intersection points once the target transverse intersection points have been determined, it is only necessary to examine in turn, for each target transverse intersection point, the target transverse intersection points located longitudinally below its horizontally adjacent point (i.e., its diagonal points), and to detect whether a corresponding transverse frame line exists at the transverse projection of the connecting line between the target transverse intersection point and the diagonal point. If such a frame line exists, the diagonal point is determined to be a desired target intersection point and is retained; otherwise the diagonal point is a false intersection point and needs to be filtered out, and the next intersection point longitudinally below the horizontally adjacent point is then judged. This continues until a transverse frame line exists at the transverse projection of the connecting line between the target transverse intersection point and a diagonal point; that diagonal point is not a longitudinal false intersection point and does not need to be filtered out.
Referring to Fig. 5 and taking the minimum identification unit N(2, 1) as an example, the first intersection point m_{1,2}, at the upper left corner of N(2, 1), is a target transverse intersection point. The target transverse intersection point below the horizontally adjacent point m_{2,2} of m_{1,2} is taken as the diagonal point m_{2,3} (i.e., the intersection point within the upper circular dashed box in Fig. 5). No corresponding transverse frame line exists at the transverse projection of the connecting line between the target transverse intersection point m_{1,2} and the diagonal point m_{2,3} (i.e., at the dashed line shown in Fig. 5), so the diagonal point m_{2,3} is a false intersection point and needs to be removed. Similarly, the diagonal point m_{2,4} (i.e., the intersection point within the lower circular dashed box in Fig. 5) is also a false intersection point and is removed. For the intersection point m_{2,5} (i.e., the intersection point within the square dashed box in Fig. 5), a transverse frame line exists at the transverse projection of its connecting line with m_{1,2}, so m_{2,5} is not a false intersection point and needs to be retained.
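The longitudinal check can be sketched in the same spirit. The code below is an illustrative assumption rather than the method as claimed: it takes the "transverse projection of the connecting line" to be the row of the diagonal point, spanning the columns between the two points, and it treats the span as a frame line when a fraction min_cover of its pixels are set. The names has_transverse_line, first_true_diagonal and min_cover are hypothetical.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]    # 1 = transverse frame-line pixel, 0 = background
Point = Tuple[int, int]   # (row, col)


def has_transverse_line(horizontal_mask: Mask, target: Point, diagonal: Point,
                        min_cover: float = 0.9) -> bool:
    """Check for a transverse frame line under the segment target -> diagonal.

    The segment is projected onto the diagonal point's row (an assumption),
    and the projected span must be mostly covered by line pixels.
    """
    _, x0 = target
    row, x1 = diagonal
    lo, hi = sorted((x0, x1))
    span = horizontal_mask[row][lo:hi + 1]
    return sum(span) >= min_cover * len(span)


def first_true_diagonal(horizontal_mask: Mask, target: Point,
                        candidates: List[Point],
                        min_cover: float = 0.9
                        ) -> Tuple[List[Point], Optional[Point]]:
    """Walk the diagonal points from top to bottom.

    Returns (false_points, target_point): every candidate visited before the
    first one backed by a transverse frame line is a false intersection point.
    """
    false_points: List[Point] = []
    for cand in sorted(candidates):        # tuples sort by row first
        if has_transverse_line(horizontal_mask, target, cand, min_cover):
            return false_points, cand
        false_points.append(cand)
    return false_points, None
```

With a merged cell spanning three table rows, the two inner candidates are reported as false and the candidate on the cell's bottom border is returned, which mirrors the m_{2,3}, m_{2,4}, m_{2,5} example above.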
Step S609, determining the minimum identification unit of the information chart to be extracted based on the target intersection points, the transverse frame line, and the longitudinal frame line.
Step S610, performing text recognition on each minimum recognition unit in the information chart to be extracted to obtain chart information of the information chart to be extracted.
By using the chart information extraction method provided by the embodiment, the key information in the image table can be efficiently, flexibly and accurately extracted, and the technical effect of improving the work efficiency of the power grid safety supervision personnel for examining the related image table is achieved.
Based on the above technical solution, this embodiment optimizes the above embodiment, and as shown in fig. 7, the method for extracting diagram information provided by this embodiment further includes the following steps:
step S701, acquiring an information chart to be extracted, and preprocessing the information chart to be extracted.
And step S702, extracting the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted based on the opening and closing operation principle.
And step S703, performing intersection operation on the extracted transverse frame line and the extracted longitudinal frame line to obtain an intersection point of the information chart to be extracted.
Step S704, determining the minimum identification unit of the information chart to be extracted based on the intersection points of the information chart to be extracted, the transverse frame line, and the longitudinal frame line.
Step S705, carrying out area positioning on the information graph to be extracted by utilizing the determined minimum identification unit;
step S706, performing text recognition on each minimum recognition unit in the information chart to be extracted after the area positioning.
Specifically, in order to accurately find the position of each minimum recognition unit in the information chart to be extracted, the determined minimum recognition unit needs to be subjected to area positioning in the information chart to be extracted, and finally, text recognition is performed on each minimum recognition unit in the information chart to be extracted after the area positioning, so as to obtain the chart information of the information chart to be extracted.
Optionally, in step S705, performing area location on the information graph to be extracted by using the determined minimum identification unit includes:
cutting the information chart to be extracted according to the determined minimum identification unit;
and coding each cut minimum identification unit to obtain the information chart to be extracted after the area is positioned.
Specifically, to accurately locate each minimum identification unit in the table, the information chart to be extracted is first cut according to the determined minimum identification units, that is, each minimum identification unit is cut out as a separate picture, and the cut pictures are then numbered. For example, referring to Figs. 2 and 6, the picture code of a minimum identification unit may be denoted N(p, q), where p and q are both integers greater than or equal to 1; p denotes the row in which the minimum identification unit is located, and q denotes the column in which it is located. The key information extracted from the information chart to be extracted can therefore be represented as the information corresponding to a sequence of codes N(p, q).
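The cutting and coding step can be illustrated with a short Python sketch. Several details are assumptions made for the example: each minimum identification unit is given as a (top, left, bottom, right) pixel box with exclusive bottom and right bounds, p is taken as the rank of the box's top edge among all distinct top edges, and q is the left-to-right position of the box among boxes sharing that top edge. The function name cut_and_code is hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), bottom/right exclusive


def cut_and_code(image: Image, boxes: List[Box]) -> Dict[Tuple[int, int], Image]:
    """Cut each minimum identification unit out of `image` and key it by (p, q).

    p: 1-based row number, derived from the distinct top edges of the boxes.
    q: 1-based column number, counted left to right within that row.
    """
    coded: Dict[Tuple[int, int], Image] = {}
    tops = sorted({box[0] for box in boxes})
    for p, top in enumerate(tops, start=1):
        row_boxes = sorted((b for b in boxes if b[0] == top), key=lambda b: b[1])
        for q, (t, left, bottom, right) in enumerate(row_boxes, start=1):
            # Crop the cell as its own small picture.
            coded[(p, q)] = [line[left:right] for line in image[t:bottom]]
    return coded
```

Each cropped picture would then be passed to text recognition, and the recognized text stored under its N(p, q) key.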
By using the chart information extraction method provided by the embodiment, the key information in the image table can be efficiently, flexibly and accurately extracted, and the technical effect of improving the work efficiency of the power grid safety supervision personnel for examining the related image table is achieved.
The embodiment of the present invention further provides a chart information extraction device, where the chart information extraction device is configured to execute the chart information extraction method provided in the above embodiment of the present invention, and the following describes the chart information extraction device provided in the embodiment of the present invention in detail.
Fig. 8 is a structural diagram of a chart information extraction device according to an embodiment of the present invention, and as shown in fig. 8, the chart information extraction device mainly includes: a preprocessing unit 81, a frame line extracting unit 82, an intersection determining unit 83, a determining unit 84, and a text recognizing unit 85, wherein:
the preprocessing unit 81 is configured to acquire an information chart to be extracted and preprocess the information chart to be extracted;
a frame line extraction unit 82 for extracting a horizontal frame line and a vertical frame line of the preprocessed information chart to be extracted based on the open-close operation principle;
the intersection point determining unit 83 is configured to perform intersection operation on the extracted horizontal frame line and the extracted longitudinal frame line to obtain an intersection point of the information graph to be extracted;
a determination unit 84 for determining a minimum recognition unit of the information chart to be extracted based on the intersection, the horizontal border, and the vertical border of the information chart to be extracted;
and the text recognition unit 85 is configured to perform text recognition on each minimum recognition unit in the information chart to be extracted to obtain chart information of the information chart to be extracted.
By using the chart information extraction device provided by the embodiment of the invention, the key information in the image table can be efficiently, flexibly and accurately extracted, and the technical effect of improving the work efficiency of the power grid safety supervision personnel for examining the related image table is also realized.
Optionally, the determining unit 84 includes:
the transverse detection subunit is used for detecting whether a false intersection point exists transversely of the intersection points of the information chart to be extracted, and if the false intersection point exists transversely, the detected false intersection point is removed to obtain a target transverse intersection point;
the longitudinal detection subunit is used for detecting whether a false intersection point exists longitudinally of the target transverse intersection point, and if so, filtering the detected false intersection point to obtain a target intersection point;
and the determining subunit is used for determining the minimum identification unit of the information chart to be extracted based on the target intersection point, the transverse frame line and the longitudinal frame line.
Optionally, the transverse detection subunit is configured to: sequentially detect, from left to right along the transverse direction, whether a straight line exists in the longitudinal direction of each intersection point m_{i,j} of the information chart to be extracted, wherein m_{i,j} is the j-th intersection point of the i-th row, i = 1, 2, 3, ..., and j = 1, 2, 3, ...; and, if no such straight line exists, treat the intersection point m_{i,j} of the information chart to be extracted as a false intersection point and remove it to obtain the target transverse intersection points.
Optionally, the longitudinal detection subunit is configured to: sequentially acquire, for each target transverse intersection point m_{s,g}, the target transverse intersection points m_{s+1,g+1}, m_{s+1,g+2}, ..., m_{s+1,g+n} located longitudinally below its horizontally adjacent point m_{s+1,g}, to obtain a plurality of diagonal points m_{s+1,g+1}, m_{s+1,g+2}, ..., m_{s+1,g+n} of the target transverse intersection point m_{s,g}, wherein m_{s,g} is the g-th target transverse intersection point in the s-th column, s = 1, 2, 3, ..., g = 1, 2, 3, ..., and n is the number of target transverse intersection points in the (s+1)-th column; detect whether a corresponding transverse frame line exists at the transverse projection of the connecting line between the target transverse intersection point m_{s,g} and each diagonal point; and, if no transverse frame line exists, treat the diagonal point as a false intersection point and filter it out to obtain the target intersection points.
Optionally, the chart information extraction device further includes:
and the area positioning unit is used for carrying out area positioning on the information chart to be extracted by utilizing the determined minimum identification unit.
Optionally, the area locating unit comprises:
the cutting subunit is used for cutting the information chart to be extracted according to the determined minimum identification unit;
and the coding subunit is used for coding each cut minimum identification unit to obtain the information chart to be extracted after the area positioning.
Optionally, the preprocessing unit 81 comprises:
the acquisition subunit is used for acquiring an image of the information chart to be extracted;
a judging subunit, configured to judge whether the image is tilted;
the correction subunit is used for correcting the image by utilizing an image inclination angle detection algorithm based on direction projection if the image is inclined;
and the binarization subunit is used for performing binarization processing on the corrected image to obtain a preprocessed information chart to be extracted.
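The preprocessing subunits can be illustrated with a small Python sketch. The text names a tilt detection algorithm based on direction projection without giving its details, so the version below is an assumed, generic projection-profile search: dark pixels are projected along each candidate angle, and the angle whose projection histogram is most sharply peaked (largest sum of squared bin counts) is taken as the tilt. The names binarize, projection_score and estimate_tilt, the threshold of 128, and the angle range and step are all hypothetical; rotating the image by the estimated angle is left out because it would require an image library.

```python
import math
from collections import Counter
from typing import List, Tuple

Gray = List[List[int]]   # grayscale image, 0 = black, 255 = white


def binarize(gray: Gray, threshold: int = 128) -> List[List[int]]:
    """Map dark pixels (below `threshold`) to 1 and everything else to 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def projection_score(points: List[Tuple[int, int]], angle_deg: float) -> int:
    """Sharpness of the projection profile for one candidate tilt angle.

    Each dark pixel (y, x) is projected onto the axis perpendicular to lines
    tilted by `angle_deg`; aligned lines pile up in few bins, giving a high score.
    """
    theta = math.radians(angle_deg)
    s, c = math.sin(theta), math.cos(theta)
    bins = Counter(round(y * c - x * s) for y, x in points)
    return sum(n * n for n in bins.values())


def estimate_tilt(gray: Gray, threshold: int = 128,
                  max_angle: float = 10.0, step: float = 0.5) -> float:
    """Return the candidate angle (degrees) with the sharpest projection profile."""
    binary = binarize(gray, threshold)
    points = [(y, x) for y, row in enumerate(binary)
              for x, v in enumerate(row) if v]
    steps = int(round(2 * max_angle / step))
    candidates = [-max_angle + k * step for k in range(steps + 1)]
    return max(candidates, key=lambda a: projection_score(points, a))
```

A positive result here means that lines descend toward the right in image coordinates; an image whose estimate is close to zero could be treated as not inclined and passed straight to binarization.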
Optionally, the frame line extraction unit 82 is specifically configured to use the formula
[Formula image not reproduced in this text: Figure BDA0002644758970000171]
to extract the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted, wherein F is the information chart to be extracted, Y is the frame line extraction result, and G represents a structural element.
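Because the formula is an image that is not reproduced in this text, the following Python sketch does not claim to match it. It only illustrates the standard morphological opening (erosion followed by dilation) of a binary image with a line-shaped structural element, which is one common way to keep long frame lines while discarding text strokes. The function names and the element length k are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]   # 1 = foreground (ink), 0 = background


def erode_rows(mask: Mask, k: int) -> Mask:
    """Erosion with a 1 x k horizontal element anchored at its left end."""
    w = len(mask[0])
    return [[1 if x + k <= w and all(row[x:x + k]) else 0 for x in range(w)]
            for row in mask]


def dilate_rows(mask: Mask, k: int) -> Mask:
    """Dilation with the same 1 x k element; restores the runs that survived erosion."""
    w = len(mask[0])
    return [[1 if any(row[max(0, x - k + 1):x + 1]) else 0 for x in range(w)]
            for row in mask]


def transpose(mask: Mask) -> Mask:
    return [list(col) for col in zip(*mask)]


def extract_frame_lines(binary: Mask, k: int) -> Tuple[Mask, Mask]:
    """Open the image with a horizontal and a vertical line element of length k.

    Returns (transverse_lines, longitudinal_lines): only runs of at least k
    consecutive foreground pixels survive in each direction.
    """
    horizontal = dilate_rows(erode_rows(binary, k), k)
    vertical = transpose(dilate_rows(erode_rows(transpose(binary), k), k))
    return horizontal, vertical
```

A closing (dilation followed by erosion) with a short element could be applied beforehand to bridge small gaps in broken lines; whether and how the source combines the two operations cannot be read from the text.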
The device provided by the embodiment of the present invention has the same implementation principle and technical effects as the foregoing method embodiments; for brevity, where the device embodiments omit a detail, reference may be made to the corresponding content of the method embodiments.
The chart information extraction method provided by the embodiment of the invention has the same technical characteristics as the chart information extraction device provided by the embodiment, so that the same technical problems can be solved, and the same technical effects can be achieved.
Embodiments of the present invention also provide a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a computer processor, is used to execute the chart information extraction method provided in any embodiment of the present invention.
Specifically, the chart information extraction method includes:
acquiring an information chart to be extracted, and preprocessing the information chart to be extracted;
extracting the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted based on the open-close operation principle;
performing intersection operation on the extracted transverse frame line and the extracted longitudinal frame line to obtain an intersection point of the information chart to be extracted;
determining a minimum identification unit of the information chart to be extracted based on the intersection point of the information chart to be extracted, the transverse frame line and the longitudinal frame line;
and performing text recognition on each minimum recognition unit in the information chart to be extracted to obtain chart information of the information chart to be extracted.
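The intersection operation in the method above can be illustrated as a pixel-wise AND of the two frame-line masks. This is a minimal sketch that assumes both masks are binary images of the same size; the function names are hypothetical. With frame lines thicker than one pixel, each crossing would yield a small blob of pixels that still has to be reduced to a single point, which the sketch does not attempt.

```python
from typing import List, Tuple

Mask = List[List[int]]   # 1 = frame-line pixel, 0 = background


def intersect_masks(horizontal: Mask, vertical: Mask) -> Mask:
    """Pixel-wise AND: a pixel is kept only if it lies on both kinds of frame line."""
    return [[h & v for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]


def intersection_points(horizontal: Mask, vertical: Mask) -> List[Tuple[int, int]]:
    """List the (row, col) positions where transverse and longitudinal lines cross."""
    joint = intersect_masks(horizontal, vertical)
    return [(y, x) for y, row in enumerate(joint)
            for x, value in enumerate(row) if value]
```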
Of course, when the computer program stored on the computer-readable storage medium provided by the embodiments of the present invention is executed by a computer processor, it is not limited to the method operations described above and may also execute the relevant operations in the chart information extraction method provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the above chart information extraction device, the included units and modules are divided merely according to functional logic, and the division is not limited to the above as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for convenience of distinguishing them from each other and are not used to limit the protection scope of the present invention.
In the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may, for example, be fixed, removable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific case.
Finally, it should be noted that the above-mentioned embodiments are only preferred embodiments of the present invention and the technical principles applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A chart information extraction method, characterized in that the method comprises:
acquiring an information chart to be extracted, and preprocessing the information chart to be extracted;
extracting the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted based on the open-close operation principle;
performing intersection operation on the extracted transverse frame line and the extracted longitudinal frame line to obtain an intersection point of the information chart to be extracted;
determining a minimum identification unit of the information chart to be extracted based on the intersection point of the information chart to be extracted, the transverse frame line and the longitudinal frame line;
and performing text recognition on each minimum recognition unit in the information chart to be extracted to obtain chart information of the information chart to be extracted.
2. The chart information extraction method according to claim 1, wherein the determining the minimum identification unit of the information chart to be extracted based on the intersection point of the information chart to be extracted, the transverse frame line and the longitudinal frame line comprises:
detecting whether a false intersection point exists in the transverse direction of the intersection point of the information chart to be extracted, if so, removing the detected false intersection point to obtain a target transverse intersection point;
detecting whether a false intersection point exists longitudinally of the target transverse intersection point, and if so, filtering the detected false intersection point to obtain a target intersection point;
and determining the minimum identification unit of the information chart to be extracted based on the target intersection point, the transverse frame line and the longitudinal frame line.
3. The chart information extraction method according to claim 2, wherein the detecting whether a false intersection point exists in the transverse direction of the intersection point of the information chart to be extracted, and if so, removing the detected false intersection point to obtain a target transverse intersection point comprises:
sequentially detecting, from left to right along the transverse direction, whether a straight line exists in the longitudinal direction of each intersection point m_{i,j} of the information chart to be extracted, wherein m_{i,j} is the j-th intersection point of the i-th row, i = 1, 2, 3, ..., and j = 1, 2, 3, ...;
if no such straight line exists, the intersection point m_{i,j} of the information chart to be extracted is the false intersection point, and the false intersection point is removed to obtain the target transverse intersection point.
4. The chart information extraction method according to claim 2, wherein the detecting whether there is a false intersection in the longitudinal direction of the target transverse intersection, and if so, filtering out the detected false intersection to obtain the target intersection comprises:
sequentially acquiring, for each target transverse intersection point m_{s,g}, the target transverse intersection points m_{s+1,g+1}, m_{s+1,g+2}, ..., m_{s+1,g+n} located longitudinally below its horizontally adjacent point m_{s+1,g}, to obtain a plurality of diagonal points m_{s+1,g+1}, m_{s+1,g+2}, ..., m_{s+1,g+n} of the target transverse intersection point m_{s,g}, wherein m_{s,g} is the g-th target transverse intersection point in the s-th column, s = 1, 2, 3, ..., g = 1, 2, 3, ..., and n is the number of the target transverse intersection points in the (s+1)-th column;
detecting whether a corresponding transverse frame line exists at the transverse projection of the connecting line between the target transverse intersection point m_{s,g} and each diagonal point;
and if the transverse frame line does not exist, the diagonal line point is the false intersection point, and the false intersection point is filtered to obtain the target intersection point.
5. The chart information extraction method according to claim 1, wherein before performing text recognition on each of the minimum recognition units in the information chart to be extracted, the method further comprises:
carrying out area positioning on the information chart to be extracted by utilizing the determined minimum identification unit;
and performing text recognition on each minimum recognition unit in the information chart to be extracted after the area is positioned.
6. The chart information extraction method according to claim 5, wherein the performing area positioning on the information chart to be extracted by using the determined minimum recognition unit comprises:
cutting the information chart to be extracted according to the determined minimum identification unit;
and coding each cut minimum identification unit to obtain the information chart to be extracted after area positioning.
7. The chart information extraction method according to claim 1, wherein the obtaining of the information chart to be extracted and the preprocessing of the information chart to be extracted comprise:
acquiring an image of the information chart to be extracted;
judging whether the image is inclined or not;
if the image is inclined, correcting the image by using an image inclination angle detection algorithm based on direction projection;
and carrying out binarization processing on the corrected image to obtain the preprocessed information chart to be extracted.
8. The chart information extraction method according to claim 1, wherein the extraction of the horizontal and vertical frame lines of the information chart to be extracted after preprocessing based on the principle of opening and closing operations includes:
using the formula
[Formula image not reproduced in this text: Figure FDA0002644758960000031]
to extract the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted, wherein F is the information chart to be extracted, Y is the frame line extraction result, and G represents a structural element.
9. A chart information extraction apparatus, characterized in that the apparatus comprises:
the preprocessing unit is used for acquiring an information chart to be extracted and preprocessing the information chart to be extracted;
the frame line extraction unit is used for extracting the transverse frame line and the longitudinal frame line of the preprocessed information chart to be extracted based on the opening and closing operation principle;
the intersection point determining unit is used for performing intersection operation on the extracted transverse frame line and the extracted longitudinal frame line to obtain an intersection point of the information chart to be extracted;
the determining unit is used for determining a minimum identification unit of the information chart to be extracted based on the intersection point of the information chart to be extracted, the transverse frame line and the longitudinal frame line;
and the text recognition unit is used for performing text recognition on each minimum recognition unit in the information chart to be extracted to obtain the chart information of the information chart to be extracted.
10. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the chart information extraction method according to any one of claims 1 to 8.
CN202010851106.8A 2020-08-21 2020-08-21 Chart information extraction method and device and storage medium Pending CN111985506A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010851106.8A CN111985506A (en) 2020-08-21 2020-08-21 Chart information extraction method and device and storage medium
PCT/CN2021/070082 WO2022036997A1 (en) 2020-08-21 2021-01-04 Chart information extraction method and apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010851106.8A CN111985506A (en) 2020-08-21 2020-08-21 Chart information extraction method and device and storage medium

Publications (1)

Publication Number Publication Date
CN111985506A true CN111985506A (en) 2020-11-24

Family

ID=73442531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010851106.8A Pending CN111985506A (en) 2020-08-21 2020-08-21 Chart information extraction method and device and storage medium

Country Status (2)

Country Link
CN (1) CN111985506A (en)
WO (1) WO2022036997A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022036997A1 (en) * 2020-08-21 2022-02-24 广东电网有限责任公司清远供电局 Chart information extraction method and apparatus, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116452702B (en) * 2023-06-15 2023-08-18 深圳大学 Information chart rapid design method, device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008809A (en) * 2019-01-04 2019-07-12 阿里巴巴集团控股有限公司 Acquisition methods, device and the server of list data
CN110163198A (en) * 2018-09-27 2019-08-23 腾讯科技(深圳)有限公司 A kind of Table recognition method for reconstructing, device and storage medium
CN110502985A (en) * 2019-07-11 2019-11-26 新华三大数据技术有限公司 Table recognition method, apparatus and Table recognition equipment
CN111368638A (en) * 2020-02-10 2020-07-03 深圳追一科技有限公司 Spreadsheet creation method and device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985506A (en) * 2020-08-21 2020-11-24 广东电网有限责任公司清远供电局 Chart information extraction method and device and storage medium


Also Published As

Publication number Publication date
WO2022036997A1 (en) 2022-02-24

CN114882192B (en) Building facade segmentation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination