WO2022036997A1 - Chart information extraction method, device and storage medium - Google Patents

Chart information extraction method, device and storage medium

Info

Publication number
WO2022036997A1
Authority
WO
WIPO (PCT)
Prior art keywords
intersection
extracted
chart
information
frame line
Application number
PCT/CN2021/070082
Other languages
English (en)
French (fr)
Inventor
陈松波
李聪
谭伟
王文博
胡金磊
徐刚
汪密
李文航
欧阳业
陈俊
Original Assignee
广东电网有限责任公司清远供电局
Application filed by 广东电网有限责任公司清远供电局
Publication of WO2022036997A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Definitions

  • the embodiments of the present application relate to the technical field of image recognition, for example, to a method, device, and storage medium for extracting chart information.
  • the present application provides a method, device and storage medium for extracting chart information, which can not only efficiently, flexibly and accurately extract key information in an image table, but also improve the work efficiency of grid security supervisors in reviewing relevant image tables.
  • the embodiment of the present application provides a method for extracting chart information, and the method includes:
  • Text recognition is performed on each minimum recognition unit in the information chart to be extracted to obtain chart information of the information chart to be extracted.
  • the embodiment of the present application also provides a chart information extraction device, the device includes:
  • a preprocessing unit configured to obtain the information chart to be extracted, and preprocess the information chart to be extracted
  • a frame line extraction unit configured to extract the preprocessed horizontal frame line and vertical frame line of the information chart to be extracted based on the principle of opening and closing operation
  • an intersection determining unit configured to perform an intersection operation on the extracted horizontal frame line and the vertical frame line, to obtain the intersection point of the information chart to be extracted
  • an area determination unit configured to determine a plurality of minimum identification units of the information chart to be extracted based on the intersection of the information chart to be extracted, the horizontal frame line and the vertical frame line;
  • a text recognition unit configured to perform text recognition on each minimum recognition unit in the to-be-extracted information chart to obtain chart information of the to-be-extracted information chart.
  • the embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, implements the chart information extraction method described in any embodiment of the present application.
  • FIG. 1 is a flowchart of a method for extracting chart information provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a minimum identification unit provided by an embodiment of the present application.
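  • FIG. 3 is a flowchart of another method for extracting chart information provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of another method for extracting chart information provided by an embodiment of the present application.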
  • FIG. 5 is a schematic diagram of another minimum identification unit provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of another method for extracting chart information provided by an embodiment of the present application.
  • FIG. 7 is a flowchart of another method for extracting chart information provided by an embodiment of the present application.
  • FIG. 8 is a structural diagram of an apparatus for extracting chart information provided by an embodiment of the present application.
  • FIG. 1 is a flowchart of a method for extracting chart information provided by an embodiment of the present application.
  • the method for extracting chart information includes the following steps:
  • Step S101 acquiring the information chart to be extracted, and preprocessing the information chart to be extracted.
  • For a paper image form, extracting the key information first requires converting the form into image data: the paper form is scanned to obtain its image data, and the image data is then preprocessed to obtain the preprocessed image information.
  • Step S102 extracting the preprocessed horizontal frame line and vertical frame line of the to-be-extracted information chart based on the opening and closing operation principle.
  • Based on the opening and closing operation principle, the horizontal and vertical frame lines of the preprocessed information chart to be extracted can be extracted, so that they can subsequently be used to determine the position of the text information in the information chart to be extracted.
  • The horizontal frame lines are the frame lines in the row direction, and the vertical frame lines are the frame lines in the column direction.
  • In step S102, extracting the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the opening and closing operation principle includes extracting them using a formula in which F is the information chart to be extracted, Y is the frame line extraction result, and G represents a structural element.
  • the extraction of the frame lines of the information chart to be extracted can be divided into vertical frame line extraction and horizontal frame line extraction.
  • The two extraction processes are realized by defining two structural elements of different shapes, G_h and G_w. G_h is a rectangular area with a horizontal length of 1 pixel and a vertical length of h pixels, expressed as G_h(1, h); G_w is a rectangular area with a horizontal length of w pixels and a vertical length of 1 pixel, expressed as G_w(w, 1).
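The extraction formula itself is not reproduced in this text. The following is a minimal sketch, assuming the formula is the standard morphological opening Y = (F ⊖ G) ⊕ G (erosion followed by dilation) applied once with a horizontal line element and once with a vertical one. The function names, the top-left anchoring of the element, and the 0/1 list-of-lists image representation are illustrative assumptions, not taken from the patent.

```python
def erode(img, kw, kh):
    """Binary erosion with a kw x kh rectangle anchored at its top-left pixel."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows - kh + 1):
        for c in range(cols - kw + 1):
            # Keep the pixel only if the whole rectangle fits in the foreground.
            if all(img[r + dr][c + dc] for dr in range(kh) for dc in range(kw)):
                out[r][c] = 1
    return out


def dilate(img, kw, kh):
    """Binary dilation with the same top-left-anchored kw x kh rectangle."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if img[r][c]:
                for dr in range(kh):
                    for dc in range(kw):
                        if r + dr < rows and c + dc < cols:
                            out[r + dr][c + dc] = 1
    return out


def opening(img, kw, kh):
    """Opening = erosion then dilation; removes anything the element cannot fit into."""
    return dilate(erode(img, kw, kh), kw, kh)


def extract_frame_lines(F, w, h):
    """Return (Yw, Yh): horizontal lines via a w x 1 element, vertical lines via a 1 x h element."""
    Yw = opening(F, w, 1)
    Yh = opening(F, 1, h)
    return Yw, Yh
```

In this sketch, w and h act as minimum line lengths: strokes of text shorter than the element are erased, while frame lines at least that long survive. Suitable values depend on the scan resolution and are not specified in the source.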
  • step S103 an intersection operation is performed on the extracted horizontal frame line and the vertical frame line to obtain the intersection point of the information chart to be extracted.
  • the extraction results of the horizontal and vertical frame lines of the table are denoted as Yw and Yh.
  • the intersection point of the to-be-extracted information chart can be obtained through an intersection operation.
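As a small illustration of the intersection operation on Yw and Yh, the sketch below takes the pixelwise AND of the two line masks and lists the coordinates that survive. It assumes 0/1 masks of equal size and frame lines one pixel thick; with thicker lines each crossing would yield a small blob of pixels that would still need to be reduced to a single point. The function names are hypothetical.

```python
def intersection_mask(Yw, Yh):
    """Pixelwise AND of the horizontal-line mask Yw and the vertical-line mask Yh."""
    return [[a & b for a, b in zip(row_w, row_h)] for row_w, row_h in zip(Yw, Yh)]


def intersection_points(Yw, Yh):
    """Return (row, col) coordinates where both masks are set, in row-major order."""
    mask = intersection_mask(Yw, Yh)
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
```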
  • Step S104 Determine a plurality of minimum identification units of the information chart to be extracted based on the intersection point, the horizontal frame line and the vertical frame line of the information chart to be extracted.
  • the minimum identification unit is the smallest graphic unit that constitutes an image table
  • FIG. 2 is a schematic diagram of a minimum identification unit provided by an embodiment of the present application. Referring to FIG. 2, the minimum graphic units A, B, C, etc. shown in FIG. 2 are all minimum identification units.
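The source does not spell out how a minimum identification unit is derived from the intersections and frame lines. One possible reading, sketched here under that assumption, treats each intersection as a candidate upper-left corner and looks for the smallest rectangle whose four corners are intersections and whose four edges are fully covered by frame lines. The function names and the (top, left, bottom, right) tuple layout are illustrative only.

```python
def h_edge(Yw, r, c1, c2):
    """True if row r of the horizontal-line mask is set from column c1 to c2 inclusive."""
    return all(Yw[r][c] for c in range(c1, c2 + 1))


def v_edge(Yh, c, r1, r2):
    """True if column c of the vertical-line mask is set from row r1 to r2 inclusive."""
    return all(Yh[r][c] for r in range(r1, r2 + 1))


def minimum_units(Yw, Yh, points):
    """Return one (top, left, bottom, right) rectangle per usable upper-left corner."""
    pts = set(points)
    units = []
    for r1, c1 in sorted(pts):
        found = None
        # Try right-hand corners from nearest to farthest along the top edge.
        for c2 in sorted({c for (r, c) in pts if r == r1 and c > c1}):
            if not h_edge(Yw, r1, c1, c2):
                break  # the top frame line is interrupted
            # Try bottom corners from nearest to farthest along the left edge.
            for r2 in sorted({r for (r, c) in pts if c == c1 and r > r1}):
                if not v_edge(Yh, c1, r1, r2):
                    break  # the left frame line is interrupted
                if ((r2, c2) in pts and h_edge(Yw, r2, c1, c2)
                        and v_edge(Yh, c2, r1, r2)):
                    found = (r1, c1, r2, c2)
                    break
            if found:
                break
        if found:
            units.append(found)
    return units
```

Because the nearest candidates are tried first, a rectangle that is split by a complete internal frame line is never returned in place of its parts.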
  • Step S105 performing text recognition on each minimum identification unit in the information chart to be extracted, to obtain chart information of the information chart to be extracted.
  • In this embodiment, the intersections of the information chart to be extracted are determined from the extracted horizontal and vertical frame lines, the minimum identification units of the information chart are then determined, and the text information of each minimum identification unit is extracted to obtain the chart information of the information chart to be extracted. This not only extracts the key information in the image table efficiently, flexibly and accurately, but also improves the work efficiency of power grid security inspectors in reviewing the relevant image tables.
  • FIG. 3 is a flowchart of another method for extracting chart information provided by an embodiment of the present application. As shown in FIG. 3 , the method for extracting chart information provided by this embodiment includes the following steps:
  • Step S301 acquiring the image of the information chart to be extracted.
  • Step S302 it is determined whether the image is tilted.
  • Step S303 if the image is tilted, correct the image by using an image tilt angle detection algorithm based on directional projection.
  • Step S304 performing a binarization process on the corrected image to obtain a preprocessed information chart to be extracted.
  • the scanned image form may be tilted.
  • After the image data of the information chart to be extracted is acquired, it is necessary to determine whether the image is tilted. If it is, the tilted image needs to be corrected; to keep the correction computationally cheap and robust, an image tilt angle detection algorithm based on directional projection is used to complete the correction. The corrected image is then binarized, finally yielding the preprocessed information chart to be extracted.
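The source names a directional-projection tilt detection algorithm but gives no details. The sketch below shows one common projection-profile variant as an assumption: for each candidate angle, foreground pixels are projected along that direction into a histogram, and the angle with the most sharply peaked histogram (largest sum of squared bin counts) is taken as the tilt. It also includes a fixed-threshold binarization. The function names, the threshold of 128, and the candidate range of -10 to 10 degrees are illustrative choices; rotating the image by the estimated angle is not shown.

```python
import math


def binarize(gray, threshold=128):
    """Dark pixels (value < threshold) become foreground 1; all others become 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def estimate_skew_degrees(binary, candidates=range(-10, 11)):
    """Return the candidate angle (degrees) whose directional projection is most peaked.

    A positive angle means lines run downward as the column index increases.
    """
    pixels = [(r, c) for r, row in enumerate(binary) for c, v in enumerate(row) if v]
    best_angle, best_score = 0, -1
    for angle in candidates:
        theta = math.radians(angle)
        hist = {}
        for r, c in pixels:
            # Offset of the pixel measured perpendicular to the candidate direction.
            t = round(r * math.cos(theta) - c * math.sin(theta))
            hist[t] = hist.get(t, 0) + 1
        # Pixels of a straight line fall into few bins when the angle matches.
        score = sum(n * n for n in hist.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```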
  • Step S305 extracting the preprocessed horizontal frame line and vertical frame line of the information chart to be extracted based on the principle of opening and closing operation.
  • step S306 an intersection operation is performed on the extracted horizontal frame line and the vertical frame line to obtain the intersection point of the information chart to be extracted.
  • Step S307 Determine a plurality of minimum identification units of the information chart to be extracted based on the intersection point, the horizontal frame line and the vertical frame line of the information chart to be extracted.
  • Step S308 Perform text recognition on each minimum identification unit in the information chart to be extracted to obtain chart information of the information chart to be extracted.
  • FIG. 4 is a flowchart of another method for extracting chart information provided by an embodiment of the present application. As shown in FIG. 4 , the method for extracting chart information provided by this embodiment includes the following steps:
  • Step S401 acquiring the information chart to be extracted, and preprocessing the information chart to be extracted.
  • Step S402 extracting the preprocessed horizontal frame line and vertical frame line of the to-be-extracted information chart based on the opening and closing operation principle.
  • step S403 an intersection operation is performed on the extracted horizontal frame line and the vertical frame line to obtain the intersection point of the information chart to be extracted.
  • Step S404 Detect whether there is a false intersection in the horizontal direction among the intersections of the information chart to be extracted. If there is, filter out the detected false intersections to obtain the target horizontal intersections.
  • Step S405 Detect whether there is a false intersection in the vertical direction of the target horizontal intersections. If there is, filter out the detected false intersections to obtain the target intersections.
  • Step S406 Determine a plurality of minimum identification units of the information chart to be extracted based on the target intersection, the horizontal frame line and the vertical frame line.
  • The intersection points within the two dotted circles have no practical significance for determining the minimum identification unit A; in other words, the minimum identification unit A cannot be determined using these two intersection points. Such intersection points are false intersections. To obtain the minimum identification units accurately, it is necessary to first detect whether there are false intersections among the obtained intersections of the information chart to be extracted.
  • FIG. 5 is a schematic diagram of another minimum identification unit provided by an embodiment of the present application. Referring to FIG. 5, the two intersection points in the dotted circles are, with respect to the intersection point at the upper left corner of the minimum identification unit N(2,1), two vertical false intersections, which need to be filtered out.
  • Step S407 performing text recognition on each minimum identification unit in the information chart to be extracted, to obtain chart information of the information chart to be extracted.
  • This embodiment elaborates on two operations: detecting whether there is a false intersection in the horizontal direction among the intersections of the information chart to be extracted and, if there is, filtering out the detected false intersections to obtain the target horizontal intersections; and detecting whether there is a false intersection in the vertical direction of the target horizontal intersections and, if there is, filtering out the detected false intersections to obtain the target intersections.
  • the method for extracting chart information includes the following steps:
  • Step S601 acquiring the information chart to be extracted, and preprocessing the information chart to be extracted.
  • Step S602 extracting the preprocessed horizontal frame line and vertical frame line of the information chart to be extracted based on the principle of opening and closing operation.
  • step S603 an intersection operation is performed on the extracted horizontal frame line and the vertical frame line to obtain the intersection point of the information chart to be extracted.
  • Step S604 Detect in turn, from left to right along the horizontal direction, whether there is a straight line below each intersection (mi, j) of the information chart to be extracted, where the intersection (mi, j) is the intersection of the i-th row and the j-th column, 1 ≤ i ≤ n1, 1 ≤ j ≤ n2, n1 is the total number of rows of the information chart, and n2 is the total number of columns of the information chart.
  • Referring to FIG. 2, the information chart shown in FIG. 2 has 5 rows and 7 columns. … 1,2), the intersection at the upper left corner of the minimum identification unit N(1,3), the intersection at the upper left corner of the minimum identification unit N(1,4), the intersection at the upper left corner of the minimum identification unit N(2,3), the intersection at the upper left corner of the minimum identification unit N(2,4), and the intersection at the lower right corner of the minimum identification unit N(2,4) are located in the 7 columns of the information chart, respectively.
  • Step S605 if there is no straight line in the vertical direction of the intersection (mi, j) of the information chart to be extracted, the intersection (mi, j) of the information chart to be extracted is a false intersection, and the false intersection is filtered out to obtain the target horizontal intersection.
  • The vertical direction is the direction from the first row of the chart to the last row. To identify a horizontal false intersection, it is only necessary to determine whether there is a straight line below the intersection of the information chart to be extracted.
  • For example, the first intersection (m2, 2) at the upper left corner of the minimum identification unit N(2, 2) in FIG. … has a straight line in the vertical direction, namely the left vertical frame line of the minimum identification unit N(2, 2); therefore, the intersection (m2, 2) is not a false intersection. For the intersection (m2, 3), there is no straight line in the vertical direction; therefore, the intersection (m2, 3) is a false intersection and needs to be filtered out. … , 5) is not a false intersection and needs to be preserved.
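The rule in steps S604 and S605 can be sketched as follows, assuming a 0/1 vertical-line mask with one-pixel-thick lines. The helper names and the `min_len` parameter (how many pixels directly below the intersection must belong to a vertical frame line) are hypothetical. The sketch applies the rule literally, so intersections on the bottom border are dropped as well; the source does not state how those are treated.

```python
def has_line_below(Yh, r, c, min_len=2):
    """True if the min_len pixels directly below (r, c) are all vertical-line pixels."""
    if r + min_len >= len(Yh):
        return False  # not enough rows left below the intersection
    return all(Yh[r + k][c] for k in range(1, min_len + 1))


def target_horizontal_intersections(Yh, points, min_len=2):
    """Filter out horizontal false intersections, keeping the input order."""
    return [(r, c) for (r, c) in points if has_line_below(Yh, r, c, min_len)]
```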
  • Step S606 For each target horizontal intersection, successively acquire all the intersections vertically below its horizontally adjacent target horizontal intersection to obtain a plurality of diagonal points of that target horizontal intersection, where each of these intersections is one of the plurality of diagonal points.
  • Step S607 Detect whether there is a corresponding horizontal frame line at the horizontal projection of the line connecting each target horizontal intersection and each of its diagonal points.
  • Step S608 If there is no corresponding horizontal frame line at the horizontal projection of the line connecting a target horizontal intersection and a diagonal point, that diagonal point is a false intersection; filter out the false intersections to obtain the target intersections.
  • After the target horizontal intersections are determined, it is only necessary, for each target horizontal intersection, to examine in turn the intersections (that is, the diagonal points) vertically below its horizontally adjacent target horizontal intersection. If there is a corresponding horizontal frame line at the horizontal projection of the line connecting the target horizontal intersection and the diagonal point, the diagonal point is a required target intersection and is retained. If there is no corresponding horizontal frame line at that projection, the diagonal point is a false intersection and needs to be filtered out, and the next intersection vertically below the horizontally adjacent target horizontal intersection is examined, until a diagonal point is detected for which the horizontal projection of the connecting line has a corresponding horizontal frame line.
  • The diagonal points that have no corresponding horizontal frame line at the horizontal projection of their connecting line with the target horizontal intersection are the vertical false intersections that need to be filtered out.
  • For example, the first intersection (m2, 1) at the upper left corner of the minimum identification unit N(2, 1) is a target horizontal intersection, and the first intersection below its horizontally adjacent target horizontal intersection (m2, 2) is taken as the diagonal point (m3, 2) (that is, the intersection in the upper dotted circle in FIG. 5). There is no corresponding horizontal frame line at the horizontal projection of the line connecting the target horizontal intersection (m2, 1) and the diagonal point (m3, 2) (that is, there is no horizontal frame line at the dotted line shown in FIG. 5), so the diagonal point (m3, 2) is a false intersection and needs to be filtered out. Similarly, the diagonal point (m4, 2) (that is, the intersection in the lower dotted circle in FIG. 5) is also a false intersection and should be filtered out. Because there is a horizontal frame line at the horizontal projection of the line connecting the intersection (m5, 2) (that is, the intersection in the dashed square in FIG. 5) and the target horizontal intersection (m2, 1), the intersection (m5, 2) is not a false intersection and needs to be preserved.
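The diagonal-point check of steps S606 to S608 can be sketched as below. It rests on one interpretation that the source does not state explicitly: the "horizontal projection of the connecting line" is taken to be the segment on the diagonal point's own row that spans the columns between the target horizontal intersection and the diagonal point. The function names and the 0/1 horizontal-line mask are illustrative assumptions.

```python
def has_horizontal_line(Yw, r, c1, c2):
    """True if row r of the horizontal-line mask is set across columns c1..c2 inclusive."""
    lo, hi = sorted((c1, c2))
    return all(Yw[r][c] for c in range(lo, hi + 1))


def filter_diagonal_points(Yw, target, diagonal_points):
    """Split diagonal points into (kept, false_points) for one target horizontal intersection."""
    _, target_col = target
    kept, false_points = [], []
    for r, c in diagonal_points:
        if has_horizontal_line(Yw, r, target_col, c):
            kept.append((r, c))          # a frame line closes the cell on this row
        else:
            false_points.append((r, c))  # vertical false intersection
    return kept, false_points
```

In the test data used to check this sketch, rows 4 and 6 carry a frame line only to the right of column 2, mimicking the two circled false intersections described for FIG. 5, while row 8 carries a full-width line.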
  • Step S609 Determine a plurality of minimum identification units of the information chart to be extracted based on the target intersection, the horizontal frame line and the vertical frame line.
  • Step S610 performing text recognition on each minimum identification unit in the information chart to be extracted, to obtain chart information of the information chart to be extracted.
  • This embodiment further elaborates on the above embodiments. As shown in FIG. 7, the method for extracting chart information provided by this embodiment includes the following steps:
  • Step S701 acquiring the information chart to be extracted, and preprocessing the information chart to be extracted.
  • Step S702 extracting the preprocessed horizontal frame line and vertical frame line of the to-be-extracted information chart based on the open-close operation principle.
  • Step S703 Perform an intersection operation on the extracted horizontal frame line and vertical frame line to obtain the intersection point of the information chart to be extracted.
  • Step S704 Determine a plurality of minimum identification units of the information chart to be extracted based on the intersection point, the horizontal frame line and the vertical frame line of the information chart to be extracted.
  • Step S705 using the determined multiple minimum identification units to locate the area of the information chart to be extracted.
  • Step S706 perform text recognition on each minimum recognition unit in the information chart to be extracted after regional positioning.
  • Once the minimum identification units in the information chart to be extracted have been determined, it is also necessary to use the determined minimum identification units to perform regional positioning on the information chart to be extracted. Finally, text recognition is performed on each minimum identification unit in the region-positioned information chart to obtain the chart information of the information chart to be extracted.
  • step S705 using the determined multiple minimum identification units to perform regional positioning on the information chart to be extracted includes:
  • cutting the information chart to be extracted according to the determined multiple minimum identification units;
  • encoding each minimum identification unit after cutting, to obtain the information chart to be extracted after regional positioning.
  • The picture code of a minimum identification unit may be denoted as N(p, q), where p and q are both positive integers greater than or equal to 1: p is the number of the row in which the minimum identification unit is located, and q is the number of its column within that row. The key information extracted from the information chart to be extracted can therefore be expressed as the information corresponding to a series of codes N(p, q).
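A minimal sketch of the cutting and N(p, q) encoding follows. It assumes each minimum identification unit is given as a (top, left, bottom, right) rectangle in pixel coordinates, that units in the same table row share the same top coordinate, and that frame lines are one pixel thick. These assumptions and the function names are illustrative and do not come from the source.

```python
def encode_units(units):
    """Map each unit to a code (p, q): p = row number, q = position within the row, from 1."""
    tops = sorted({u[0] for u in units})
    coded = {}
    for p, top in enumerate(tops, start=1):
        # Units of one table row, ordered left to right.
        row_units = sorted((u for u in units if u[0] == top), key=lambda u: u[1])
        for q, unit in enumerate(row_units, start=1):
            coded[(p, q)] = unit
    return coded


def cut(image, unit):
    """Return the pixels strictly inside the unit, excluding its one-pixel frame lines."""
    top, left, bottom, right = unit
    return [row[left + 1:right] for row in image[top + 1:bottom]]
```

Each cropped region could then be passed to a text recognizer and the result stored under its (p, q) code; the choice of recognizer is left open here, as the source does not name one.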
  • An embodiment of the present application also provides a chart information extraction apparatus, which is used to execute the chart information extraction method provided by the above embodiments of the present application.
  • the following describes the chart information extraction apparatus provided by the embodiments of the present application.
  • FIG. 8 is a structural diagram of a chart information extraction device provided by an embodiment of the present application.
  • the chart information extraction device mainly includes a preprocessing unit 81, a frame line extraction unit 82, an intersection determination unit 83, an area determination unit 84 and a text recognition unit 85, wherein: the preprocessing unit 81 is configured to obtain the information chart to be extracted and preprocess it; the frame line extraction unit 82 is configured to extract the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the principle of opening and closing operation; the intersection determination unit 83 is configured to perform an intersection operation on the extracted horizontal frame lines and vertical frame lines to obtain the intersections of the information chart to be extracted; the area determination unit 84 is configured to determine a plurality of minimum identification units of the information chart to be extracted based on the intersections, the horizontal frame lines and the vertical frame lines; and the text recognition unit 85 is configured to perform text recognition on each minimum identification unit in the information chart to be extracted to obtain the chart information of the information chart to be extracted.
  • the area determination unit 84 includes: a horizontal detection subunit, configured to detect whether there is a false intersection in the horizontal direction among the intersections of the information chart to be extracted and, if there is, filter out the detected false intersections to obtain the target horizontal intersections; a vertical detection subunit, configured to detect whether there is a false intersection in the vertical direction of the target horizontal intersections and, if there is, filter out the detected false intersections to obtain the target intersections; and a determination subunit, configured to determine a plurality of minimum identification units of the information chart to be extracted based on the target intersections, the horizontal frame lines and the vertical frame lines.
  • the horizontal detection subunit is configured to: detect in turn, from left to right along the horizontal direction, whether there is a straight line below each intersection (mi, j) of the information chart to be extracted, where the intersection (mi, j) is the intersection of the i-th row and the j-th column, 1 ≤ i ≤ n1, 1 ≤ j ≤ n2, n1 is the total number of rows of the information chart, and n2 is the total number of columns of the information chart; and if there is no straight line in the vertical direction of the intersection (mi, j), treat the intersection (mi, j) as a false intersection and filter it out to obtain the target horizontal intersections.
  • the vertical detection subunit is configured to: for each target horizontal intersection, successively acquire all the intersections vertically below its horizontally adjacent target horizontal intersection to obtain a plurality of diagonal points of that target horizontal intersection, where each of these intersections is one of the plurality of diagonal points; detect whether there is a corresponding horizontal frame line at the horizontal projection of the line connecting each target horizontal intersection and each of its diagonal points; and if there is no corresponding horizontal frame line at that projection, treat the diagonal point as a false intersection and filter it out to obtain the target intersections.
  • the apparatus for extracting chart information further includes: a region positioning unit, configured to perform region positioning on the information chart to be extracted by using the determined multiple minimum identification units.
  • the area positioning unit includes: a cutting subunit, configured to cut the information chart to be extracted according to the determined multiple minimum identification units; a coding subunit, configured to encode each minimum identification unit after the cut, to obtain The information chart to be extracted after regional positioning.
  • the preprocessing unit 81 includes: an acquiring subunit, configured to acquire the image of the information chart to be extracted; a judging subunit, configured to determine whether the image is tilted; a subunit configured to correct the image, if the image is tilted, by using the image tilt angle detection algorithm based on directional projection; and a binarization subunit, configured to binarize the corrected image to obtain the preprocessed information chart to be extracted.
  • the frame line extraction unit 82 is configured to extract the horizontal and vertical frame lines of the preprocessed information chart to be extracted using a formula in which F is the information chart to be extracted, Y is the frame line extraction result, and G represents a structural element.
  • the chart information extraction method provided by the embodiment of the present application has the same technical features as the chart information extraction device provided by the above-mentioned embodiment, so it can also solve the same technical problem.
  • Embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, and when executed by a computer processor, the computer program is used to execute the chart information extraction method provided by any embodiment of the present application.
  • the chart information extraction method includes: acquiring the information chart to be extracted, and preprocessing the information chart to be extracted; extracting the preprocessed horizontal and vertical frame lines of the information chart to be extracted based on the principle of opening and closing operations; Perform an intersection operation on the extracted horizontal frame line and the vertical frame line to obtain the intersection of the to-be-extracted information chart; based on the intersection of the to-be-extracted information chart, the horizontal frame line and the vertical frame line Determine a plurality of minimum identification units of the information chart to be extracted; perform text recognition on each of the minimum identification units in the information chart to be extracted to obtain chart information of the information chart to be extracted.
  • When the computer program stored on the computer-readable storage medium provided by an embodiment of the present application is executed by a computer processor, it is not limited to the above-mentioned method operations, and can also perform related operations in the chart information extraction method provided by any embodiment of the present application.
  • the present application can be implemented by means of software and general hardware, and certainly can also be implemented by hardware.
  • the technical solution of the present application can be embodied in the form of a software product, and the computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk or an optical disk, and includes multiple instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present application.
  • in the above embodiment of the chart information extraction device, the multiple units and modules included are divided only according to functional logic, but are not limited to this division, as long as the corresponding functions can be realized; in addition, the names of the functional units are only for the convenience of distinguishing them from each other, and are not used to limit the protection scope of this application.
  • in the description of the embodiments of the present application, unless otherwise expressly specified and limited, the terms "installed", "coupled" and "connected" should be understood in a broad sense: for example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct, or indirect through an intermediate medium; and it may be an internal communication between two components.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

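A chart information extraction method, device and storage medium. The method includes: acquiring an information chart to be extracted, and preprocessing the information chart to be extracted (S101); extracting the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the principle of opening and closing operations (S102); performing an intersection operation on the extracted horizontal frame lines and vertical frame lines to obtain the intersections of the information chart to be extracted (S103); determining a plurality of minimum identification units of the information chart to be extracted based on the intersections, the horizontal frame lines and the vertical frame lines (S104); and performing text recognition on each minimum identification unit in the information chart to be extracted to obtain the chart information of the information chart to be extracted (S105).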

Description

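Chart information extraction method, device and storage medium
This application claims priority to Chinese patent application No. 202010851106.8, filed with the China Patent Office on August 21, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of image recognition, for example to a chart information extraction method, device and storage medium.
Background
Routine electric power production, operation and maintenance generate a large number of image tables. On the one hand, these charts can be used to record multiple items of data; on the other hand, some charts can also be used to guide standardized operations.
However, these image tables are numerous, fixed in format and strongly interrelated, and the key information that needs to be checked is scattered across them, which greatly reduces the efficiency with which power grid safety supervisors review them.
Summary
The present application provides a chart information extraction method, device and storage medium, which can not only extract the key information in image tables efficiently, flexibly and accurately, but also improve the efficiency with which power grid safety supervisors review the relevant image tables.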
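An embodiment of the present application provides a chart information extraction method, the method including:
acquiring an information chart to be extracted, and preprocessing the information chart to be extracted;
extracting the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the principle of opening and closing operations;
performing an intersection operation on the extracted horizontal frame lines and vertical frame lines to obtain the intersections of the information chart to be extracted;
determining a plurality of minimum identification units of the information chart to be extracted based on the intersections of the information chart to be extracted, the horizontal frame lines and the vertical frame lines;
performing text recognition on each minimum identification unit in the information chart to be extracted to obtain the chart information of the information chart to be extracted.
An embodiment of the present application further provides a chart information extraction device, the device including:
a preprocessing unit, configured to acquire an information chart to be extracted and preprocess the information chart to be extracted;
a frame line extraction unit, configured to extract the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the principle of opening and closing operations;
an intersection determination unit, configured to perform an intersection operation on the extracted horizontal frame lines and vertical frame lines to obtain the intersections of the information chart to be extracted;
a region determination unit, configured to determine a plurality of minimum identification units of the information chart to be extracted based on the intersections of the information chart to be extracted, the horizontal frame lines and the vertical frame lines;
a text recognition unit, configured to perform text recognition on each minimum identification unit in the information chart to be extracted to obtain the chart information of the information chart to be extracted.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the chart information extraction method described in any embodiment of the present application is implemented.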
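Brief Description of the Drawings
FIG. 1 is a flowchart of a chart information extraction method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a minimum identification unit provided by an embodiment of the present application;
FIG. 3 is a flowchart of another chart information extraction method provided by an embodiment of the present application;
FIG. 4 is a flowchart of yet another chart information extraction method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of another minimum identification unit provided by an embodiment of the present application;
FIG. 6 is a flowchart of yet another chart information extraction method provided by an embodiment of the present application;
FIG. 7 is a flowchart of yet another chart information extraction method provided by an embodiment of the present application;
FIG. 8 is a structural diagram of a chart information extraction device provided by an embodiment of the present application.
Detailed Description
The present application is described below with reference to the drawings and embodiments. The embodiments described here are only used to explain the present application and do not limit it. In addition, for ease of description, the drawings show only the parts related to the present application rather than the entire structure.
The terms "first", "second" and the like in the description, claims and drawings of the present application are used to distinguish different objects, not to define a specific order. The following embodiments of the present application can be carried out individually or in combination with one another, which is not limited by the embodiments of the present application.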
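FIG. 1 is a flowchart of a chart information extraction method provided by an embodiment of the present application.
As shown in FIG. 1, the chart information extraction method includes the following steps:
Step S101, acquiring an information chart to be extracted, and preprocessing the information chart to be extracted.
To extract the key information from a paper image table, the paper table must first be converted into image data. The paper image table is therefore scanned to obtain its image data, and the image data is then preprocessed to obtain the preprocessed image information.
Step S102, extracting the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the principle of opening and closing operations.
Because image tables in the power industry are all composed of horizontal frame lines and vertical frame lines, the horizontal and vertical frame lines of the preprocessed information chart to be extracted can be extracted and later used to determine the positions of the text information in the information chart to be extracted. A horizontal frame line is a frame line in the row direction, and a vertical frame line is a frame line in the column direction.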
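Optionally, step S102, extracting the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the principle of opening and closing operations, includes: extracting the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted using the formula Y = (F∘G)·G, where F is the information chart to be extracted, Y is the frame line extraction result, and G represents a structural element.
That is, the formula Y = (F∘G)·G uses the opening and closing operations of mathematical morphology to extract the horizontal and vertical frame lines of the table.
The extraction of the frame lines of the information chart to be extracted can be divided into vertical frame line extraction and horizontal frame line extraction. The two are implemented by defining two structural elements of different shapes, G_h and G_w. G_h is a rectangular region 1 pixel wide and h pixels high, denoted G_h(1, h), while G_w is a rectangular region h pixels wide and 1 pixel high, denoted G_w(h, 1), where h is the maximum number of pixels required to display one character of the table horizontally or vertically.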
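The frame line extraction just described can be sketched in pure Python as follows. This is only an illustration under stated assumptions: it reads ∘ in the formula as morphological opening and · as closing, represents the binarized chart as a list of 0/1 rows, and uses the fact that for a line-shaped structural element an opening keeps only runs of 1s at least h long while a closing fills interior gaps shorter than h. The function names are hypothetical and do not come from the application.

```python
def _runs(line):
    """Yield (value, start, end) for each maximal run in a 0/1 sequence; end is exclusive."""
    start = 0
    for i in range(1, len(line) + 1):
        if i == len(line) or line[i] != line[start]:
            yield line[start], start, i
            start = i


def open_1d(line, h):
    """Opening with a length-h line: keep only runs of 1s that are at least h long."""
    out = [0] * len(line)
    for value, start, end in _runs(line):
        if value and end - start >= h:
            out[start:end] = [1] * (end - start)
    return out


def close_1d(line, h):
    """Closing with a length-h line: fill interior runs of 0s shorter than h."""
    out = list(line)
    for value, start, end in _runs(line):
        interior = start > 0 and end < len(line)
        if not value and interior and end - start < h:
            out[start:end] = [1] * (end - start)
    return out


def extract_lines(binary, h):
    """Return (Y_w, Y_h): horizontal and vertical frame line masks of a 0/1 image."""
    # G_w (h wide, 1 high) acts along each row and keeps horizontal lines.
    y_w = [close_1d(open_1d(row, h), h) for row in binary]
    # G_h (1 wide, h high) acts along each column and keeps vertical lines.
    cols = [close_1d(open_1d(list(col), h), h) for col in zip(*binary)]
    y_h = [list(row) for row in zip(*cols)]
    return y_w, y_h
```

With h chosen larger than any character stroke, text pixels form runs shorter than h and drop out of both masks, while frame lines survive.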
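Step S103, performing an intersection operation on the extracted horizontal frame lines and vertical frame lines to obtain the intersections of the information chart to be extracted.
The extraction results for the horizontal and vertical frame lines of the table are denoted Y_w and Y_h. After the horizontal and vertical frame lines of the preprocessed information chart to be extracted have been extracted, the intersections of the information chart to be extracted can be obtained by an intersection operation. The intersection operation can be performed on the extracted horizontal and vertical frame lines by the formula M_1 = Y_w ∩ Y_h, where M_1 is the corresponding table intersection extraction result.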
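As a small illustration of M_1 = Y_w ∩ Y_h, the sketch below takes the two masks as lists of 0/1 rows of equal size and returns the pixels set in both. The function name and the (row, column) coordinate convention are assumptions made for the example.

```python
def intersections(y_w, y_h):
    """M_1 = Y_w ∩ Y_h as a list of (row, col) pixels set in both masks."""
    return [
        (r, c)
        for r, (row_w, row_h) in enumerate(zip(y_w, y_h))
        for c, (a, b) in enumerate(zip(row_w, row_h))
        if a and b
    ]
```

If the frame lines are more than one pixel thick, each crossing yields a small cluster of pixels rather than a single point, so a real implementation would probably merge neighbouring pixels into one intersection; that step is not shown here.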
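Step S104, determining a plurality of minimum identification units of the information chart to be extracted based on the intersections of the information chart to be extracted, the horizontal frame lines and the vertical frame lines.
A minimum identification unit is the smallest graphic unit that makes up an image table. FIG. 2 is a schematic diagram of a minimum identification unit provided by an embodiment of the present application. Referring to FIG. 2, the smallest graphic units A, B, C and so on shown in FIG. 2 are all minimum identification units.
Step S105, performing text recognition on each minimum identification unit in the information chart to be extracted to obtain the chart information of the information chart to be extracted.
After the minimum identification units in the information chart to be extracted have been determined, text recognition is performed on each of them, finally yielding the chart information of the information chart to be extracted. Online or offline text recognition services provided by third parties can be used flexibly to perform text recognition on each minimum identification unit, so as to meet the needs of different scenarios.
In this embodiment of the present application, the intersections of the information chart to be extracted are determined from the extracted horizontal and vertical frame lines, the minimum identification units of the information chart to be extracted are then determined, and the chart information of the information chart to be extracted is obtained by extracting the text information of the minimum identification units. This not only extracts the key information in image tables efficiently, flexibly and accurately, but also improves the efficiency with which power grid safety supervisors review the relevant image tables.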
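Based on the above technical solution, this embodiment describes the acquisition and preprocessing of the information chart to be extracted in the above embodiment. FIG. 3 is a flowchart of another chart information extraction method provided by an embodiment of the present application. As shown in FIG. 3, the chart information extraction method provided by this embodiment includes the following steps:
Step S301, acquiring an image of the information chart to be extracted.
Step S302, determining whether the image is tilted.
Step S303, if the image is tilted, correcting the image using an image tilt angle detection algorithm based on directional projection.
Step S304, binarizing the corrected image to obtain the preprocessed information chart to be extracted.
A paper table may be placed askew during scanning, so that the scanned image table is tilted. Therefore, after the image data of the information chart to be extracted is acquired, it is necessary to determine whether the image is tilted and, if so, to correct it. To keep the correction algorithm computationally light and robust, an image tilt angle detection algorithm based on directional projection is selected to perform the correction. After the correction is completed, the corrected image is binarized, finally yielding the preprocessed information chart to be extracted.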
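The application names a tilt angle detection algorithm based on directional projection but does not spell it out. The sketch below shows one common projection-profile approach as a stand-in, not the applicants' algorithm: for each candidate angle, foreground pixels are sheared back along that angle and binned by row, and the angle with the sharpest histogram wins. The function names, the angle range, the step size, the fixed threshold and the score (sum of squared bin counts) are all assumptions.

```python
import math


def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 values) to 0/1, dark pixels as foreground."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def estimate_skew(binary, max_angle=10.0, step=0.5):
    """Return the candidate angle in degrees whose row projection is sharpest.

    A positive angle means that lines drift downward as the column index grows.
    """
    points = [(r, c) for r, row in enumerate(binary) for c, v in enumerate(row) if v]
    best_angle, best_score = 0.0, -1.0
    steps = int(round(2 * max_angle / step))
    for k in range(steps + 1):
        angle = -max_angle + k * step
        t = math.tan(math.radians(angle))
        hist = {}
        for r, c in points:
            # Undo the candidate tilt for this pixel, then bin it by row.
            key = round(r - c * t)
            hist[key] = hist.get(key, 0) + 1
        # Aligned frame lines pile into few bins, which maximizes this score.
        score = sum(n * n for n in hist.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

In this sketch the image is thresholded first so that the angle search has foreground pixels to work with; rotating the scan by the negative of the returned angle (not shown) would then level the table before the final binarization in step S304.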
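Step S305, extracting the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the principle of opening and closing operations.
Step S306, performing an intersection operation on the extracted horizontal frame lines and vertical frame lines to obtain the intersections of the information chart to be extracted.
Step S307, determining a plurality of minimum identification units of the information chart to be extracted based on the intersections of the information chart to be extracted, the horizontal frame lines and the vertical frame lines.
Step S308, performing text recognition on each minimum identification unit in the information chart to be extracted to obtain the chart information of the information chart to be extracted.
Using the chart information extraction method provided by this embodiment not only extracts the key information in image tables efficiently, flexibly and accurately, but also improves the efficiency with which power grid safety supervisors review the relevant image tables.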
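Based on the above technical solution, this embodiment describes how the minimum identification units of the information chart to be extracted are determined based on the intersections of the information chart to be extracted, the horizontal frame lines and the vertical frame lines. FIG. 4 is a flowchart of yet another chart information extraction method provided by an embodiment of the present application. As shown in FIG. 4, the chart information extraction method provided by this embodiment includes the following steps:
Step S401, acquiring an information chart to be extracted, and preprocessing the information chart to be extracted.
Step S402, extracting the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the principle of opening and closing operations.
Step S403, performing an intersection operation on the extracted horizontal frame lines and vertical frame lines to obtain the intersections of the information chart to be extracted.
Step S404, detecting whether false intersections exist in the horizontal direction among the intersections of the information chart to be extracted, and if so, filtering out the detected false intersections to obtain target horizontal intersections.
Step S405, detecting whether false intersections exist in the vertical direction of the target horizontal intersections, and if so, filtering out the detected false intersections to obtain target intersections.
Step S406, determining a plurality of minimum identification units of the information chart to be extracted based on the target intersections, the horizontal frame lines and the vertical frame lines.
Referring to FIG. 2, for minimum identification unit A, the intersections inside the two dashed circles have no practical significance for determining minimum identification unit A. In other words, minimum identification unit A cannot be determined from the two intersections inside the dashed circles, so these intersections are false intersections. To obtain the minimum identification units accurately, it is first necessary to detect whether false intersections exist among the obtained intersections of the information chart to be extracted.
To determine false intersections, it is first necessary to detect whether false intersections exist in the horizontal direction among the intersections of the information chart to be extracted. For example, the two intersections inside the dashed circles shown in FIG. 2 are two false intersections with respect to the intersection at the upper left corner of minimum identification unit A. After the horizontal false intersections are filtered out, the remaining intersections are the target horizontal intersections. The target horizontal intersections are then examined further to determine whether false intersections exist in the vertical direction of each of them, and after the vertical false intersections are filtered out, the remaining intersections are the target intersections. FIG. 5 is a schematic diagram of another minimum identification unit provided by an embodiment of the present application. Referring to FIG. 5, the two intersections inside the dashed circles shown in FIG. 5 are two vertical false intersections with respect to the intersection at the upper left corner of minimum identification unit N(2,1), and need to be filtered out.
The plurality of minimum identification units of the information chart to be extracted can then be determined accurately from the remaining target intersections and the identified horizontal and vertical frame lines.
Step S407, performing text recognition on each minimum identification unit in the information chart to be extracted to obtain the chart information of the information chart to be extracted.
Using the chart information extraction method provided by this embodiment not only extracts the key information in image tables efficiently, flexibly and accurately, but also improves the efficiency with which power grid safety supervisors review the relevant image tables.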
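Based on the above technical solution, this embodiment describes separately the two filtering steps: detecting whether false intersections exist in the horizontal direction among the intersections of the information chart to be extracted and, if so, filtering out the detected false intersections to obtain target horizontal intersections; and detecting whether false intersections exist in the vertical direction of the target horizontal intersections and, if so, filtering out the detected false intersections to obtain target intersections. As shown in FIG. 6, the chart information extraction method provided by this embodiment includes the following steps:
Step S601, acquiring an information chart to be extracted, and preprocessing the information chart to be extracted.
Step S602, extracting the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the principle of opening and closing operations.
Step S603, performing an intersection operation on the extracted horizontal frame lines and vertical frame lines to obtain the intersections of the information chart to be extracted.
Step S604, detecting in turn, from left to right in the horizontal direction, whether a straight line exists vertically below each intersection m(i,j) of the information chart to be extracted, where intersection m(i,j) is the intersection in row i and column j, 1≤i≤n1, 1≤j≤n2, n1 is the total number of rows of the information chart, and n2 is the total number of columns of the information chart.
In this embodiment, taking FIG. 2 as an example, the information chart shown in FIG. 2 has 5 rows and 7 columns in total. The intersections at the upper left corners of minimum identification units N(1,1), N(1,2), N(1,3), N(1,4), N(2,3) and N(2,4), together with the intersection at the lower right corner of minimum identification unit N(2,4), lie in the 7 columns of the information chart respectively.
Step S605, if no straight line exists in the vertical direction of intersection m(i,j) of the information chart to be extracted, intersection m(i,j) is a false intersection, and the false intersection is filtered out to obtain the target horizontal intersections.
The vertical direction is the direction from the first row of the chart to the last row. To determine horizontal false intersections, it is only necessary to determine whether a straight line exists vertically below each intersection of the information chart to be extracted. Taking minimum identification unit N(2,2) in FIG. 2 as an example, the first intersection at its upper left corner, m(2,2), has a straight line in the vertical direction, namely the left vertical frame line of N(2,2), so m(2,2) is not a false intersection. Intersection m(2,3) has no straight line in the vertical direction, so it is a false intersection and needs to be filtered out; likewise, m(2,4) is a false intersection and is filtered out, whereas m(2,5) is not a false intersection and needs to be retained.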
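The horizontal filtering rule of steps S604 and S605 can be sketched as below. The sketch assumes one-pixel-thick frame lines, intersections given as (row, col) pixel coordinates, and a vertical frame line mask y_h stored as a list of 0/1 rows; the function name is hypothetical.

```python
def filter_horizontal_false(points, y_h):
    """Keep only intersections that have a vertical frame line directly below them."""
    rows = len(y_h)
    kept = []
    for r, c in sorted(points):  # row by row, left to right within each row
        if r + 1 < rows and y_h[r + 1][c]:
            kept.append((r, c))  # a line continues downward: target horizontal intersection
        # otherwise (r, c) is a horizontal false intersection and is dropped
    return kept
```

One consequence of this rule is that intersections on the bottom border are never target horizontal intersections, since nothing lies below them. The sketch therefore assumes that the later vertical check draws its diagonal points from the unfiltered intersection set, which is one reading of step S606.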
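Step S606, acquiring in turn, for each target horizontal intersection, all the intersections vertically below its horizontally adjacent target horizontal intersection, to obtain a plurality of diagonal points of each target horizontal intersection, where one intersection of all the intersections is one diagonal point of the plurality of diagonal points.
Step S607, detecting whether a corresponding horizontal frame line exists at the horizontal projection of the line connecting each target horizontal intersection and each of its diagonal points.
Step S608, if no corresponding horizontal frame line exists at the horizontal projection of the line connecting a target horizontal intersection and a diagonal point, the diagonal point is a false intersection, and the false intersection is filtered out to obtain the target intersections.
To determine vertical false intersections, once the target horizontal intersections have been determined, it is only necessary to examine in turn the intersections vertically below the horizontally adjacent target horizontal intersection of a given target horizontal intersection (that is, the diagonal points), checking whether a corresponding horizontal frame line exists at the horizontal projection of the line connecting the target horizontal intersection and the diagonal point. If such a horizontal frame line exists, the diagonal point is a required target intersection and is retained. If no such horizontal frame line exists, the diagonal point is a false intersection and needs to be filtered out; the next intersection vertically below the horizontally adjacent target horizontal intersection is then examined, until a diagonal point is found for which a horizontal frame line exists at the horizontal projection of its connecting line with the target horizontal intersection. All diagonal points for which no corresponding horizontal frame line exists at that projection are vertical false intersections to be filtered out.
Referring to FIG. 5, take minimum identification unit N(2,1) in FIG. 5 as an example. The first intersection at the upper left corner of N(2,1), m(2,1), is a target horizontal intersection. The first intersection vertically below its horizontally adjacent target horizontal intersection m(2,2) is taken as diagonal point m(3,2) (the intersection inside the upper dashed circle in FIG. 5). No corresponding horizontal frame line exists at the horizontal projection of the line connecting target horizontal intersection m(2,1) and diagonal point m(3,2) (that is, no horizontal frame line exists at the dashed line shown in FIG. 5), so diagonal point m(3,2) is a false intersection and needs to be filtered out. Similarly, diagonal point m(4,2) (the intersection inside the lower dashed circle in FIG. 5) is also a false intersection and is filtered out. Because a horizontal frame line exists at the horizontal projection of the line connecting intersection m(5,2) (the intersection inside the dashed square in FIG. 5) and target horizontal intersection m(2,1), intersection m(5,2) is not a false intersection and needs to be retained.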
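The vertical check of steps S606 to S608 can be sketched for a single cell as follows. The sketch assumes one-pixel-thick lines, (row, col) pixel coordinates, a horizontal frame line mask y_w stored as a list of 0/1 rows, and that `points` holds all intersections found in step S603; the function name and return convention are hypothetical.

```python
def bottom_right_corner(top_left, top_right, points, y_w):
    """Walk down from top_right and return the first diagonal point that closes the cell.

    Returns None if no diagonal point below top_right is backed by a horizontal line.
    """
    r0, c0 = top_left
    _, c1 = top_right
    # Diagonal points: every intersection below top_right in the same column.
    below = sorted(p for p in points if p[1] == c1 and p[0] > r0)
    for r, c in below:
        # Horizontal projection of the segment top_left -> (r, c): columns c0..c1 on row r.
        if all(y_w[r][x] for x in range(c0, c1 + 1)):
            return (r, c)  # a horizontal frame line exists here: target intersection
        # otherwise (r, c) is a vertical false intersection for this cell; keep walking
    return None
```

For a layout like the one described for FIG. 5, where the left-hand cell spans several rows, the walk skips the two intermediate intersections and stops at the first row on which a horizontal frame line actually reaches the cell's left edge.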
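Step S609, determining a plurality of minimum identification units of the information chart to be extracted based on the target intersections, the horizontal frame lines and the vertical frame lines.
Step S610, performing text recognition on each minimum identification unit in the information chart to be extracted to obtain the chart information of the information chart to be extracted.
Using the chart information extraction method provided by this embodiment not only extracts the key information in image tables efficiently, flexibly and accurately, but also improves the efficiency with which power grid safety supervisors review the relevant image tables.
Based on the above technical solution, this embodiment further describes the above embodiment. As shown in FIG. 7, the chart information extraction method provided by this embodiment further includes the following steps:
Step S701, acquiring an information chart to be extracted, and preprocessing the information chart to be extracted.
Step S702, extracting the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the principle of opening and closing operations.
Step S703, performing an intersection operation on the extracted horizontal frame lines and vertical frame lines to obtain the intersections of the information chart to be extracted.
Step S704, determining a plurality of minimum identification units of the information chart to be extracted based on the intersections of the information chart to be extracted, the horizontal frame lines and the vertical frame lines.
Step S705, performing region positioning on the information chart to be extracted using the determined plurality of minimum identification units.
Step S706, performing text recognition on each minimum identification unit in the information chart to be extracted after region positioning.
To accurately find the position of each minimum identification unit within the information chart to be extracted, region positioning must also be performed in the information chart to be extracted for the determined minimum identification units. Text recognition is finally performed on each minimum identification unit in the information chart to be extracted after region positioning, to obtain the chart information of the information chart to be extracted.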
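Optionally, step S705, performing region positioning on the information chart to be extracted using the determined plurality of minimum identification units, includes:
cutting the information chart to be extracted according to the determined plurality of minimum identification units;
encoding each minimum identification unit after cutting to obtain the information chart to be extracted after region positioning.
To accurately find the position of each minimum identification unit within the table, the information chart to be extracted is first cut according to the minimum identification units obtained, that is, each minimum identification unit is cut out as one picture, and the cut-out pictures are then numbered. For example, referring to FIG. 2 and FIG. 6, the picture code of a minimum identification unit can be written as N(p,q), where p and q are both positive integers greater than or equal to 1, p denotes the row in which the minimum identification unit is located, and q denotes the column of the minimum identification unit within that row. The key information extracted from the information chart to be extracted can therefore be expressed as the information corresponding to a sequence of codes N(p,q).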
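The cutting and encoding step can be sketched as below. The sketch assumes each minimum identification unit is given as a (top, left, bottom, right) tuple of frame line pixel coordinates, that units sharing the same top coordinate belong to the same table row, and that q counts units from left to right within that row; these conventions and the function name are illustrative only.

```python
def crop_and_label(image, cells):
    """Cut each cell out of the image and key it by N(p, q) as the tuple (p, q)."""
    labeled = {}
    p, q, last_top = 0, 0, None
    for top, left, bottom, right in sorted(cells):
        if top != last_top:  # a new table row starts
            p, q, last_top = p + 1, 0, top
        q += 1
        # Crop the interior of the cell, leaving the frame lines themselves out.
        patch = [row[left + 1:right] for row in image[top + 1:bottom]]
        labeled[(p, q)] = patch
    return labeled
```

Each patch can then be handed to whichever text recognition service is in use, and the resulting strings looked up by their (p, q) key.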
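Using the chart information extraction method provided by this embodiment not only extracts the key information in image tables efficiently, flexibly and accurately, but also improves the efficiency with which power grid safety supervisors review the relevant image tables.
An embodiment of the present application further provides a chart information extraction device, which is used to execute the chart information extraction method provided by the above embodiments of the present application. The chart information extraction device provided by the embodiment of the present application is introduced below.
FIG. 8 is a structural diagram of a chart information extraction device provided by an embodiment of the present application. As shown in FIG. 8, the chart information extraction device mainly includes a preprocessing unit 81, a frame line extraction unit 82, an intersection determination unit 83, a region determination unit 84 and a text recognition unit 85, where: the preprocessing unit 81 is configured to acquire an information chart to be extracted and preprocess the information chart to be extracted; the frame line extraction unit 82 is configured to extract the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the principle of opening and closing operations; the intersection determination unit 83 is configured to perform an intersection operation on the extracted horizontal frame lines and vertical frame lines to obtain the intersections of the information chart to be extracted; the region determination unit 84 is configured to determine a plurality of minimum identification units of the information chart to be extracted based on the intersections of the information chart to be extracted, the horizontal frame lines and the vertical frame lines; and the text recognition unit 85 is configured to perform text recognition on each minimum identification unit in the information chart to be extracted to obtain the chart information of the information chart to be extracted.
Using the chart information extraction device provided by the embodiment of the present application not only extracts the key information in image tables efficiently, flexibly and accurately, but also improves the efficiency with which power grid safety supervisors review the relevant image tables.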
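Optionally, the region determination unit 84 includes: a horizontal detection subunit, configured to detect whether false intersections exist in the horizontal direction among the intersections of the information chart to be extracted and, if so, filter out the detected false intersections to obtain target horizontal intersections; a vertical detection subunit, configured to detect whether false intersections exist in the vertical direction of the target horizontal intersections and, if so, filter out the detected false intersections to obtain target intersections; and a determination subunit, configured to determine a plurality of minimum identification units of the information chart to be extracted based on the target intersections, the horizontal frame lines and the vertical frame lines.
Optionally, the horizontal detection subunit is configured to: detect in turn, from left to right in the horizontal direction, whether a straight line exists vertically below each intersection m(i,j) of the information chart to be extracted, where intersection m(i,j) is the intersection in row i and column j, 1≤i≤n1, 1≤j≤n2, n1 is the total number of rows of the information chart, and n2 is the total number of columns of the information chart; and, if no straight line exists in the vertical direction of intersection m(i,j) of the information chart to be extracted, treat intersection m(i,j) as a false intersection and filter out the false intersection to obtain the target horizontal intersections.
Optionally, the vertical detection subunit is configured to: acquire in turn, for each target horizontal intersection, all the intersections vertically below its horizontally adjacent target horizontal intersection, to obtain a plurality of diagonal points of each target horizontal intersection, where one intersection of all the intersections is one diagonal point of the plurality of diagonal points; detect whether a corresponding horizontal frame line exists at the horizontal projection of the line connecting each target horizontal intersection and each of its diagonal points; and, if no corresponding horizontal frame line exists at the horizontal projection of the line connecting a target horizontal intersection and a diagonal point, treat the diagonal point as a false intersection and filter out the false intersection to obtain the target intersections.
Optionally, the chart information extraction device further includes: a region positioning unit, configured to perform region positioning on the information chart to be extracted using the determined plurality of minimum identification units.
Optionally, the region positioning unit includes: a cutting subunit, configured to cut the information chart to be extracted according to the determined plurality of minimum identification units; and an encoding subunit, configured to encode each minimum identification unit after cutting to obtain the information chart to be extracted after region positioning.
Optionally, the preprocessing unit 81 includes: an acquisition subunit, configured to acquire an image of the information chart to be extracted; a judgment subunit, configured to determine whether the image is tilted; a correction subunit, configured to correct the image using an image tilt angle detection algorithm based on directional projection if the image is tilted; and a binarization subunit, configured to binarize the corrected image to obtain the preprocessed information chart to be extracted.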
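Optionally, the frame line extraction unit 82 is configured to: extract the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted using the formula Y = (F∘G)·G, where F is the information chart to be extracted, Y is the frame line extraction result, and G represents a structural element.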
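The chart information extraction device provided by the embodiment of the present application works on the same principle as the foregoing method embodiments. For brevity, where the device embodiment does not mention something, reference may be made to the corresponding content in the foregoing method embodiments.
The chart information extraction method provided by the embodiment of the present application has the same technical features as the chart information extraction device provided by the above embodiment, so it can also solve the same technical problem.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a computer processor, the computer program is used to execute the chart information extraction method provided by any embodiment of the present application.
The chart information extraction method includes: acquiring an information chart to be extracted, and preprocessing the information chart to be extracted; extracting the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the principle of opening and closing operations; performing an intersection operation on the extracted horizontal frame lines and vertical frame lines to obtain the intersections of the information chart to be extracted; determining a plurality of minimum identification units of the information chart to be extracted based on the intersections of the information chart to be extracted, the horizontal frame lines and the vertical frame lines; and performing text recognition on each of the minimum identification units in the information chart to be extracted to obtain the chart information of the information chart to be extracted.
In the computer-readable storage medium provided by the embodiment of the present application, the stored computer program, when executed by a computer processor, is not limited to the method operations described above, and can also perform related operations in the chart information extraction method provided by any embodiment of the present application.
From the above description of the embodiments, those skilled in the art can understand that the present application can be implemented by means of software and general-purpose hardware, and of course can also be implemented by hardware. Based on this understanding, the technical solution of the present application can be embodied in the form of a software product, and the computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk or an optical disk, and includes multiple instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present application.
In the above embodiment of the chart information extraction device, the multiple units and modules included are divided only according to functional logic, but are not limited to this division, as long as the corresponding functions can be realized; in addition, the names of the functional units are only for the convenience of distinguishing them from each other, and are not used to limit the protection scope of the present application.
In the description of the embodiments of the present application, unless otherwise expressly specified and limited, the terms "installed", "coupled" and "connected" should be understood in a broad sense: for example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct, or indirect through an intermediate medium; and it may be an internal communication between two components. Those of ordinary skill in the art can understand the meanings of the above terms in the present application according to the circumstances.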

Claims (10)

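  1. A chart information extraction method, including:
    acquiring an information chart to be extracted, and preprocessing the information chart to be extracted;
    extracting the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the principle of opening and closing operations;
    performing an intersection operation on the extracted horizontal frame lines and vertical frame lines to obtain the intersections of the information chart to be extracted;
    determining a plurality of minimum identification units of the information chart to be extracted based on the intersections of the information chart to be extracted, the horizontal frame lines and the vertical frame lines;
    performing text recognition on each minimum identification unit in the information chart to be extracted to obtain the chart information of the information chart to be extracted.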
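  2. The chart information extraction method according to claim 1, wherein the determining a plurality of minimum identification units of the information chart to be extracted based on the intersections of the information chart to be extracted, the horizontal frame lines and the vertical frame lines includes:
    detecting whether false intersections exist in the horizontal direction among the intersections of the information chart to be extracted, and, in response to a detection result that false intersections exist in the horizontal direction, filtering out the detected false intersections to obtain target horizontal intersections;
    detecting whether false intersections exist in the vertical direction of the target horizontal intersections, and, in response to a detection result that false intersections exist in the vertical direction of the target horizontal intersections, filtering out the detected false intersections to obtain target intersections;
    determining the plurality of minimum identification units of the information chart to be extracted based on the target intersections, the horizontal frame lines and the vertical frame lines.
  3. The chart information extraction method according to claim 2, wherein the detecting whether false intersections exist in the horizontal direction among the intersections of the information chart to be extracted, and, in response to a detection result that false intersections exist in the horizontal direction, filtering out the detected false intersections to obtain target horizontal intersections includes:
    detecting in turn, from left to right in the horizontal direction, whether a straight line exists vertically below each intersection m(i,j) of the information chart to be extracted, wherein intersection m(i,j) is the intersection in row i and column j, 1≤i≤n1, 1≤j≤n2, n1 is the total number of rows of the information chart, and n2 is the total number of columns of the information chart;
    in response to a detection result that no straight line exists in the vertical direction of intersection m(i,j) of the information chart to be extracted, treating intersection m(i,j) as the false intersection, and filtering out the false intersection to obtain the target horizontal intersections.
  4. The chart information extraction method according to claim 2, wherein the detecting whether false intersections exist in the vertical direction of the target horizontal intersections, and, in response to a detection result that false intersections exist in the vertical direction of the target horizontal intersections, filtering out the detected false intersections to obtain target intersections includes:
    acquiring in turn, for each target horizontal intersection, all the intersections vertically below its horizontally adjacent target horizontal intersection, to obtain a plurality of diagonal points of each target horizontal intersection, wherein one intersection of all the intersections is one diagonal point of the plurality of diagonal points;
    detecting whether a corresponding horizontal frame line exists at the horizontal projection of the line connecting each target horizontal intersection and each of its diagonal points;
    in response to a detection result that no corresponding horizontal frame line exists at the horizontal projection of the line connecting a target horizontal intersection and a diagonal point, treating the diagonal point as the false intersection, and filtering out the false intersection to obtain the target intersections.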
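  5. The chart information extraction method according to claim 1, further including, before performing text recognition on each minimum identification unit in the information chart to be extracted:
    performing region positioning on the information chart to be extracted using the determined plurality of minimum identification units;
    performing text recognition on each minimum identification unit in the information chart to be extracted after region positioning.
  6. The chart information extraction method according to claim 5, wherein the performing region positioning on the information chart to be extracted using the determined plurality of minimum identification units includes:
    cutting the information chart to be extracted according to the determined plurality of minimum identification units;
    encoding each minimum identification unit after cutting to obtain the information chart to be extracted after region positioning.
  7. The chart information extraction method according to claim 1, wherein the acquiring an information chart to be extracted, and preprocessing the information chart to be extracted includes:
    acquiring an image of the information chart to be extracted;
    determining whether the image is tilted;
    in response to a determination result that the image is tilted, correcting the image using an image tilt angle detection algorithm based on directional projection;
    binarizing the corrected image to obtain the preprocessed information chart to be extracted.
  8. The chart information extraction method according to claim 1, wherein the extracting the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the principle of opening and closing operations includes:
    extracting the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted using the formula Y = (F∘G)·G, wherein F is the information chart to be extracted, Y is the frame line extraction result, and G represents a structural element.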
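  9. A chart information extraction device, including:
    a preprocessing unit, configured to acquire an information chart to be extracted and preprocess the information chart to be extracted;
    a frame line extraction unit, configured to extract the horizontal frame lines and vertical frame lines of the preprocessed information chart to be extracted based on the principle of opening and closing operations;
    an intersection determination unit, configured to perform an intersection operation on the extracted horizontal frame lines and vertical frame lines to obtain the intersections of the information chart to be extracted;
    a region determination unit, configured to determine a plurality of minimum identification units of the information chart to be extracted based on the intersections of the information chart to be extracted, the horizontal frame lines and the vertical frame lines;
    a text recognition unit, configured to perform text recognition on each minimum identification unit in the information chart to be extracted to obtain the chart information of the information chart to be extracted.
  10. A computer-readable storage medium storing a computer program, wherein when the program is executed by a processor, the chart information extraction method according to any one of claims 1-8 is implemented.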
PCT/CN2021/070082 2020-08-21 2021-01-04 Chart information extraction method, device and storage medium WO2022036997A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010851106.8 2020-08-21
CN202010851106.8A CN111985506A (zh) 2020-08-21 2020-08-21 Chart information extraction method, device and storage medium

Publications (1)

Publication Number Publication Date
WO2022036997A1 true WO2022036997A1 (zh) 2022-02-24

Family

ID=73442531

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/070082 WO2022036997A1 (zh) 2020-08-21 2021-01-04 Chart information extraction method, device and storage medium

Country Status (2)

Country Link
CN (1) CN111985506A (zh)
WO (1) WO2022036997A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116452702A (zh) * 2023-06-15 2023-07-18 深圳大学 Rapid infographic design method, device, computer equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985506A (zh) * 2020-08-21 2020-11-24 广东电网有限责任公司清远供电局 Chart information extraction method, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
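CN110008809A (zh) * 2019-01-04 2019-07-12 阿里巴巴集团控股有限公司 Method, device and server for acquiring table data
CN110163198A (zh) * 2018-09-27 2019-08-23 腾讯科技(深圳)有限公司 Table recognition and reconstruction method, device and storage medium
CN110502985A (zh) * 2019-07-11 2019-11-26 新华三大数据技术有限公司 Table recognition method, device and table recognition equipment
CN111368638A (zh) * 2020-02-10 2020-07-03 深圳追一科技有限公司 Spreadsheet creation method, device, computer equipment and storage medium
CN111985506A (zh) * 2020-08-21 2020-11-24 广东电网有限责任公司清远供电局 Chart information extraction method, device and storage medium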

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
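CN116452702A (zh) * 2023-06-15 2023-07-18 深圳大学 Rapid infographic design method, device, computer equipment and storage medium
CN116452702B (zh) * 2023-06-15 2023-08-18 深圳大学 Rapid infographic design method, device, computer equipment and storage medium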

Also Published As

Publication number Publication date
CN111985506A (zh) 2020-11-24

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21857103

Country of ref document: EP

Kind code of ref document: A1