CN112906532A - Image processing method and apparatus, electronic device, and storage medium


Info

Publication number: CN112906532A
Authority: CN (China)
Prior art keywords: region, line, coordinate system, frames, pixel
Legal status: Granted
Application number: CN202110169261.6A
Other languages: Chinese (zh)
Other versions: CN112906532B (en)
Inventors: 徐青松, 李青
Current Assignee: Hangzhou Ruisheng Software Co Ltd
Original Assignee: Hangzhou Ruisheng Software Co Ltd
Application filed by Hangzhou Ruisheng Software Co Ltd
Priority to CN202110169261.6A (granted as CN112906532B)
Publication of CN112906532A
Priority to PCT/CN2022/073988 (published as WO2022166707A1)
Application granted
Publication of CN112906532B
Legal status: Active (granted)

Classifications

    • G06V 30/412: Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables (character recognition; document-oriented image-based pattern recognition)
    • G06N 20/00: Machine learning
    • G06N 3/02: Neural networks (computing arrangements based on biological models)
    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds


Abstract

An image processing method, an image processing apparatus, an electronic device, and a non-transitory computer-readable storage medium. The image processing method includes: acquiring an input image, where the input image includes a table region and the table region includes a plurality of object regions; performing region identification processing on the input image to obtain a plurality of object region frames in one-to-one correspondence with the plurality of object regions and a table region frame corresponding to the table region; performing table line detection processing on the input image to determine whether the table region includes a wired table; and, in response to the table region not including a wired table: aligning the plurality of object region frames to obtain a plurality of region labeling frames in one-to-one correspondence with the plurality of object region frames; determining at least one dividing line based on the plurality of object region frames, and dividing the plurality of region labeling frames by the at least one dividing line to form a plurality of cells; and generating a cell table corresponding to the table region based on the plurality of cells.

Description

Image processing method and apparatus, electronic device, and storage medium
Technical Field
Embodiments of the present disclosure relate to an image processing method, an image processing apparatus, an electronic device, and a non-transitory computer-readable storage medium.
Background
Currently, users often photograph objects (for example, business cards, test papers, laboratory test reports, or documents) and want the photographs to be processed so as to obtain information about the objects in them. Depending on actual requirements, in some cases a user wants the object-related information obtained from an image to be presented in table form, so that the obtained information is more intuitive and better organized. Therefore, when an image is processed to obtain information related to an object in the image, a table also needs to be drawn based on the size, position, and the like of the area occupied by that information in the image, so that the information desired by the user can be presented in table form.
Disclosure of Invention
At least one embodiment of the present disclosure provides an image processing method, including: acquiring an input image, where the input image includes a table region, the table region includes a plurality of object regions, and each of the plurality of object regions includes at least one object; performing region identification processing on the input image to obtain a plurality of object region frames in one-to-one correspondence with the plurality of object regions and a table region frame corresponding to the table region; performing table line detection processing on the input image to determine whether the table region includes a wired table; and in response to the table region not including a wired table: aligning the plurality of object region frames to obtain a plurality of region labeling frames in one-to-one correspondence with the plurality of object region frames; determining at least one dividing line based on the plurality of object region frames, and dividing the plurality of region labeling frames by the at least one dividing line to form a plurality of cells; and generating a cell table corresponding to the table region based on the plurality of cells.
At least one embodiment of the present disclosure also provides an image processing apparatus, including an image acquisition module, a region identification processing module, a table line detection processing module, and a cell table generation module. The image acquisition module is configured to acquire an input image, the input image including a table region, the table region including a plurality of object regions, and each of the plurality of object regions including at least one object. The region identification processing module is configured to perform region identification processing on the input image to obtain a plurality of object region frames in one-to-one correspondence with the plurality of object regions and a table region frame corresponding to the table region. The table line detection processing module is configured to perform table line detection processing on the input image to determine whether the table region includes a wired table. The cell table generation module is configured to, in response to the table region not including a wired table: align the plurality of object region frames to obtain a plurality of region labeling frames in one-to-one correspondence with the plurality of object region frames; determine at least one dividing line based on the plurality of object region frames, and divide the plurality of region labeling frames by the at least one dividing line to form a plurality of cells; and generate a cell table corresponding to the table region based on the plurality of cells.
At least one embodiment of the present disclosure also provides an electronic device comprising a processor and a memory, the memory for storing computer-readable instructions; the processor is configured to implement the steps of the method according to any of the above embodiments when executing the computer readable instructions.
At least one embodiment of the present disclosure also provides a non-transitory computer-readable storage medium for non-transitory storage of computer-readable instructions that, when executed by a processor, implement the steps of the method of any of the above embodiments.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure and are not limiting to the present disclosure.
Fig. 1 is a schematic flowchart of an image processing method according to at least one embodiment of the present disclosure;
fig. 2A is a schematic diagram of an input image according to at least one embodiment of the present disclosure;
FIGS. 2B-2H are schematic diagrams illustrating the process of performing image processing on the input image shown in FIG. 2A;
fig. 3A is a schematic diagram of another input image provided by at least one embodiment of the present disclosure;
FIGS. 3B-3I are schematic diagrams of the process of performing image processing on the input image shown in FIG. 3A;
fig. 4 is a flowchart illustrating step S30 in an image processing method according to at least one embodiment of the disclosure;
fig. 5 is a flowchart illustrating a part of the operation of step S302 in an image processing method according to at least one embodiment of the disclosure;
fig. 6 is a flowchart illustrating step S3020 in an image processing method according to at least one embodiment of the disclosure;
fig. 7 is a schematic flowchart of another image processing method according to at least one embodiment of the present disclosure;
fig. 8 is a flowchart illustrating step S401 in an image processing method according to at least one embodiment of the disclosure;
fig. 9 is a schematic flowchart of step S4012 in an image processing method according to at least one embodiment of the present disclosure;
fig. 10 is a partial flowchart illustrating step S402 in an image processing method according to at least one embodiment of the disclosure;
fig. 11 is a schematic flowchart of another image processing method according to at least one embodiment of the disclosure;
fig. 12 is a schematic block diagram of an image processing apparatus according to at least one embodiment of the present disclosure;
fig. 13 is a schematic diagram of an electronic device according to at least one embodiment of the present disclosure; and
fig. 14 is a schematic diagram of a non-transitory computer-readable storage medium according to at least one embodiment of the disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings of the embodiments of the present disclosure. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without any inventive step, are within the scope of protection of the disclosure.
Unless otherwise defined, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. Also, the use of the terms "a," "an," or "the" and similar referents do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
At least one embodiment of the present disclosure provides an image processing method, an image processing apparatus, an electronic device, and a non-transitory computer-readable storage medium. The image processing method includes: acquiring an input image, where the input image includes a table region, the table region includes a plurality of object regions, and each of the plurality of object regions includes at least one object; performing region identification processing on the input image to obtain a plurality of object region frames in one-to-one correspondence with the plurality of object regions and a table region frame corresponding to the table region; performing table line detection processing on the input image to determine whether the table region includes a wired table; and in response to the table region not including a wired table: aligning the plurality of object region frames to obtain a plurality of region labeling frames in one-to-one correspondence with the plurality of object region frames; determining at least one dividing line based on the plurality of object region frames, and dividing the plurality of region labeling frames by the at least one dividing line to form a plurality of cells; and generating a cell table corresponding to the table region based on the plurality of cells.
In the image processing method provided by the embodiments of the present disclosure, when table line detection processing on the input image determines that the table region of the input image does not include a wired table, the plurality of object region frames obtained by the region identification processing are aligned to obtain the corresponding plurality of region labeling frames, at least one dividing line is determined based on the plurality of object region frames, and the plurality of region labeling frames are divided by the dividing line to form a plurality of cells, so that a cell table corresponding to the table region of the input image is generated based on the plurality of cells. After the objects in the object regions are filled into the cells of the cell table, an object table containing the related information of the objects in the input image can be generated, so that the acquired information about the objects in the input image is presented to the user more intuitively and in a better-organized way through the cell table.
The image processing method provided by the embodiments of the present disclosure can be applied to the image processing apparatus provided by the embodiments of the present disclosure, and the image processing apparatus can be configured on an electronic device. The electronic device may be a personal computer, a mobile terminal, or the like, and the mobile terminal may be a hardware device such as a mobile phone or a tablet computer.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. It should be noted that the present disclosure is not limited to these specific embodiments.
Fig. 1 is a schematic flowchart of an image processing method according to at least one embodiment of the present disclosure.
As shown in fig. 1, an image processing method according to at least one embodiment of the present disclosure includes the following steps S10 to S40.
Step S10: an input image is acquired. For example, the input image includes a table region including a plurality of object regions, each of the plurality of object regions including at least one object.
Step S20: the input image is subjected to region identification processing to obtain a plurality of object region frames corresponding to the plurality of object regions one to one and a table region frame corresponding to the table region.
Step S30: form line detection processing is performed on the input image to determine whether the form area includes a wired form.
Step S40: in response to the table area not including the wired table, the following steps S401 to S403 are performed.
Step S401: and aligning the object area frames to obtain a plurality of area labeling frames which are in one-to-one correspondence with the object area frames.
Step S402: and determining at least one dividing line based on the object region frames, and dividing the region labeling frames through the at least one dividing line to form a plurality of cells.
Step S403: based on the plurality of cells, a cell table corresponding to the table area is generated.
For step S10, the input image may be an image obtained by a user photographing an object, for example, a business card, a test paper, a laboratory test report, a document, or an invoice; accordingly, the objects in the input image may be the characters (Chinese and/or foreign-language, printed and/or handwritten), data, graphics, symbols, and the like contained in that object.
For example, the shape of the input image may be a regular shape such as a rectangle or a square, or may be an irregular shape, and the shape, size, and the like of the input image may be set by the user according to the actual situation. For example, the input image may be an image captured by a digital camera, a mobile phone, or the like, and may be, for example, an original image directly captured by the digital camera, the mobile phone, or the like, or an image obtained by preprocessing the original image. For example, the input image may be a grayscale image or a color image.
For example, fig. 2A and 3A are examples of two kinds of input images. The input image shown in fig. 2A includes a table area 201, the table area 201 includes a plurality of object areas 202, and each object area 202 includes at least one piece of text or data. The input image shown in fig. 3A includes a table area 301, the table area 301 includes a plurality of object areas 302, and each object area 302 includes at least one piece of text or data. For example, in the object areas 202 and 302, the texts and data are arranged in a row along the horizontal direction.
It should be noted that, in the examples shown in fig. 2A and fig. 3A, the texts and data contained in the object region are arranged in a row along the horizontal direction, while in some other examples of the present disclosure, the input image may also include a case where the texts or data contained in the object region are arranged in a row along the vertical direction or arranged in a plurality of rows along the horizontal direction and the vertical direction, respectively, which is not limited in this regard. In the examples shown in fig. 2A and fig. 3A, the object included in the object region is text or data, but in some other examples of the present disclosure, the object included in the object region may further include a graphic, a symbol, and the like, which is not limited by the embodiment of the present disclosure.
For example, in the examples shown in fig. 2A and 3A, the shapes of the table regions 201 and 301 and the object regions 202 and 302 to be recognized in the input image are rectangles, while in some other examples of the present disclosure, the table regions and the object regions to be recognized in the input image may also be other regular shapes such as diamonds and squares, or may also be irregular shapes, etc., as long as it is satisfied that the table regions can cover all the objects to be recognized and each object region can cover the corresponding object to be recognized.
For example, for the input image shown in fig. 2A, the text "2019 annual report" located in the upper right corner may be divided into the table area 201, that is, the table area 201 includes the area occupied by the text "2019 annual report"; alternatively, the text "2019 annual report" may also be divided outside the table area 201, that is, the table area 201 may not include the area occupied by the text "2019 annual report", and the embodiments of the present disclosure are not limited in this respect. For the input image shown in fig. 3A, the text "blood routine test" at the top may be divided outside the table area 301, i.e., the table area 301 does not include the area occupied by the text "blood routine test"; alternatively, the text "blood routine test" may also be divided into the table area 301, that is, the table area 301 may also include the area occupied by the text "blood routine test", which is not limited by the embodiments of the disclosure.
For example, in some embodiments of the present disclosure, after the input image is acquired, the input image may first be preprocessed before the operations in the subsequent steps are performed, so as to improve the accuracy and reliability of those operations. For example, the input image may be subjected to correction processing, which may include performing a global correction and a local correction on the input image. The global correction may correct, for example, the global offset of text lines; since some details may remain unadjusted after the global correction, the local correction may then apply supplementary corrections to the details ignored during the global correction, thereby reducing or preventing detail loss caused by the global correction and improving the accuracy and reliability of the correction result.
For step S20, for example, the table region and the plurality of object regions in the input image may be identified by a region identification model, which may be implemented using machine learning techniques and run, for example, on a general-purpose computing device or a special-purpose computing device. The region identification model may be, for example, a neural network model trained in advance, and may be implemented using an applicable neural network such as a deep convolutional neural network (deep CNN).
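As an illustrative aside (not part of the patent text), such a region identification step might be wrapped as sketched below in Python; the region_model object, its predict() method, and the RegionBox structure are hypothetical placeholders for whatever pre-trained detector is actually used.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class RegionBox:
        x0: float          # left edge in pixels
        y0: float          # top edge in pixels
        x1: float          # right edge in pixels
        y1: float          # bottom edge in pixels
        label: str         # "table" or "object"

    def identify_regions(image, region_model) -> Tuple[RegionBox, List[RegionBox]]:
        """Run the (hypothetical) region identification model on the input image and
        split its output into the table region frame and the object region frames."""
        boxes: List[RegionBox] = region_model.predict(image)
        table_box = next(b for b in boxes if b.label == "table")
        object_boxes = [b for b in boxes if b.label == "object"]
        return table_box, object_boxes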
For example, the specific shapes of the table area frame and the object area frame may be determined according to the specific shapes, sizes, and the like of the table area and the object area, respectively, the table area frame surrounds the table area and can contain all the objects located in the table area, and the object area frame surrounds the corresponding object area and can contain all the objects located in the object area. For example, the distance between the border of the object region box and the object located at the edge of the object region may approach 0 to make the shape of the object region box closer to the actual shape of the object region. For example, the distance between the border of the table area box and the object located at the edge of the table area may be adaptively increased compared to the object area box, so that the table area box may include all the objects therein.
For example, taking the region identification processing performed on the input image shown in fig. 2A and 3A as an example, as shown in fig. 2B, after the region identification processing is performed on the input image shown in fig. 2A, a table region frame 210 corresponding to the table region 201 and a plurality of object region frames 220 corresponding to the plurality of object regions 202 one-to-one can be obtained; as shown in fig. 3B, after the region identification processing is performed on the input image shown in fig. 3A, a table region frame 310 corresponding to the table region 301 and a plurality of object region frames 320 corresponding to the plurality of object regions 302 one to one can be obtained.
For example, in order to facilitate subsequent operations, in the embodiments provided in the present disclosure, the shape of the object area frame may be set to a regular shape such as a rectangle, a square, or the like, so as to facilitate subsequent alignment processing operations on a plurality of object area frames in response to the table area not including the wired table.
Note that, in the embodiment of the present disclosure, the "shape of the table area" and the "shape of the object area" indicate the general shape of the table area or the object area, and similarly, the "shape of the table area frame" and the "shape of the object area frame" indicate the general shape of the table area frame or the object area frame.
With step S30, for example, a table line detection process may be performed on the input image based on an edge detection algorithm to identify a table line segment in the input image, and it may be determined whether a wired table is included in a table region of the input image according to the identification result of the table line segment.
Fig. 4 is a flowchart illustrating step S30 in an image processing method according to at least one embodiment of the present disclosure.
For example, as shown in fig. 4, step S30 may include the following steps S301 to S302.
Step S301: in the case where the table line detection processing is performed on the input image and no table line segment is detected in the input image, it is determined that the table area does not include a wired table.
Step S302: in the case where the input image is subjected to the table line detection processing and one or more table line segments are obtained, it is determined whether the table area includes a wired table based on the one or more table line segments.
With step S301, in the case where it is determined from the table line detection processing result that there is no table line segment in the input image, it may be determined that the wired table is not included in the table area of the input image, whereby the operation of step S40 is performed in response to the table area of the input image not including the wired table.
With step S302, in the case where at least one table line segment is found in the input image according to the table line detection processing result, it is necessary to further determine whether or not a wired table is included in the table area of the input image based on the obtained table line segment.
The following describes a specific operation procedure in step S302, taking the table line detection process performed on the input image shown in fig. 2A as an example.
Fig. 5 is a flowchart illustrating a part of operations of step S302 in an image processing method according to at least one embodiment of the present disclosure.
For example, as shown in fig. 5, performing the table line detection process on the input image to obtain one or more table line segments in step S302 may include the following steps S3011 to S3016.
Step S3011: and carrying out line segment detection on the input image to obtain a plurality of detection line segments.
Step S3012: and merging the detection line segments to redraw the detection line segments to obtain a plurality of first intermediate table line segments.
Step S3013: and respectively performing expansion processing on the plurality of first intermediate table line segments to obtain a plurality of second intermediate table line segments.
Step S3014: and deleting a second intermediate table line segment in any object area frame in the plurality of object area frames in the plurality of second intermediate table line segments, and taking the rest second intermediate table line segments in the plurality of second intermediate table line segments as a plurality of third intermediate table line segments.
Step S3015: and merging the third intermediate table line segments to obtain a fourth intermediate table line segment.
Step S3016: and respectively performing expansion processing on the plurality of fourth intermediate table line segments to obtain one or more fifth intermediate table line segments, and taking the one or more fifth intermediate table line segments as one or more table line segments.
For step S3011, for example, taking the input image shown in fig. 2A as an example, as shown in fig. 2C, after line segment detection is performed on the input image shown in fig. 2A, a plurality of detected line segments L0 may be obtained, so that operations in the subsequent steps, such as merging processing and expansion processing, may be performed based on the detected line segments L0 to obtain the corresponding table line segments; whether the table area 201 of the input image shown in fig. 2A includes a wired table can then be determined based on the obtained table line segments.
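For illustration only, the line segment detection of step S3011 could be realized with a generic edge-plus-Hough detector as sketched below in Python/OpenCV; the patent does not prescribe a particular detector, and the Canny and Hough parameters shown are assumptions.

    import cv2
    import numpy as np

    def detect_line_segments(image_bgr):
        """Step S3011 (sketch): detect candidate line segments L0 in the input image."""
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)                     # edge map
        segments = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=50,
                                   minLineLength=30, maxLineGap=5)
        # Each detected segment is returned as a tuple (x1, y1, x2, y2).
        return [] if segments is None else [tuple(s[0]) for s in segments]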
With respect to step S3012, the merging process includes: and for a first segment to be merged and a second segment to be merged, merging the first segment to be merged and the second segment to be merged in response to the fact that the difference between the slope of the first segment to be merged and the slope of the second segment to be merged is smaller than a slope threshold value and the distance between the end point of the first segment to be merged, which is close to the second segment to be merged, and the end point of the second segment to be merged, which is close to the first segment to be merged, is smaller than or equal to a distance threshold value. For example, the first line segment to be merged and the second line segment to be merged are any two of the plurality of detection line segments.
For example, for the plurality of detected line segments L0 detected based on the input image shown in fig. 2A, any two detected line segments L0 among them are taken as the first line segment to be merged and the second line segment to be merged, and it is determined whether these two detected line segments L0 satisfy the condition for the merging process, that is, whether the difference between their slopes is smaller than the slope threshold and the distance between their mutually adjacent end points is smaller than or equal to the distance threshold; when the above merging condition is satisfied, these two detected line segments L0 are merged to obtain a first intermediate table line segment L1.
For example, the slope threshold may range from 0° to 10°, and the distance threshold may be a value in units of pixels ranging from 0 to 10 pixels, thereby improving the accuracy and reliability of the table line segments obtained from the detected line segments.
For example, taking the detection line segments L11 and L12 located in the region RN1 in fig. 2C as an example, as shown in fig. 2D, the detection line segments L11 and L12 are taken as the first line segment to be merged and the second line segment to be merged, respectively. The difference between the slope of the first line segment to be merged L11 and the slope of the second line segment to be merged L12 approaches zero, that is, it may be determined that this difference is smaller than the slope threshold, and the distance between the end point D11 of the first line segment to be merged L11 close to the second line segment to be merged L12 and the end point D12 of the second line segment to be merged L12 close to the first line segment to be merged L11 is determined to be smaller than or equal to the distance threshold, so that the first line segment to be merged L11 and the second line segment to be merged L12 are merged to obtain a first intermediate table line segment L1. Thus, after the merging process is performed on any two detected line segments L0 in fig. 2C that satisfy the merging condition, a plurality of first intermediate table line segments L1 can be obtained accordingly.
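A minimal sketch of the merging condition of step S3012 under the thresholds suggested above; for brevity the slope comparison ignores the wrap-around of angles near 0°/180°, and the pair of endpoints on which the two segments face each other is approximated by the closest endpoint pair.

    import math

    SLOPE_THRESHOLD_DEG = 10     # slope threshold, within the 0-10 degree range above
    DISTANCE_THRESHOLD_PX = 10   # distance threshold, within the 0-10 pixel range above

    def angle_deg(seg):
        x1, y1, x2, y2 = seg
        return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0

    def can_merge(seg_a, seg_b):
        """Merging condition of step S3012: similar slope and nearby facing endpoints."""
        if abs(angle_deg(seg_a) - angle_deg(seg_b)) >= SLOPE_THRESHOLD_DEG:
            return False
        ends_a = [(seg_a[0], seg_a[1]), (seg_a[2], seg_a[3])]
        ends_b = [(seg_b[0], seg_b[1]), (seg_b[2], seg_b[3])]
        gap = min(math.dist(pa, pb) for pa in ends_a for pb in ends_b)
        return gap <= DISTANCE_THRESHOLD_PX

    def merge(seg_a, seg_b):
        """Redraw the two segments as one segment spanning their two farthest endpoints."""
        points = [(seg_a[0], seg_a[1]), (seg_a[2], seg_a[3]),
                  (seg_b[0], seg_b[1]), (seg_b[2], seg_b[3])]
        (x1, y1), (x2, y2) = max(((p, q) for p in points for q in points),
                                 key=lambda pq: math.dist(pq[0], pq[1]))
        return (x1, y1, x2, y2)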
For step S3013, the obtained plurality of first intermediate table line segments L1 are respectively subjected to expansion processing to obtain a plurality of second intermediate table line segments L2. For example, the width of an expanded second intermediate table line segment L2 may be 1-4 times the width of the corresponding first intermediate table line segment L1, so as to facilitate the merging operation in the subsequent steps.
For step S3014, the second intermediate table line segments L2 located entirely within any of the object region boxes 220 are deleted, and the remaining second intermediate table line segments L2 are taken as a plurality of third intermediate table line segments L3. For example, in step S3014, if a second intermediate table line segment L2 is located entirely within an object region box 220, that is, it does not extend beyond that object region box 220, the second intermediate table line segment L2 is deleted; in this way, detected line segments such as those shown in fig. 2C that originate from, for example, strokes of text or data can be removed, thereby further improving the accuracy and reliability of the subsequently obtained table line segments. For example, after step S3014, the plurality of third intermediate table line segments L3 shown in fig. 2E may be obtained.
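A possible sketch of the filtering in step S3014; it reuses the hypothetical RegionBox fields introduced earlier, and the full-containment test is an assumed reading of "located in an object region box".

    def filter_segments_inside_boxes(segments, object_boxes):
        """Step S3014 (sketch): drop a segment that lies entirely inside one object region
        box, since such segments typically come from strokes of text or data rather than
        from table lines."""
        def inside(seg, box):
            x1, y1, x2, y2 = seg
            return (box.x0 <= x1 <= box.x1 and box.y0 <= y1 <= box.y1 and
                    box.x0 <= x2 <= box.x1 and box.y0 <= y2 <= box.y1)
        return [s for s in segments if not any(inside(s, b) for b in object_boxes)]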
For step S3015 and step S3016, after obtaining the plurality of third intermediate table line segments L3 shown in fig. 2E, the merging processing procedure and the expanding processing procedure in step S3012 and step S3013 are repeated based on the plurality of third intermediate table line segments L3, so as to obtain a plurality of fifth intermediate table line segments shown in fig. 2F, and the fifth intermediate table line segments obtained in fig. 2F are used as the table line segments TL, so as to improve the accuracy and reliability of the obtained table line segments TL, and thus improve the accuracy and reliability of the determination process of whether the table area of the input image includes the wired table based on the table line segments TL.
For example, the merging processing in step S3015 includes: and for a first segment to be merged and a second segment to be merged, merging the first segment to be merged and the second segment to be merged in response to the fact that the difference between the slope of the first segment to be merged and the slope of the second segment to be merged is smaller than a slope threshold value and the distance between the end point of the first segment to be merged, which is close to the second segment to be merged, and the end point of the second segment to be merged, which is close to the first segment to be merged, is smaller than or equal to a distance threshold value. For example, the first line segment to be merged and the second line segment to be merged are any two third intermediate table line segments of the plurality of third intermediate table line segments.
For the operation procedures of step S3015 and step S3016, reference may be made to the description of the operation procedures of step S3012 and step S3013, and details are not described here.
Thus, it is possible to determine whether to directly perform step S40 shown in fig. 1 or whether the table area of the input image includes a wired table needs to be further determined based on the resulting table line segment according to the result of the table line detection processing of the input image.
For example, after performing the table line detection processing on the input image and obtaining at least one table line segment, the determination of whether the table area includes the wired table based on one or more table line segments in the above step S302 may include the following steps S3019 to S3022.
In response to obtaining a table segment:
step S3019: it is determined that the table area does not include wired tables.
In response to obtaining the plurality of table segments:
step S3020: intersections between the plurality of table segments are determined.
Step S3021: determining that the table area includes the wired table in response to the number of the intersections being greater than or equal to the second reference value.
Step S3022: in response to the number of intersections being less than the second reference value, it is determined that the table area does not include the wired table.
For step S3019, in the case where the table line detection processing is performed on the input image and only one table line segment is detected, since a single table line segment cannot form a complete table structure, it can be determined that the table area of the input image does not include a wired table, and the operation of step S40 shown in fig. 1 is performed.
For steps S3020 to S3022, in the case where the table line detection processing is performed on the input image and a plurality of table line segments are detected, it is necessary to further determine, based on these table line segments, whether they can form a complete table structure, so as to determine whether the table area of the input image includes a wired table. For example, in steps S3020 to S3022, whether the table area of the input image includes a wired table is determined according to the number of intersections determined by the plurality of table line segments.
For example, as shown in fig. 6, the intersection between the plurality of table line segments in step S3020 may be determined by the following steps S3020A to S3020D.
Step S3020A: the plurality of table segments are divided into a plurality of first table segments and a plurality of second table segments.
Step S3020B: the plurality of first table line segments are divided into a plurality of first line segment rows and the line number of the first line segment row to which each of the plurality of first table line segments belongs is marked. For example, each first line segment row includes at least one first table line segment arranged in a third direction.
Step S3020C: the plurality of second table line segments are divided into a plurality of second line segment columns and the column number of the second line segment column to which each of the plurality of second table line segments belongs is marked. For example, each second line segment column includes at least one second table line segment arranged along the fourth direction.
Step S3020D: a plurality of intersections between the plurality of first table line segments and the plurality of second table line segments are identified, and the coordinates of the plurality of intersections are determined. For example, the coordinates of any one of the plurality of intersections include the row number corresponding to the first table line segment and the column number corresponding to the second table line segment that intersect to form that intersection.
For example, in step S3020A, the angle between each first table line segment and the third direction lies within a first angle range, the angle between each first table line segment and the fourth direction lies within a second angle range, the angle between each second table line segment and the third direction lies within the second angle range, the angle between each second table line segment and the fourth direction lies within the first angle range, and the third direction and the fourth direction are perpendicular to each other.
For example, taking the plurality of table line segments TL shown in fig. 2F as an example, as shown in fig. 2G, the third direction R3 may be the horizontal direction shown in fig. 2G, and the fourth direction R4 may be the vertical direction shown in fig. 2G. For example, the first angle range may be 0° to 45° and the second angle range may be 45° to 90°, so that the plurality of table line segments TL may be divided into a plurality of first table line segments TL1 and a plurality of second table line segments TL2. Further, the plurality of first table line segments TL1 are divided into a plurality of first line segment rows along the fourth direction R4 and the line number of the first line segment row to which each first table line segment TL1 belongs is marked; for example, the plurality of first line segment rows include the 1st to the 43rd line segment rows as shown in fig. 2G. The plurality of second table line segments TL2 are divided into a plurality of second line segment columns along the third direction R3 and the column number of the second line segment column to which each second table line segment TL2 belongs is marked; for example, the plurality of second line segment columns include the 1st to the 5th line segment columns as shown in fig. 2G. Thus, the coordinates of each intersection N1 shown in fig. 2G can be obtained based on the row number corresponding to the first table line segment TL1 and the column number corresponding to the second table line segment TL2 that form that intersection N1.
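Steps S3020A to S3020C might be sketched as follows; the 45° split mirrors the angle ranges above, while the pixel tolerance used to cluster segments into rows and columns is an illustrative assumption (the sketch reuses angle_deg from the merging sketch).

    def classify_and_index(table_segments, angle_split_deg=45.0, group_gap_px=10):
        """Split the table line segments into near-horizontal first table line segments and
        near-vertical second table line segments, then assign 1-based row / column numbers."""
        first, second = [], []
        for seg in table_segments:
            a = angle_deg(seg)
            (first if min(a, 180.0 - a) < angle_split_deg else second).append(seg)

        def group(segs, key):
            segs = sorted(segs, key=key)
            numbers, current, last = {}, 0, None
            for s in segs:
                if last is None or key(s) - last > group_gap_px:
                    current += 1          # start a new line segment row / column
                numbers[s] = current
                last = key(s)
            return numbers

        row_of = group(first, key=lambda s: (s[1] + s[3]) / 2)   # rows ordered by mean y
        col_of = group(second, key=lambda s: (s[0] + s[2]) / 2)  # columns ordered by mean x
        return first, second, row_of, col_of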
For example, after the coordinates of each intersection N1 are determined, based on the number of intersections N1, steps S3021 and S3022 are performed to determine whether a wired table is included in the table area of the input image.
For example, the second reference value in steps S3021 and S3022 may be the larger of the number of the plurality of first line segment rows and the number of the plurality of second line segment columns. For example, taking the case shown in fig. 2G as an example, if the number of the first line segment rows is 43 and the number of the second line segment columns is 5, the second reference value is 43. Thus, it is possible to determine whether or not the table area of the input image includes the wired table, based on the magnitude relationship between the number of intersections and the second reference value.
For example, taking the case shown in fig. 2G as an example, the number of the intersection points N1 is 215, which is larger than the second reference value 43, and thus it can be determined that the table area 201 of the input image shown in fig. 2A includes a wired table.
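Continuing the sketch, steps S3020D, S3021, and S3022 reduce to counting intersections and comparing the count against the second reference value; the intersection test below assumes roughly axis-aligned (already deskewed) segments, which is an assumption rather than a requirement of the patent.

    def segments_intersect(h_seg, v_seg):
        """Crude test for whether a near-horizontal and a near-vertical segment cross."""
        hx0, hx1 = sorted((h_seg[0], h_seg[2]))
        hy = (h_seg[1] + h_seg[3]) / 2
        vy0, vy1 = sorted((v_seg[1], v_seg[3]))
        vx = (v_seg[0] + v_seg[2]) / 2
        return hx0 <= vx <= hx1 and vy0 <= hy <= vy1

    def table_is_wired(first, second, row_of, col_of):
        """Count intersections (coordinates = (row number, column number)) and compare
        against the second reference value, i.e. the larger of the row and column counts."""
        intersections = [(row_of[h], col_of[v])
                         for h in first for v in second if segments_intersect(h, v)]
        second_reference = max(max(row_of.values(), default=0),
                               max(col_of.values(), default=0))
        return len(intersections) >= second_reference, intersections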
For example, after the input image shown in fig. 3A is processed using the above-described step S30, it is determined that the table area 301 of the input image shown in fig. 3A does not include a wired table. Therefore, in response to the table area 301 of the input image shown in fig. 3A not including the wired table, the above-described step S40 is performed to generate the cell table corresponding to the table area 301 of the input image shown in fig. 3A; in response to the table area 201 of the input image shown in fig. 2A including the wired table, the following step S50 is performed to generate a cell table corresponding to the table area 201 of the input image shown in fig. 2A.
Fig. 7 is a schematic flowchart of another image processing method according to at least one embodiment of the present disclosure. It should be noted that, except for step S50, steps S10 to S30 shown in fig. 7 are substantially the same as steps S10 to S30 shown in fig. 1, and repeated description is omitted.
For example, as shown in fig. 7, in response to the table area including the wired table, the image processing method provided by the embodiment of the present disclosure further includes the following step S50.
Step S50: a cell table corresponding to the table area is generated based on the plurality of table line segments.
For example, taking the input image shown in fig. 2A as an example, after determining that the table area 201 of the input image shown in fig. 2A includes a wired table through step S30, a corresponding cell table may be generated based on the plurality of table line segments TL1 and TL2 shown in fig. 2G.
For example, in some embodiments of the present disclosure, step S50 may include the following step S501.
Step S501: based on the plurality of intersections, each cell in the cell table is determined. For example, the vertices of each cell in the cell table are made up of at least three of the plurality of intersection points.
For example, the obtained intersection points are used as the vertices of the cells in the cell table, and the cells in the cell table are determined based on the coordinates of the intersection points. For example, the cells may be in the form of rectangles, squares, etc., so that one cell can be determined by three or more intersections, and a table structure is formed by a plurality of cells, so as to generate a corresponding cell table.
For example, in some embodiments of the present disclosure, step S501 may include the following steps S5011 to S5014.
Step S5011: the current intersection point is determined. For example, the current intersection is any one of a plurality of intersections.
Step S5012: and determining a first current table line segment and a second current table line segment corresponding to the current intersection point based on the coordinates of the current intersection point. For example, the first current table line segment is any one of the first table line segments, and the second current table line segment is any one of the second table line segments.
Step S5013: a first intersection point on the first current table line segment adjacent to the current intersection point is determined, and a second intersection point on the second current table line segment adjacent to the current intersection point is determined.
Step S5014: a cell is determined based on the current intersection, the first intersection, and the second intersection.
Thus, by the table line segment where the intersection point is located, a first intersection point and a second intersection point adjacent to the current intersection point, for example, in the horizontal direction and the vertical direction, respectively, can be determined, so that one cell is constructed based on the determined intersection points to generate a cell table presented in a table structure form.
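A compact sketch of steps S5011 to S5014 in the (row number, column number) coordinates defined above; pairing each intersection with its nearest right-hand and downward neighbours is one plausible reading of "adjacent" and is not spelled out in the patent.

    def build_cells(intersections):
        """For each intersection (r, c), find the nearest intersection to its right on the
        same first table line segment and the nearest one below it on the same second table
        line segment; three such intersections fix one cell."""
        points = set(intersections)
        cells = []
        for (r, c) in sorted(points):
            right = min((cc for (rr, cc) in points if rr == r and cc > c), default=None)
            below = min((rr for (rr, cc) in points if cc == c and rr > r), default=None)
            if right is not None and below is not None:
                cells.append(((r, c), (r, right), (below, c)))
        return cells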
In the case where the table area of the input image does not include the wired table, the above-described step S40 is performed, whereby the cell table corresponding to the table area is generated based on the object area frame recognized in the input image.
Fig. 8 is a flowchart illustrating step S401 in an image processing method according to at least one embodiment of the present disclosure.
For example, as shown in fig. 8, step S401 includes the following steps S4011 to S4013.
Step S4011: and dividing the table area frame into a plurality of coordinate grid areas arranged in M rows and N columns along the first direction and the second direction by taking the reference value as a coordinate unit to establish a table coordinate system. For example, M rows of grid regions are arranged in a first direction, N columns of grid regions are arranged in a second direction, and M and N are positive integers.
Step S4012: the coordinates of the plurality of object region boxes in the table coordinate system are determined.
Step S4013: and performing expansion processing on the multiple object area frames based on the coordinates of the multiple object area frames in the table coordinate system to obtain multiple area labeling frames.
For example, taking the object region boxes 320 located in the region RN2 of the input image shown in fig. 3A and 3B as an example, as shown in fig. 3C, after the table region box 310 is divided into a plurality of coordinate grid regions 311 arranged in a plurality of rows and a plurality of columns along the first direction R1 and the second direction R2, the coordinates of each object region box 320 in the table coordinate system are determined, for example, the row numbers and column numbers of the coordinate grid regions 311 corresponding to the edges of each object region box 320 in the table coordinate system. Then, based on the coordinates of the plurality of object region boxes 320 in the table coordinate system, the plurality of object region boxes 320 are expanded to obtain the region labeling frames corresponding to the object region boxes 320.
For example, the reference value in step S4011 may be determined according to the average height of the plurality of object region boxes in the first direction. In this way, the relative positions of the object region boxes can be accurately determined based on the generated table coordinate system, which facilitates the subsequent alignment processing of the object region boxes based on those relative positions so as to determine the region labeling frames.
For example, taking the case where the objects included in the object areas 302 of the input image shown in fig. 3A and 3B are text or data as an example, the table region box 310 may be divided into a plurality of coordinate grid regions 311 along the first direction R1 and the second direction R2 with half of the text height or data height as the reference value, thereby forming, based on the table region box 310, a high-density table coordinate system whose row and column width is half of the text height or data height. Thus, the relative positions of the object region boxes 320 can be determined more accurately based on the generated table coordinate system.
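One way the table coordinate system of steps S4011 and S4012 could be realised is sketched below; it reuses the hypothetical RegionBox fields from the earlier sketch, and the choice of half the average box height as the coordinate unit follows the example above.

    def build_table_coordinate_system(table_box, object_boxes):
        """Use half of the average object-box height as the coordinate unit (reference value)
        and express every object region box as (start_row, end_row, start_col, end_col)
        in the resulting grid over the table region box."""
        avg_height = sum(b.y1 - b.y0 for b in object_boxes) / len(object_boxes)
        unit = avg_height / 2.0

        def to_grid(box):
            return (int((box.y0 - table_box.y0) // unit),   # start row (first direction)
                    int((box.y1 - table_box.y0) // unit),   # end row
                    int((box.x0 - table_box.x0) // unit),   # start column (second direction)
                    int((box.x1 - table_box.x0) // unit))   # end column

        return unit, [to_grid(b) for b in object_boxes]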
Fig. 9 is a schematic flowchart of step S4012 in an image processing method according to at least one embodiment of the present disclosure. As shown in fig. 9, step S4012 includes the following steps S4012A through S4012C.
Step S4012A: a plurality of slopes of the plurality of object region boxes is determined. For example, the slope of each of the plurality of object region boxes represents the slope of the side of each of the object region boxes extending in the second direction with respect to the second direction.
Step S4012B: and according to the slopes of the object region frames, carrying out correction processing on the input image to obtain a corrected input image.
Step S4012C: based on the corrected input image, coordinates of the plurality of object region frames in the table coordinate system are determined.
Therefore, before the coordinates of the object region boxes in the table coordinate system are determined, the input image may be subjected to correction processing according to the slope, with respect to the second direction R2, of the side of each object region box extending along the second direction R2, for example by adjusting the rotation angle of the input image in the plane formed by the first direction R1 and the second direction R2. This implements a global correction of the input image and improves, for example, the global offset of text lines in the input image, which in turn improves the accuracy and reliability of the determined coordinates of the object region boxes in the table coordinate system, so that the relative positions of the object region boxes can be determined more accurately based on the table coordinate system.
For example, in some examples, the above-described step S4012B may include the following steps S4012D and S4012E.
Step S4012D: an average value of the slopes is calculated from the slopes of the object region frames.
Step S4012E: the input image is rotated in a plane made up of the first direction and the second direction based on an average value of the plurality of slopes so that the average value of the plurality of slopes approaches 0.
Therefore, by performing the rotation processing on the input image in the plane formed by the first direction and the second direction, the inclination angles of the plurality of object region frames relative to the first direction or the second direction in the plane can be relatively kept consistent, for example, within a certain angle range, thereby improving the line offset situation, for example, which may occur in the object included in the object region as a whole, and realizing the global correction on the input image.
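A sketch of the global correction of steps S4012D and S4012E using OpenCV; note that an axis-aligned bounding box carries no tilt information, so the per-box angle attribute used here is an assumed extra output of the region identification model rather than part of the RegionBox sketch above.

    import cv2

    def deskew_by_average_slope(image, object_boxes):
        """Average the tilt angles (in degrees) of the object region boxes and rotate the
        whole input image in the plane of the first and second directions so that the
        average tilt approaches 0."""
        mean_angle = sum(b.angle for b in object_boxes) / len(object_boxes)
        h, w = image.shape[:2]
        rotation = cv2.getRotationMatrix2D((w / 2, h / 2), mean_angle, 1.0)
        return cv2.warpAffine(image, rotation, (w, h),
                              flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REPLICATE)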
In some other examples of the present disclosure, for the above step S4012A and step S4012B, the input image may also be subjected to correction processing according to a slope of an edge of each object region box extending in the first direction R1 with respect to the first direction R1, which is not limited by the embodiments of the present disclosure.
In some examples of the present disclosure, step S4013 may include the following steps S4013A to S4013D.
Step S4013A: first start coordinates and first end coordinates in the first direction and second start coordinates and second end coordinates in the second direction of the plurality of object region boxes in the table coordinate system are determined. For example, the first start coordinate of any one of the plurality of object region boxes includes the coordinate of the start row of the coordinate grid regions occupied by that object region box in the table coordinate system, the first end coordinate of that object region box includes the coordinate of the end row of the coordinate grid regions it occupies, the second start coordinate of that object region box includes the coordinate of the start column of the coordinate grid regions it occupies, and the second end coordinate of that object region box includes the coordinate of the end column of the coordinate grid regions it occupies.
Step S4013B: dividing the object area frames into a plurality of rows and a plurality of columns, performing expansion processing on the object area frames row by row according to the direction pointing to the ending row along the starting row in the table coordinate system, and sequentially performing expansion processing on each row of the object area frames according to the direction pointing to the ending column along the starting column in the table coordinate system.
For the ith object region box of the plurality of object region boxes, for example, i is a positive integer:
step S4013C: performing expansion processing on the ith object area frame in the first direction so that the start line of the grid area occupied by the ith object area frame is moved in the first direction by the reference value each time in a direction away from the end line of the grid area occupied by the ith object area frame, moving the reference value in a first direction each time in a direction away from a start line of the coordinate lattice area occupied by the i-th object area frame so that the end line of the coordinate lattice area occupied by the i-th object area frame is moved until the first start coordinate of the i-th object area frame is equal to 0 or equal to the first end coordinate of any one of the object area frames other than the i-th object area frame among the plurality of object area frames, and the first end coordinate of the ith object area frame is equal to the maximum row value of the table coordinate system or equal to the first start coordinate of any one of the object area frames except the ith object area frame.
Step S4013D: performing expansion processing on the ith object area frame in the second direction such that the starting column of the grid area occupied by the ith object area frame is moved in the second direction in a direction away from the ending column of the grid area occupied by the ith object area frame each time by the base reference value, such that the ending column of the grid area occupied by the ith object area frame is moved in the second direction in a direction away from the starting column of the grid area occupied by the ith object area frame each time by the base reference value, until the second starting coordinate of the ith object area frame is made equal to 0 or equal to the second ending coordinate of any one of the plurality of object area frames other than the ith object area frame, and the second ending coordinate of the ith object area frame is made equal to the maximum column value of the table coordinate system or equal to the second starting coordinate of any one of the plurality of object area frames other than the ith object area frame, thereby obtaining an area labeling frame corresponding to the ith object area frame.
For example, taking the input image shown in fig. 3A as an example, the plurality of object region boxes 320 may be divided into 23 rows and 7 columns, the expansion process may be performed on each row of object region boxes 320 in sequence in a direction from, for example, "sequence number" to "22" to achieve the alignment process on the object region boxes 320 in each row, and the expansion process may be performed on each column of object region boxes 320 in sequence in a direction from, for example, "sequence number" to "reference value" to achieve the alignment process on the object region boxes 320 in each column.
For example, in the process of performing the expansion processing on the plurality of object region boxes 320, the plurality of corresponding region labeling boxes may be obtained by performing the expansion processing on the plurality of object region boxes 320 once in sequence, or the plurality of corresponding region labeling boxes may be obtained only after the expansion processing has been performed on the plurality of object region boxes 320 repeatedly; that is, each object region box 320 may be subjected to the expansion processing once or multiple times to obtain the final expanded region labeling box. The number of expansion processing operations is not particularly limited by the embodiments of the present disclosure.
For example, taking the determination of the second end coordinates of the object region boxes 320 as an example, as shown in figs. 3C and 3D, the object region box 321 is expanded in the second direction R2 such that the end column of the coordinate grid region 311 occupied by the object region box 321 is moved by the reference value each time in the second direction R2 in a direction away from the start column of that coordinate grid region 311, until the second end coordinate of the object region box 321 is equal to the second start coordinate of the object region box 325 (which is also the second start coordinate of the object region box 326, the object region box 327, and the object region box 328), thereby determining the second end coordinate of the object region box 321; the object region box 322 is expanded in the second direction R2 in the same manner until its second end coordinate is equal to the second start coordinate of the object region box 325, thereby determining the second end coordinate of the object region box 322; the object region box 323 is expanded in the second direction R2 in the same manner until its second end coordinate is equal to the second start coordinate of the object region box 325, thereby determining the second end coordinate of the object region box 323; and the object region box 324 is expanded in the second direction R2 in the same manner until its second end coordinate is equal to the second start coordinate of the object region box 325, thereby determining the second end coordinate of the object region box 324. The second start coordinates, as well as the first start coordinates and the first end coordinates, of the object region boxes may be determined with reference to the above process of determining the second end coordinates, and are not described here again.
Thus, after the expansion process is performed on each of the object region boxes 320 in the first direction R1 and the second direction R2, a plurality of region labeling boxes aligned in one-to-one correspondence with the plurality of object region boxes 320 can be obtained.
It should be noted that, depending on the relative positions or the specific arrangement of the multiple object regions included in the input image, in some other examples of the present disclosure, for example when there is a large distance between object region frames adjacent in the second direction, the multiple object region frames may be expanded only in the first direction and not in the second direction, so as to obtain region labeling frames aligned in the first direction; this further simplifies the alignment processing of the multiple object region frames and streamlines the implementation of the image processing method provided by the present disclosure.
Fig. 10 is a partial flowchart of step S402 in an image processing method according to at least one embodiment of the present disclosure.
For example, as shown in fig. 10, the determination of at least one dividing line based on the plurality of region labeling boxes in step S402 includes the following steps S421 to S426.
Step S421: and establishing a pixel coordinate system based on the table area frame by taking the pixel as a coordinate unit. For example, the pixel coordinate system includes a plurality of pixel units, a first coordinate axis of the pixel coordinate system is parallel to the first direction, and a second coordinate axis of the pixel coordinate system is parallel to the second direction.
Step S422: and determining the coordinates of the object area frames in a pixel coordinate system to obtain a plurality of pixel areas corresponding to the object area frames one by one.
Step S423: the pixel units occupied by the plurality of pixel areas in the pixel coordinate system are marked as first pixel units, and the pixel units except the first pixel units occupied by the plurality of pixel areas in the pixel coordinate system are marked as second pixel units.
Step S424: and sequentially determining the number of the first pixel units included in each row of pixel units in the pixel coordinate system along the second direction.
Step S425: and in response to the number of the first pixel units included in any one column of pixel units being less than or equal to the first pixel reference value, taking any one column of pixel units as a first intermediate dividing line to obtain at least one first intermediate dividing line.
Step S426: at least one first dividing line extending in a first direction is determined in the table coordinate system based on the at least one first intermediate dividing line. For example, the at least one dividing line includes at least one first dividing line.
For example, taking the input image shown in fig. 3A as an example, as shown in fig. 3E and 3F, a pixel coordinate system in units of pixels as shown in fig. 3E is established based on the table area frame 310 shown in fig. 3B, and after the coordinates of the object area frames 320 in the pixel coordinate system are determined, a plurality of pixel areas 321 corresponding to the object area frames 320 one by one can be obtained.
For example, as shown in fig. 3E, the pixel cell occupied by pixel region 321 in the pixel coordinate system is labeled as a first pixel cell PX1, such as the white pixel cell shown in fig. 3E; the pixel units in the pixel coordinate system other than the first pixel unit PX1 occupied by the plurality of pixel areas 321 are each labeled as a second pixel unit PX2, such as a black pixel unit shown in fig. 3E, whereby the relative positions of the plurality of pixel areas 321 in the pixel coordinate system corresponding to the plurality of object-area frames 320 can be represented by the first pixel unit PX1 and the second pixel unit PX 2.
Further, after the pixel units in the pixel coordinate system are respectively labeled as the first pixel unit PX1 and the second pixel unit PX2, the number of the first pixel units PX1 included in each column of pixel units in the pixel coordinate system is sequentially determined in the second direction R2, and the first intermediate dividing line is determined based on the number of the first pixel units PX1 included in each column of pixel units. When the number of the first pixel units PX1 included in a column of pixel units is less than or equal to the first pixel reference value, for example, the number of the first pixel units PX1 included in the column of pixel units is 0 or the number of the first pixel units PX1 included in the column of pixel units is significantly smaller than the number of the first pixel units PX1 included in other columns of pixel units, the column of pixel units may serve as a first intermediate dividing line. Thus, one or more first dividing lines extending in the first direction R1, for example corresponding to the line segment CL1 shown in fig. 3F, may be determined in the table coordinate system based on the one or more first intermediate dividing lines determined in the pixel coordinate system, so that the dividing process of the plurality of region labeling boxes in the second direction R2 may be realized based on the obtained one or more first dividing lines in the subsequent step.
For example, in some examples, the first pixel reference value may be 0, and thus in response to the number of first pixel cells PX1 included in a column of pixel cells being equal to the first pixel reference value (i.e., equal to 0), i.e., when a column of pixel cells does not include the first pixel cell PX1 and each pixel cell in the column is a second pixel cell PX2, the column of pixel cells may be taken as a first intermediate dividing line. Alternatively, in some examples, the first pixel reference value may be determined based on an image height of the input image, e.g., the first pixel reference value may be determined to be a positive number greater than 0 based on the image height of the input image, whereby a column of pixel cells is treated as a first intermediate dividing line in response to the number of first pixel cells PX1 included in the column being less than the first pixel reference value. For example, when the first pixel reference value is determined to be a positive number PR1 greater than 0 based on the image height of the input image, if the number N1 of the first pixel cells PX1 included in a column of pixel cells satisfies 0 ≦ N1 < PR1, the column of pixel cells may be treated as one first intermediate dividing line. Therefore, the parts of the input image that are not occupied, or only sparsely occupied, by the object regions or the objects included in the object regions can be accurately determined; the division processing of the region labeling frames corresponding to the object regions can be realized based on these parts, and the table structure corresponding to the table region can be generated based on the region labeling frames after the division processing.
For example, taking the determination of the first pixel reference value PR1 based on the image height of the input image as an example, the image height of the input image is, for example, the total number PN1 of pixels included in an entire column of pixels along the first direction R1 (i.e., the column direction), and the first pixel reference value PR1 may be, for example, 0.3 times the image height of the input image, that is, PR1 = 0.3 × PN1. Thus, when the number of first pixel units PX1 included in a column of pixel units in the pixel coordinate system is less than 0.3 × PN1, it may be determined that the number of first pixel units PX1 included in that column is significantly or relatively much smaller than the number of first pixel units PX1 included in the other columns, and that column of pixel units can be used as a first intermediate dividing line, thereby improving the accuracy and reliability of the first intermediate dividing lines determined in the pixel coordinate system and facilitating the accurate division of the region labeling frames.
It should be noted that, in some other examples of the present disclosure, the first pixel reference value may also be 0.1 times, 0.15 times, 0.2 times, 0.25 times, 0.35 times, 0.4 times, or another suitable value of the image height of the input image, and the embodiments of the present disclosure are not limited thereto.
It should be noted that, in some other examples of the present disclosure, the first pixel reference value may also be determined based on the image height of the table area or the table area frame, that is, the first pixel reference value may be based on the total number of pixels included in an entire column of pixels along the first direction R1 in the table area, for example, the first pixel reference value may be 0.15 times, 0.2 times, 0.3 times, or another suitable value of the image height of the table area or the table area frame, which is not limited by the embodiments of the present disclosure.
For example, step S426 may include the following step S426A and step S426B.
Step S426A: in response to any one of the at least one first intermediate dividing line not having an adjacent first intermediate dividing line in the second direction, mapping any one of the first intermediate dividing lines from the pixel coordinate system into the table coordinate system to obtain one first dividing line corresponding to any one of the first intermediate dividing lines in the table coordinate system.
Step S426B: in response to the at least one first intermediate dividing line including X first intermediate dividing lines continuing in the second direction, mapping any one of the X first intermediate dividing lines from the pixel coordinate system into the table coordinate system to obtain one first dividing line corresponding to the X first intermediate dividing lines in the table coordinate system. For example, X is a positive integer.
For example, when one of the obtained first intermediate dividing lines does not have an adjacent first intermediate dividing line in the second direction R2, that is, when the number of first pixel units PX1 included in a column of pixel units is equal to 0 and the number of first pixel units PX1 included in any one column of pixel units adjacent to the column of pixel units in the second direction R2 is greater than 0, or when the number of the first pixel units PX1 included in a column of pixel units is less than the first pixel reference value PR1 (PR1 > 0) and the number of the first pixel units PX1 included in any one column of pixel units adjacent to the column of pixel units in the second direction R2 is greater than or equal to the first pixel reference value PR1, the column of pixel units may be taken as a first intermediate dividing line, and that first intermediate dividing line is mapped from the pixel coordinate system to the table coordinate system to obtain a first dividing line.
For example, when one of the obtained first intermediate dividing lines has an adjacent first intermediate dividing line in the second direction R2, that is, when the number of the first pixel units PX1 included in a column of pixel units is less than the first pixel reference value PR1 and the number of the first pixel units PX1 included in any column of pixel units adjacent to the column of pixel units in the second direction R2 is also less than the first pixel reference value PR1, that is, when the number of the first pixel units PX1 included in one column of pixel units is equal to 0 or a positive integer smaller than PR1 and the number of the first pixel units PX1 included in any column of pixel units adjacent to the column of pixel units in the second direction R2 is also equal to 0 or a positive integer smaller than PR1, any one of the adjacent first intermediate dividing lines can be mapped from the pixel coordinate system to the table coordinate system to obtain one first dividing line.
For example, in some examples, when there are a plurality of first intermediate dividing lines that are continuous in the second direction R2 in the pixel coordinate system, a first intermediate dividing line located at an intermediate position in the plurality of first intermediate dividing lines in the second direction R2 may be mapped from the pixel coordinate system into the table coordinate system to obtain a first dividing line, and for example, a central line of the plurality of first intermediate dividing lines may be taken and mapped from the pixel coordinate system into the table coordinate system to obtain a first dividing line, so as to improve accuracy and reliability of the dividing process of the plurality of region labeling frames based on the obtained dividing line in the subsequent step, thereby optimizing the obtained table structure corresponding to the table region in the input image.
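For example, the selection of one first dividing line from several first intermediate dividing lines that are continuous in the second direction may be sketched as follows; grouping by consecutive column indices and the function name are assumptions used only for illustration:
def merge_consecutive_lines(columns):
    # columns: sorted column indices of the first intermediate dividing lines.
    groups, current = [], []
    for col in columns:
        if current and col != current[-1] + 1:
            groups.append(current)
            current = []
        current.append(col)
    if current:
        groups.append(current)
    # For each run of consecutive candidates, keep the one at the middle position;
    # it is then mapped into the table coordinate system as one first dividing line.
    return [group[len(group) // 2] for group in groups]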
For example, as shown in fig. 10, the determination of at least one dividing line based on a plurality of object region frames in step S402 further includes the following steps S427 to S432.
Step S427: an object included in each of the plurality of object regions is identified.
Step S428: the coordinates of the object comprised by each object region in the pixel coordinate system are determined.
Step S429: and marking the pixel units occupied by the objects included in each object area in the pixel coordinate system as third pixel units, and marking the pixel units except the third pixel units occupied by the objects included in each object area in the pixel coordinate system as fourth pixel units.
Step S430: and sequentially determining the number of third pixel units included in each row of pixel units in the pixel coordinate system along the first direction.
Step S431: and in response to the number of third pixel units included in any row of pixel units being less than or equal to the second pixel reference value, taking any row of pixel units as a second intermediate dividing line to obtain at least one second intermediate dividing line.
Step S432: at least one second dividing line extending in the second direction is determined in the table coordinate system based on the at least one second intermediate dividing line. For example, the at least one dividing line comprises at least one second dividing line.
For example, taking the input image shown in fig. 3A as an example, referring to fig. 3G and 3H, after a pixel coordinate system having pixels as coordinate units is established based on the table area frame 310 shown in fig. 3B, the coordinates of the object included in the identified object area in the pixel coordinate system are determined.
For example, as shown in fig. 3G, a pixel unit occupied by the object included in each object region in the pixel coordinate system is labeled as a third pixel unit PX3, for example, a white pixel unit shown in fig. 3G; the pixel units in the pixel coordinate system other than the third pixel unit PX3 occupied by the object included in each object area are each labeled as a fourth pixel unit PX4, for example, a black pixel unit shown in fig. 3G, whereby the relative position of the object included in each object area in the pixel coordinate system can be represented by the third pixel unit PX3 and the fourth pixel unit PX 4.
Further, after the pixel units in the pixel coordinate system are respectively labeled as the third pixel unit PX3 and the fourth pixel unit PX4, the number of the third pixel units PX3 included in each row of the pixel units in the pixel coordinate system is sequentially determined along the first direction R1, and the second intermediate dividing line is determined based on the number of the third pixel units PX3 included in each row of the pixel units. When the number of the third pixel units PX3 included in a row of pixel units is less than or equal to the second pixel reference value, for example, the number of the third pixel units PX3 included in the row of pixel units is 0 or the number of the third pixel units PX3 included in the row of pixel units is significantly smaller than the number of the third pixel units PX3 included in other rows of pixel units, the row of pixel units may serve as a second intermediate dividing line. Thus, one or more second dividing lines extending in the second direction R2, for example corresponding to the line segment CL2 shown in fig. 3H, may be determined in the table coordinate system based on the one or more second intermediate dividing lines determined in the pixel coordinate system, so that the dividing process of the plurality of region labeling boxes in the first direction R1 may be realized in the subsequent step based on the obtained one or more second dividing lines.
In the above example, since the distance between objects, such as text or data, adjacent in the first direction R1 in the input image is small, a corresponding second intermediate dividing line extending in the second direction R2 can be determined in the pixel coordinate system based on the determined coordinates of the objects included in each object region, thereby improving the accuracy and reliability of the second dividing line located in the table coordinate system based on the second intermediate dividing line, and further improving the accuracy and reliability of the subsequent dividing process of the plurality of region labeling boxes based on the dividing lines.
It should be noted that, in some other examples of the present disclosure, the second intermediate dividing line may also be determined based on the object region frame with reference to the determination method of the first intermediate dividing line; alternatively, according to a specific arrangement manner of the objects in the input image, the first intermediate dividing line may also be determined based on the objects included in each object region in the input image with reference to a determination method of the second intermediate dividing line, and the embodiment of the present disclosure is not limited thereto.
For example, in some examples, the second pixel reference value may be 0, and thus in response to the number of third pixel cells PX3 included in a row of pixel cells being equal to the second pixel reference value (i.e., equal to 0), i.e., when the third pixel cell PX3 is not included in a row of pixel cells, each pixel cell in the row being the fourth pixel cell PX4, the row of pixel cells may be treated as a second intermediate division line. Alternatively, in some examples, the second pixel reference value may be determined based on an image width of the input image or an image length of the input image, for example, the second pixel reference value may be determined to be a positive number greater than 0 based on the image width of the input image, thereby regarding a row of pixel cells as one second intermediate dividing line in response to the number of third pixel cells PX3 included in the row being smaller than the second pixel reference value. For example, when the second pixel reference value is determined to be a positive number PR2 greater than 0 based on the image width of the input image, if the number N2 of the third pixel cells PX3 included in a row of pixel cells satisfies 0 ≦ N2 < PR2, the row of pixel cells may be treated as one second intermediate dividing line. Therefore, the part of the input image which is not occupied by the object in the object area or the part which is relatively less occupied by the object in the object area can be accurately determined, the segmentation processing of the area labeling frame corresponding to the object area is realized based on the part, and the table structure corresponding to the table area is generated based on the area labeling frame after the segmentation processing.
For example, taking the determination of the second pixel reference value PR2 based on the image width of the input image as an example, the image width of the input image is, for example, the total number PN2 of pixels included in an entire row of pixels along the second direction R2 (i.e., the row direction), and the second pixel reference value PR2 may be, for example, 0.3 times the image width of the input image, that is, PR2 = 0.3 × PN2. Thus, when the number of third pixel units PX3 included in a row of pixel units in the pixel coordinate system is less than 0.3 × PN2, it may be determined that the number of third pixel units PX3 included in that row is significantly or relatively much smaller than the number of third pixel units PX3 included in the other rows, and that row of pixel units can be used as a second intermediate dividing line, thereby improving the accuracy and reliability of the second intermediate dividing lines determined in the pixel coordinate system and facilitating the accurate division of the region labeling frames.
It should be noted that, in some other examples of the present disclosure, the second pixel reference value may also be 0.1 times, 0.15 times, 0.2 times, 0.25 times, 0.35 times, 0.4 times, or another suitable value of the image width of the input image, and embodiments of the present disclosure are not limited thereto.
It should be noted that, in some other examples of the present disclosure, the second pixel reference value may also be determined based on the image width or the image length of the table area or the table area frame, that is, the second pixel reference value may be based on the total number of pixels included in an entire row of pixels along the second direction R2 in the table area, for example, the second pixel reference value may be 0.15 times, 0.2 times, 0.3 times, or another suitable value of the image width of the table area or the table area frame, which is not limited by the embodiments of the present disclosure.
For example, step S432 may include the following step S432A and step S432B.
Step S432A: in response to any one of the at least one second intermediate dividing line not having an adjacent second intermediate dividing line in the first direction, mapping any one of the second intermediate dividing lines from the pixel coordinate system into the table coordinate system to obtain one second dividing line corresponding to any one of the second intermediate dividing lines in the table coordinate system.
Step S432B: in response to the at least one second intermediate dividing line including Y second intermediate dividing lines continuing in the first direction, any one of the Y second intermediate dividing lines is mapped from the pixel coordinate system into the table coordinate system to obtain one second dividing line corresponding to the Y second intermediate dividing lines in the table coordinate system. For example, Y is a positive integer.
For example, when one of the obtained second intermediate dividing lines does not have an adjacent second intermediate dividing line in the first direction R1, that is, when the number of the third pixel units PX3 included in a row of pixel units is equal to 0 and the number of the third pixel units PX3 included in any one row of pixel units adjacent to the row of pixel units in the first direction R1 is greater than 0, or when the number of the third pixel units PX3 included in a row of pixel units is less than the second pixel reference value PR2 (PR2 > 0) and the number of the third pixel units PX3 included in any one row of pixel units adjacent to the row of pixel units in the first direction R1 is greater than or equal to the second pixel reference value PR2, the row of pixel units may be taken as a second intermediate dividing line, and that second intermediate dividing line is mapped from the pixel coordinate system to the table coordinate system to obtain a second dividing line.
When, for example, one of the obtained second intermediate dividing lines has an adjacent second intermediate dividing line in the first direction R1, that is, when the number of the third pixel cells PX3 included in a row of pixel cells is less than the second pixel reference value PR2 and the number of the third pixel cells PX3 included in any row of pixel cells adjacent to the row of pixel cells in the first direction R1 is also less than the second pixel reference value PR2, that is, when the number of the third pixel units PX3 included in one row of pixel units is equal to 0 or a positive integer smaller than PR2 and the number of the third pixel units PX3 included in any row of pixel units adjacent to the row of pixel units in the first direction R1 is also equal to 0 or a positive integer smaller than PR2, any one of the adjacent second intermediate dividing lines can be mapped from the pixel coordinate system to the table coordinate system to obtain a second dividing line.
For example, in some examples, when there are a plurality of second intermediate dividing lines that are continuous in the first direction R1 in the pixel coordinate system, a second intermediate dividing line that is located at an intermediate position in the plurality of second intermediate dividing lines in the first direction R1 may be mapped from the pixel coordinate system into the table coordinate system to obtain a second dividing line, for example, a central line of the plurality of second intermediate dividing lines may be taken and mapped from the pixel coordinate system into the table coordinate system to obtain a second dividing line, so as to improve accuracy and reliability of the dividing process for the plurality of region labeling frames based on the obtained dividing lines in the subsequent step, and further optimize the obtained table structure corresponding to the table region in the input image.
For example, in some examples of the present disclosure, the dividing process of the plurality of region labeling boxes by the at least one dividing line in step S402 to form the plurality of cells may include: and carrying out segmentation processing on the plurality of region labeling frames according to the at least one first segmentation line and the at least one second segmentation line to obtain a plurality of cells. Thus, after the first dividing line and the second dividing line in the table coordinate system are determined from the first intermediate dividing line and the second intermediate dividing line in the pixel coordinate system, the division processing of the plurality of area labeling frames can be realized in the table coordinate system based on the obtained first dividing line and second dividing line, and the corresponding table structure can be generated from the area labeling frames after the division processing.
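For example, the division of the region labeling frames by the first dividing lines and the second dividing lines into cells may be sketched as follows; representing a cell by its bounding rows and columns in the table coordinate system is an assumption made only for illustration:
def build_cells(first_lines, second_lines, max_row, max_col):
    # first_lines: column positions of the first dividing lines (extending along the first direction);
    # second_lines: row positions of the second dividing lines (extending along the second direction).
    col_edges = [0] + sorted(first_lines) + [max_col]
    row_edges = [0] + sorted(second_lines) + [max_row]
    cells = []
    for r in range(len(row_edges) - 1):
        for c in range(len(col_edges) - 1):
            # Each cell is (top row, bottom row, left column, right column).
            cells.append((row_edges[r], row_edges[r + 1], col_edges[c], col_edges[c + 1]))
    return cells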
For example, in some examples of the present disclosure, the performing, by at least one dividing line, a dividing process on the plurality of region labeling boxes to form the plurality of cells in step S402 further includes: and in response to the fact that the input image is subjected to table line detection processing and at least one table line segment is obtained through detection, at least one dividing line is subjected to correction processing based on the at least one table line segment, and the plurality of area labeling frames are subjected to division processing based on the at least one dividing line subjected to correction processing, so that a plurality of cells are obtained. Therefore, the accuracy and the reliability of the obtained dividing line can be further improved by correcting the determined dividing line according to the actual dividing mode of the object region in the input image, and the accuracy and the reliability of the subsequent dividing processing of the plurality of region labeling frames based on the dividing line are improved.
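For example, one possible interpretation of the correction processing based on the detected table line segments is to snap each dividing line to the nearest detected segment; the tolerance parameter and the snapping strategy below are assumptions, not a definitive implementation of the embodiments:
def refine_dividing_lines(dividing_lines, detected_positions, tolerance=5):
    # dividing_lines: positions of the determined dividing lines along one axis;
    # detected_positions: positions of the detected table line segments along the same axis.
    refined = []
    for line in dividing_lines:
        nearest = min(detected_positions, key=lambda p: abs(p - line), default=None)
        if nearest is not None and abs(nearest - line) <= tolerance:
            refined.append(nearest)  # correct the dividing line to the detected segment
        else:
            refined.append(line)
    return refined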
For example, in some examples of the present disclosure, step S403 includes the following step S4031.
Step S4031: the multiple cells are merged and/or divided to obtain multiple target cells based on the multiple cells. For example, the cell table includes a plurality of target cells.
For example, in some examples, step S4031 includes the following steps S4031A through S4031C.
Step S4031A: and responding to the situation that at least one division line penetrates through any region marking frame, and determining whether any region marking frame needs to be subjected to splitting processing.
Step S4031B: responding to the requirement of splitting any region labeling box, splitting any region labeling box into a plurality of splitting labeling boxes, and taking the cell to which each splitting labeling box in the splitting labeling boxes belongs as a target cell.
Step S4031C: and in response to that any region labeling frame does not need to be split, merging a plurality of cells occupied by any region labeling frame to obtain a target cell.
Therefore, in the image processing method provided by the embodiments of the present disclosure, the basic table structure of the cell table corresponding to the table region may be determined based on the plurality of region labeling frames after the alignment processing, for example, the approximate relative positions between the respective cells of the table structure may be obtained; and after the dividing lines are determined based on the plurality of object region frames in the input image that are not subjected to the alignment processing, the plurality of region labeling frames may be divided by the dividing lines, so that the accurate division and positioning of the respective cells are realized by combining the dividing lines with the region labeling frames after the alignment processing. Furthermore, after the plurality of cells are obtained, the cells formed by the division processing of the dividing lines may be further split or merged according to the positions of the dividing lines relative to the region labeling frames to obtain more accurate target cells, thereby improving the accuracy and reliability of the cell table corresponding to the table region in the input image generated based on the target cells.
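For example, step S4031 may be sketched as follows for a single region labeling frame crossed by dividing lines; the cell representation and the externally supplied needs_split decision are assumptions used only to illustrate the merging and splitting logic:
def resolve_target_cells(crossed_cells, needs_split):
    # crossed_cells: the cells (top row, bottom row, left column, right column)
    # occupied by one region labeling frame that a dividing line passes through.
    if needs_split:
        # The frame is split along the dividing lines, and the cell to which each
        # split labeling frame belongs becomes a target cell.
        return list(crossed_cells)
    # Otherwise the occupied cells are merged into a single target cell.
    top = min(cell[0] for cell in crossed_cells)
    bottom = max(cell[1] for cell in crossed_cells)
    left = min(cell[2] for cell in crossed_cells)
    right = max(cell[3] for cell in crossed_cells)
    return [(top, bottom, left, right)]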
Fig. 11 is a flowchart illustrating a further image processing method according to at least one embodiment of the present disclosure.
It should be noted that, except for steps S60 and S70, steps S10 to S40 shown in fig. 11 are substantially the same as steps S10 to S40 shown in fig. 1, and repeated parts are not repeated.
For example, as shown in fig. 11, the image processing method further includes the following step S60 and step S70.
Step S60: an object included in each of the plurality of object regions is identified.
Step S70: and filling objects included in the object areas into target cells of the cell table correspondingly to generate an object table.
With step S60, the object contained in each object region of the input image may be recognized by, for example, a character recognition model to enable extraction of the related information of the object contained in the input image. For example, the character recognition model may be implemented based on techniques such as optical character recognition and run on, for example, a general purpose computing device or a special purpose computing device, e.g., the character recognition model may be a pre-trained neural network model.
In step S70, the identified object is filled into each target cell of the corresponding cell table, so as to generate an object table containing information about the object in the input image, and the user can more intuitively and regularly acquire information such as data and text content in the input image through the generated object table.
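For example, steps S60 and S70 may be sketched as follows using an off-the-shelf character recognition engine; the use of pytesseract and Pillow here is an assumption made only for illustration, whereas the embodiments may use a pre-trained neural network model for character recognition:
import pytesseract
from PIL import Image

def fill_object_table(image_path, target_cells):
    # target_cells: pixel bounding boxes (top, bottom, left, right) of the target cells.
    # Returns one recognised string per target cell, forming the object table contents.
    image = Image.open(image_path)
    contents = []
    for top, bottom, left, right in target_cells:
        crop = image.crop((left, top, right, bottom))
        contents.append(pytesseract.image_to_string(crop).strip())
    return contents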
For example, taking the input image shown in fig. 3A as an example, by using the image processing method provided by the embodiment of the present disclosure, an object table as shown in fig. 3I can be generated. Therefore, compared with the input image shown in fig. 3A, the object table shown in fig. 3I can enable the relevant information of the object in the input image to be more concise, normative and intuitively presented to the user, so that the efficiency of the user for acquiring the relevant information of the object in the input image is improved, and the user experience is improved.
In some embodiments of the present disclosure, after step S70, the image processing method may further include: adjusting, based on the input image, the objects filled in the cells of the cell table. For example, whether the object filled in each cell is accurate, for example whether an error or omission has occurred, may be determined by comparison with the input image, thereby improving the accuracy and reliability of the generated object table. For example, when the objects in the input image include text, that is, when the related information of the objects is presented by characters, numbers, symbols, and the like, the generated object table can be made clearer and more standardized by adjusting the font height and/or font style of the text filled in each cell of the cell table, which facilitates the user intuitively and conveniently obtaining the required information. For example, the word height of the original text may be recorded, and the text may be filled in using a preset font.
For example, in the case where the table region of the input image includes a wired table, the above-described steps S60 and S70 are performed after step S50 shown in fig. 7, thereby generating an object table corresponding to the table region of the input image.
For example, taking the input image shown in fig. 2A as an example, by using the image processing method provided by the embodiment of the present disclosure, an object table as shown in fig. 2H can be generated. Therefore, compared with the input image shown in fig. 2A, the object table shown in fig. 2H can make the relevant information of the object in the input image more concise, normative and intuitive to present to the user, which is helpful for the user to more intuitively and conveniently obtain the required information.
At least one embodiment of the present disclosure further provides an image processing apparatus, and fig. 12 is a schematic block diagram of an image processing apparatus provided in at least one embodiment of the present disclosure.
As shown in fig. 12, the image processing apparatus 500 may include: an image acquisition module 501, an area recognition processing module 502, a table line detection processing module 503, and a cell table generation module 504.
For example, the image acquisition module 501 is configured to acquire an input image. For example, the input image includes a table region including a plurality of object regions, each of the plurality of object regions including at least one object.
For example, the region identification processing module 502 is configured to perform region identification processing on the input image to obtain a plurality of object region frames corresponding to the plurality of object regions one to one and a table region frame corresponding to the table region.
For example, the form line detection processing module 503 is configured to perform form line detection processing on the input image to determine whether the form area includes a wired form.
For example, the unit table generation module 504 is configured to, in response to the table region not including the wired table: aligning the object region frames to obtain a plurality of region labeling frames which are in one-to-one correspondence with the object region frames; determining at least one dividing line based on the object region frames, and dividing the region labeling frames through the at least one dividing line to form a plurality of cells; based on the plurality of cells, a cell table corresponding to the table area is generated.
For example, the image acquisition module 501, the area recognition processing module 502, the table line detection processing module 503, and the cell table generation module 504 may include codes and programs stored in a memory; the processor may execute the code and program to implement some or all of the functions of the image acquisition module 501, the area identification processing module 502, the table line detection processing module 503, and the cell table generation module 504 as described above. For example, the image acquisition module 501, the area recognition processing module 502, the table line detection processing module 503, and the cell table generation module 504 may be dedicated hardware devices for implementing some or all of the functions of the image acquisition module 501, the area recognition processing module 502, the table line detection processing module 503, and the cell table generation module 504 described above. For example, the image acquisition module 501, the area recognition processing module 502, the table line detection processing module 503, and the unit table generation module 504 may be one circuit board or a combination of a plurality of circuit boards for realizing the functions as described above. In the embodiment of the present application, the one or a combination of a plurality of circuit boards may include: (1) one or more processors; (2) one or more non-transitory memories connected to the processor; and (3) firmware stored in the memory executable by the processor.
It should be noted that the image acquisition module 501 is configured to implement step S10 shown in fig. 1, the area recognition processing module 502 is configured to implement step S20 shown in fig. 1, the table line detection processing module 503 is configured to implement step S30 shown in fig. 1, and the cell table generation module 504 is configured to implement step S40 shown in fig. 1, including, for example, steps S401 to S403. Therefore, for specific descriptions of the functions that can be realized by the image acquisition module 501, the area recognition processing module 502, the table line detection processing module 503, and the cell table generation module 504, reference may be made to the related descriptions of steps S10 to S40 in the above embodiments of the image processing method, and repeated descriptions are omitted. In addition, the image processing apparatus can achieve technical effects similar to those of the image processing method, and details are not repeated herein.
At least one embodiment of the present disclosure further provides an electronic device, and fig. 13 is a schematic diagram of an electronic device provided in at least one embodiment of the present disclosure.
For example, as shown in fig. 13, the electronic device includes a processor 601, a communication interface 602, a memory 603, and a communication bus 604. The processor 601, the communication interface 602, and the memory 603 communicate with each other via the communication bus 604, and components such as the processor 601, the communication interface 602, and the memory 603 may communicate with each other via a network connection. The present disclosure is not limited herein as to the type and function of the network. It should be noted that the components of the electronic device shown in fig. 13 are only exemplary and not limiting, and the electronic device may have other components according to the actual application.
For example, memory 603 is used to store computer readable instructions non-transiently. The processor 601 is configured to implement the image processing method according to any of the above embodiments when executing computer readable instructions. For specific implementation and related explanation of each step of the image processing method, reference may be made to the above-mentioned embodiment of the image processing method, which is not described herein again.
For example, other implementations of the image processing method implemented by the processor 601 executing the computer readable instructions stored in the memory 603 are the same as the implementations mentioned in the foregoing method embodiment, and are not described herein again.
For example, the communication bus 604 may be a peripheral component interconnect standard (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
For example, the communication interface 602 is used to enable communication between an electronic device and other devices.
For example, the processor 601 and the memory 603 may be located on a server side (or cloud side).
For example, the processor 601 may control other components in the electronic device to perform desired functions. The processor 601 may be a device having data processing capability and/or program execution capability, such as a Central Processing Unit (CPU), a Network Processor (NP), a Tensor Processor (TPU), or a Graphics Processing Unit (GPU); but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The Central Processing Unit (CPU) may be an X86 or ARM architecture, etc.
For example, memory 603 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, Random Access Memory (RAM), cache memory (or the like). The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer-readable instructions may be stored on the computer-readable storage medium and executed by the processor 601 to implement various functions of the electronic device. Various application programs and various data and the like can also be stored in the storage medium.
For example, in some embodiments, the electronic device may also include an image acquisition component. The image acquisition component is used for acquiring an input image. The memory 603 is also used for storing input images.
For example, the image acquisition component may be a camera of a smartphone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, or even a webcam.
For example, the input image may be an original image directly captured by the image capturing unit, or may be an image obtained by preprocessing the original image. The pre-processing may eliminate extraneous or noisy information in the original image to facilitate better processing of the input image. The preprocessing may include, for example, processing of an original image such as image expansion (Data augmentation), image scaling, Gamma (Gamma) correction, image enhancement, or noise reduction filtering.
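For example, the preprocessing mentioned above may be sketched as follows; the specific scale factor, gamma value, and filter size are assumptions chosen only for illustration, and an 8-bit image is assumed:
import cv2
import numpy as np

def preprocess(original):
    # Scale the original image, apply gamma correction, and reduce noise.
    resized = cv2.resize(original, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
    gamma = 1.2
    lut = np.array([((i / 255.0) ** (1.0 / gamma)) * 255 for i in range(256)]).astype("uint8")
    corrected = cv2.LUT(resized, lut)
    return cv2.GaussianBlur(corrected, (3, 3), 0)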
For example, the detailed description of the process of executing the image processing by the electronic device may refer to the related description in the embodiment of the image processing method, and repeated descriptions are omitted.
Fig. 14 is a schematic diagram of a non-transitory computer-readable storage medium according to at least one embodiment of the disclosure. For example, as shown in fig. 14, one or more computer-readable instructions 701 may be stored non-transitory on a storage medium 700. For example, the computer readable instructions 701, when executed by a processor, may perform one or more steps according to the image processing method described above.
For example, the storage medium 700 may be applied to the electronic device described above, and for example, the storage medium 700 may include the memory 603 in the electronic device.
For example, the description of the storage medium 700 may refer to the description of the memory in the embodiment of the electronic device, and repeated descriptions are omitted.
For the present disclosure, there are also the following points to be explained:
(1) the drawings of the embodiments of the disclosure only relate to the structures related to the embodiments of the disclosure, and other structures can refer to the common design.
(2) Thicknesses and dimensions of layers or structures may be exaggerated in the drawings used to describe the embodiments of the present disclosure for clarity. It will be understood that when an element such as a layer, film, region, or substrate is referred to as being "on" or "under" another element, it can be "directly on" or "directly under" the other element, or intervening elements may be present.
(3) Without conflict, embodiments of the present disclosure and features of the embodiments may be combined with each other to arrive at new embodiments.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and the scope of the present disclosure should be subject to the scope of the claims.

Claims (29)

1. An image processing method comprising:
acquiring an input image, wherein the input image comprises a table area, the table area comprises a plurality of object areas, and each object area in the plurality of object areas comprises at least one object;
performing region identification processing on the input image to obtain a plurality of object region frames corresponding to the object regions one by one and a table region frame corresponding to the table region;
performing table line detection processing on the input image to judge whether the table area comprises a wired table or not; and
in response to the table area not including a wired table:
aligning the object region frames to obtain a plurality of region labeling frames which are in one-to-one correspondence with the object region frames;
determining at least one dividing line based on the object region frames, and dividing the region labeling frames through the at least one dividing line to form a plurality of cells; and
based on the plurality of cells, a cell table corresponding to the table region is generated.
2. The image processing method according to claim 1, wherein aligning the plurality of object region frames to obtain the plurality of region labeling frames in one-to-one correspondence with the plurality of object region frames comprises:
dividing the table area frame into a plurality of coordinate grid areas which are arranged in M rows and N columns along a first direction and a second direction by taking a reference value as a coordinate unit to establish a table coordinate system, wherein the coordinate grid areas in M rows are arranged along the first direction, the coordinate grid areas in N columns are arranged along the second direction, and M and N are positive integers;
determining coordinates of the plurality of object region boxes in the table coordinate system; and
and performing expansion processing on the multiple object area frames based on the coordinates of the multiple object area frames in the table coordinate system to obtain the multiple area labeling frames.
3. The image processing method according to claim 2, wherein the reference value is determined according to an average height of the plurality of object region frames in the first direction.
4. The image processing method of claim 2, wherein determining coordinates of the plurality of object region boxes in the table coordinate system comprises:
determining a plurality of slopes of the plurality of object region boxes, wherein the slope of each object region box of the plurality of object region boxes represents a slope of an edge of the each object region box extending in the second direction relative to the second direction;
according to a plurality of slopes of the object region frames, correcting the input image to obtain a corrected input image; and
determining coordinates of the plurality of object region frames in the table coordinate system based on the corrected input image.
5. The image processing method according to claim 4, wherein performing correction processing on the input image according to the slopes of the object region frames to obtain the corrected input image comprises:
calculating an average value of a plurality of slopes of the plurality of object region frames according to the plurality of slopes; and
rotating the input image in a plane formed by the first direction and the second direction based on an average value of the plurality of slopes so that the average value of the plurality of slopes approaches 0.
6. The image processing method according to any one of claims 2 to 5, wherein performing dilation processing on the plurality of object region frames based on coordinates of the plurality of object region frames in the table coordinate system to obtain the plurality of region labeling frames comprises:
determining first start coordinates and first end coordinates of the plurality of object region frames in the first direction in the table coordinate system and second start coordinates and second end coordinates in the second direction, wherein the first start coordinates of any object region frame of the plurality of object region frames include coordinates of a start row of a coordinate grid region occupied by the any object region frame in the table coordinate system, the first end coordinates of any object region frame include coordinates of an end row of a coordinate grid region occupied by the any object region frame in the table coordinate system, the second start coordinates of any object region frame include coordinates of a start column of a coordinate grid region occupied by the any object region frame in the table coordinate system, and the second end coordinates of any object region frame include coordinates of an end column of a coordinate grid region occupied by the any object region frame in the table coordinate system;
dividing the object area frames into a plurality of rows and a plurality of columns, performing expansion processing on the object area frames row by row according to the direction of pointing a starting row to a terminating row in the table coordinate system, and sequentially performing expansion processing on each row of object area frames according to the direction of pointing the starting column to the terminating column in the table coordinate system;
for an ith object region box of the plurality of object region boxes, wherein i is a positive integer,
expanding the ith object area frame in the first direction, so that the start row of the grid area occupied by the ith object area frame is moved each time by the reference value in the first direction in a direction away from the end row of the grid area occupied by the ith object area frame, and so that the end row of the grid area occupied by the ith object area frame is moved each time by the reference value in the first direction in a direction away from the start row of the grid area occupied by the ith object area frame, until the first start coordinate of the ith object area frame is equal to 0 or equal to the first end coordinate of any one of the object area frames except the ith object area frame, and the first end coordinate of the ith object area frame is equal to the maximum row value of the table coordinate system or equal to the first start coordinate of any one of the object area frames except the ith object area frame,
expanding the ith object area frame in the second direction, so that the starting column of the grid area occupied by the ith object area frame is moved each time by the reference value in the second direction in a direction away from the ending column of the grid area occupied by the ith object area frame, so that the ending column of the grid area occupied by the ith object area frame is moved each time by the reference value in the second direction in a direction away from the starting column of the grid area occupied by the ith object area frame, until the second starting coordinate of the ith object area frame is equal to 0 or equal to the second ending coordinate of any one of the plurality of object area frames except for the ith object area frame, and the second ending coordinate of the ith object area frame is equal to the maximum column value of the table coordinate system or equal to the maximum column value of the table coordinate system in the plurality of object area frames except for the ith object area frame And obtaining a region labeling frame corresponding to the i-th object region frame by using the second start coordinate of any object region frame except the i-th object region frame.
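As an illustration only, a simplified sketch of the expansion of claim 6, assuming each frame is kept as a dictionary of start/end rows and columns in the table coordinate system and that the frames list is already ordered row by row and, within a row, column by column; the name dilate_frames, the dictionary keys and the decoupled stop conditions are simplifications, not claim language:

    def dilate_frames(frames, max_row, max_col, step=1):
        # frames: list of dicts with keys 'r0', 'r1' (start/end row) and 'c0', 'c1'
        # (start/end column) in the table coordinate system; step plays the role
        # of the reference value
        for i, f in enumerate(frames):
            others = [g for j, g in enumerate(frames) if j != i]
            # first direction: move the start row away from the end row, and the end
            # row away from the start row, until a boundary or another frame is met
            while f['r0'] > 0 and all(f['r0'] != g['r1'] for g in others):
                f['r0'] -= step
            while f['r1'] < max_row and all(f['r1'] != g['r0'] for g in others):
                f['r1'] += step
            # second direction: the same expansion for the start and end columns
            while f['c0'] > 0 and all(f['c0'] != g['c1'] for g in others):
                f['c0'] -= step
            while f['c1'] < max_col and all(f['c1'] != g['c0'] for g in others):
                f['c1'] += step
        return frames  # each expanded frame plays the role of a region labeling frame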
7. The image processing method according to any one of claims 2 to 5, wherein determining the at least one dividing line based on the plurality of object region boxes comprises:
establishing a pixel coordinate system based on the table region frame by taking a pixel as a coordinate unit, wherein the pixel coordinate system comprises a plurality of pixel units, a first coordinate axis of the pixel coordinate system is parallel to the first direction, and a second coordinate axis of the pixel coordinate system is parallel to the second direction;
determining coordinates of the plurality of object region frames in the pixel coordinate system to obtain a plurality of pixel regions corresponding to the plurality of object region frames one to one;
marking pixel units occupied by the plurality of pixel regions in the pixel coordinate system as first pixel units, and marking pixel units in the pixel coordinate system other than the first pixel units as second pixel units;
sequentially determining the number of first pixel units included in each column of pixel units in the pixel coordinate system along the second direction;
in response to the number of first pixel units included in any column of pixel units being smaller than or equal to a first pixel reference value, taking the any column of pixel units as a first intermediate dividing line to obtain at least one first intermediate dividing line; and
determining at least one first dividing line extending in the first direction in the table coordinate system based on the at least one first intermediate dividing line, wherein the at least one dividing line includes the at least one first dividing line.
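For illustration only, a minimal sketch of the projection step of claim 7, assuming each pixel region is an axis-aligned box (x0, y0, x1, y1) in pixel coordinates with exclusive end coordinates and that NumPy is available; candidate_column_lines is a hypothetical name, and which array axis corresponds to the first or second direction depends on the chosen coordinate convention:

    import numpy as np

    def candidate_column_lines(pixel_regions, height, width, first_pixel_reference=0):
        # mark pixel units covered by any pixel region as first pixel units (value 1);
        # the remaining pixel units stay 0 and act as second pixel units
        mask = np.zeros((height, width), dtype=np.uint8)
        for (x0, y0, x1, y1) in pixel_regions:
            mask[y0:y1, x0:x1] = 1
        # scan the columns of pixel units along the second direction and count the
        # first pixel units in each column
        column_counts = mask.sum(axis=0)
        # every column whose count does not exceed the first pixel reference value
        # becomes a first intermediate dividing line
        return [x for x in range(width) if column_counts[x] <= first_pixel_reference]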
8. The image processing method according to claim 7, wherein determining the at least one first dividing line extending in the first direction in the table coordinate system based on the at least one first intermediate dividing line comprises:
in response to any one of the at least one first intermediate dividing line not having an adjacent first intermediate dividing line in the second direction, mapping the any one first intermediate dividing line from the pixel coordinate system into the table coordinate system to obtain one of the first dividing lines in the table coordinate system corresponding to the any one first intermediate dividing line;
in response to the at least one first intermediate dividing line including X first intermediate dividing lines that are continuous in the second direction, mapping any one of the X first intermediate dividing lines from the pixel coordinate system into the table coordinate system to obtain one of the first dividing lines corresponding to the X first intermediate dividing lines in the table coordinate system, where X is a positive integer.
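Again as a sketch only, one way to realise claim 8: runs of consecutive first intermediate dividing lines are collapsed so that each run yields a single first dividing line (the claim allows any member of the run to be mapped; the middle one is chosen here purely for symmetry), with the mapping into the table coordinate system left out:

    def merge_consecutive_lines(candidates):
        # candidates: sorted pixel-column indices of the first intermediate dividing lines
        lines, run = [], []
        for x in candidates:
            if run and x != run[-1] + 1:
                lines.append(run[len(run) // 2])  # close the previous run of consecutive lines
                run = []
            run.append(x)
        if run:
            lines.append(run[len(run) // 2])
        return lines  # one representative per run; map these into the table coordinate system afterwards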
9. The image processing method according to claim 7, wherein the first pixel reference value is 0 or the first pixel reference value is determined based on an image height of the input image.
10. The image processing method of claim 7, wherein determining the at least one partition line based on the plurality of object region boxes further comprises:
identifying an object included in each of the plurality of object regions;
determining coordinates of the object included in each object region in the pixel coordinate system;
marking pixel units occupied by the objects included in each object region in the pixel coordinate system as third pixel units, and marking pixel units in the pixel coordinate system other than the third pixel units as fourth pixel units;
sequentially determining the number of third pixel units included in each row of pixel units in the pixel coordinate system along the first direction;
in response to the number of third pixel units included in any row of pixel units being smaller than or equal to a second pixel reference value, taking the any row of pixel units as a second intermediate dividing line to obtain at least one second intermediate dividing line; and
determining at least one second dividing line extending in the second direction in the table coordinate system based on the at least one second intermediate dividing line, wherein the at least one dividing line includes the at least one second dividing line.
11. The image processing method according to claim 10, wherein determining the at least one second dividing line extending in the second direction in the table coordinate system based on the at least one second intermediate dividing line comprises:
in response to any one of the at least one second intermediate dividing line not having an adjacent second intermediate dividing line in the first direction, mapping the any one second intermediate dividing line from the pixel coordinate system into the table coordinate system to obtain one of the second dividing lines in the table coordinate system corresponding to the any one second intermediate dividing line;
in response to the at least one second intermediate division line including Y second intermediate division lines that are continuous in the first direction, mapping any one of the Y second intermediate division lines from the pixel coordinate system into the table coordinate system to obtain one of the second division lines corresponding to the Y second intermediate division lines in the table coordinate system, where Y is a positive integer.
12. The image processing method according to claim 10, wherein the second pixel reference value is 0 or the second pixel reference value is determined based on an image width of the input image.
13. The image processing method according to claim 1, wherein performing division processing on the plurality of region labeling frames through the at least one dividing line to form the plurality of cells comprises:
in response to the input image being subjected to the table line detection processing and at least one table line segment being detected, correcting the at least one dividing line based on the at least one table line segment, and dividing the plurality of region labeling frames based on the at least one dividing line after the correction processing to obtain the plurality of cells.
14. The image processing method according to claim 1, wherein generating the cell table corresponding to the table region based on the plurality of cells includes:
merging and/or dividing the plurality of cells to obtain a plurality of target cells based on the plurality of cells, wherein the cell table comprises the plurality of target cells.
15. The image processing method of claim 14, wherein merging and/or segmenting the plurality of cells to obtain the plurality of target cells based on the plurality of cells comprises:
in response to the at least one dividing line passing through any region labeling frame, determining whether the any region labeling frame needs to be subjected to splitting processing;
in response to the any region labeling frame needing to be subjected to the splitting processing, splitting the any region labeling frame into a plurality of split labeling frames, and taking the cell to which each split labeling frame of the plurality of split labeling frames belongs as one target cell; and
in response to the any region labeling frame not needing to be subjected to the splitting processing, merging the plurality of cells occupied by the any region labeling frame to obtain one target cell.
16. The image processing method according to claim 14 or 15, further comprising:
identifying an object included in each of the plurality of object regions; and
filling the objects included in the plurality of object regions into the corresponding target cells of the cell table, respectively, to generate an object table.
17. The image processing method according to claim 1, wherein performing table line detection processing on the input image to determine whether the table region includes a wired table comprises:
in a case where the table line detection processing is performed on the input image and one or more table line segments are obtained, determining whether the table region includes a wired table based on the one or more table line segments; and
in a case where the table line detection processing is performed on the input image and no table line segment is detected, determining that the table region does not include a wired table.
18. The image processing method of claim 17, wherein performing the table line detection processing on the input image to obtain the one or more table line segments comprises:
performing line segment detection on the input image to obtain a plurality of detection line segments;
performing merging processing on the plurality of detection line segments and redrawing the merged detection line segments to obtain a plurality of first intermediate table line segments;
performing expansion processing on the plurality of first intermediate table line segments, respectively, to obtain a plurality of second intermediate table line segments;
deleting each second intermediate table line segment of the plurality of second intermediate table line segments that is located within any one of the plurality of object region frames, and taking the remaining second intermediate table line segments of the plurality of second intermediate table line segments as a plurality of third intermediate table line segments;
performing the merging processing on the third intermediate table line segments to obtain fourth intermediate table line segments; and
performing expansion processing on the plurality of fourth intermediate table line segments, respectively, to obtain one or more fifth intermediate table line segments, and taking the one or more fifth intermediate table line segments as the one or more table line segments.
19. The image processing method according to claim 18, wherein the merging processing comprises: for a first line segment to be merged and a second line segment to be merged, in response to the difference between the slope of the first line segment to be merged and the slope of the second line segment to be merged being smaller than a slope threshold and the distance between the end point of the first line segment to be merged close to the second line segment to be merged and the end point of the second line segment to be merged close to the first line segment to be merged being smaller than or equal to a distance threshold, merging the first line segment to be merged and the second line segment to be merged,
wherein the first line segment to be merged and the second line segment to be merged are any two detection line segments of the plurality of detection line segments, or the first line segment to be merged and the second line segment to be merged are any two third intermediate table line segments of the plurality of third intermediate table line segments.
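As an illustration only, a minimal sketch of the merging test of claim 19, with each line segment represented as a pair of end points; the threshold values shown are arbitrary placeholders, not values taken from the patent:

    import math

    def can_merge(seg_a, seg_b, slope_threshold=0.05, distance_threshold=10.0):
        # a segment is ((x0, y0), (x1, y1)); vertical segments get an infinite slope
        def slope(seg):
            (x0, y0), (x1, y1) = seg
            return (y1 - y0) / (x1 - x0) if x1 != x0 else float('inf')

        s_a, s_b = slope(seg_a), slope(seg_b)
        # the slopes must differ by less than the slope threshold
        # (equal infinite slopes, i.e. two vertical segments, also pass)
        if not (s_a == s_b or abs(s_a - s_b) < slope_threshold):
            return False
        # the mutually closest end points must be within the distance threshold
        closest = min(math.dist(p, q) for p in seg_a for q in seg_b)
        return closest <= distance_threshold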
20. The image processing method of claim 17, wherein determining whether the table region includes a wired table based on the one or more table line segments comprises:
in response to obtaining only one table line segment, determining that the table region does not include a wired table; and
in response to obtaining a plurality of table line segments:
determining intersections between the plurality of table line segments;
determining that the table region includes a wired table in response to the number of intersections being greater than or equal to a second reference value; and
determining that the table region does not include a wired table in response to the number of intersections being less than the second reference value.
21. The image processing method of claim 20, wherein determining the intersection between the plurality of table line segments comprises:
dividing the plurality of table line segments into a plurality of first table line segments and a plurality of second table line segments, wherein an included angle between each first table line segment and a third direction is in a first angle range, an included angle between each first table line segment and a fourth direction is in a second angle range, an included angle between each second table line segment and the third direction is in the second angle range, an included angle between each second table line segment and the fourth direction is in the first angle range, and the third direction and the fourth direction are perpendicular to each other;
dividing the plurality of first table line segments into a plurality of first line segment rows and marking the line number of the first line segment row to which each first table line segment in the plurality of first table line segments belongs, wherein each first line segment row comprises at least one first table line segment arranged along the third direction;
dividing the plurality of second table line segments into a plurality of second line segment columns and marking column numbers of the second line segment columns to which each second table line segment in the plurality of second table line segments belongs, wherein each second line segment column comprises at least one second table line segment arranged along the fourth direction; and
identifying a plurality of intersection points between the plurality of first table line segments and the plurality of second table line segments, and determining coordinates of the plurality of intersection points, wherein the coordinates of any one of the plurality of intersection points include the row number corresponding to the first table line segment and the column number corresponding to the second table line segment which intersect to form the any one intersection point.
22. The image processing method of claim 21, wherein the first angle range is 0° to 45°, and the second angle range is 45° to 90°.
23. The image processing method according to claim 21, wherein the second reference value is a larger value of the number of the plurality of first line segment rows and the number of the plurality of second line segment columns.
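Purely as an illustration, a sketch covering the intersection determination of claim 21 and the wired-table decision of claims 20 and 23, assuming the first (near-horizontal) table line segments have already been grouped into line segment rows and the second (near-vertical) ones into line segment columns; the proper-intersection test below ignores collinear and shared-endpoint edge cases, and the function names are hypothetical:

    def _ccw(a, b, c):
        # orientation test used by the standard segment intersection check
        return (c[1] - a[1]) * (b[0] - a[0]) > (b[1] - a[1]) * (c[0] - a[0])

    def _segments_intersect(s, t):
        # proper-intersection test; each segment is ((x0, y0), (x1, y1))
        a, b = s
        c, d = t
        return _ccw(a, c, d) != _ccw(b, c, d) and _ccw(a, b, c) != _ccw(a, b, d)

    def intersection_points(first_rows, second_cols):
        # first_rows[r]: near-horizontal table line segments in line segment row r
        # second_cols[c]: near-vertical table line segments in line segment column c
        # each intersection is recorded by its (row number, column number) coordinates
        points = []
        for r, row_segs in enumerate(first_rows):
            for c, col_segs in enumerate(second_cols):
                if any(_segments_intersect(h, v) for h in row_segs for v in col_segs):
                    points.append((r, c))
        return points

    def is_wired_table(points, num_rows, num_cols):
        # claims 20 and 23: wired if the number of intersections reaches the second
        # reference value, taken as the larger of the row count and the column count
        return len(points) >= max(num_rows, num_cols)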
24. The image processing method of claim 21, further comprising:
in response to the table region including a wired table:
generating a cell table corresponding to the table region based on the plurality of table line segments.
25. The image processing method of claim 24, wherein generating a cell table corresponding to the table region based on the plurality of table line segments comprises:
determining each cell in the cell table based on the plurality of intersection points, wherein the vertices of each cell in the cell table comprise at least three intersection points of the plurality of intersection points.
26. The image processing method of claim 25, wherein determining each cell in the cell table based on the plurality of intersection points comprises:
determining a current intersection point, wherein the current intersection point is any one of the plurality of intersection points;
determining a first current table line segment and a second current table line segment corresponding to the current intersection point based on the coordinates of the current intersection point, wherein the first current table line segment is any one of the plurality of first table line segments, and the second current table line segment is any one of the plurality of second table line segments;
determining a first intersection point adjacent to the current intersection point on the first current table line segment, and determining a second intersection point adjacent to the current intersection point on the second current table line segment; and
determining a cell based on the current intersection, the first intersection, and the second intersection.
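For illustration only, a simplified sketch of claim 26 using the (row number, column number) intersection coordinates of claim 21: the cell anchored at the current intersection point is closed by the nearest intersection to its right on the same first table line segment and the nearest intersection below it on the same second table line segment; cell_from_intersection is a hypothetical name and the return format is an arbitrary choice:

    def cell_from_intersection(current, points):
        # points: iterable of (row number, column number) intersection coordinates
        # current: the current intersection point of claim 26
        r, c = current
        # nearest intersection to the right on the same first (near-horizontal) table line segment
        right = min((cc for rr, cc in points if rr == r and cc > c), default=None)
        # nearest intersection below on the same second (near-vertical) table line segment
        below = min((rr for rr, cc in points if cc == c and rr > r), default=None)
        if right is None or below is None:
            return None  # no cell can be closed from this intersection point
        # the cell spanned by (r, c), (r, right) and (below, c)
        return (r, c, below, right)  # (top row, left column, bottom row, right column)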
27. An image processing apparatus comprising:
an image acquisition module configured to acquire an input image, wherein the input image includes a table region including a plurality of object regions, each of the plurality of object regions including at least one object;
a region identification processing module configured to perform region identification processing on the input image to obtain a plurality of object region frames corresponding to the object regions one to one and a table region frame corresponding to the table region;
a table line detection processing module configured to perform table line detection processing on the input image to determine whether the table region includes a wired table; and
a cell table generation module configured to, in response to the table region not including a wired table:
aligning the object region frames to obtain a plurality of region labeling frames which are in one-to-one correspondence with the object region frames;
determining at least one dividing line based on the object region frames, and dividing the region labeling frames through the at least one dividing line to form a plurality of cells;
generating a cell table corresponding to the table region based on the plurality of cells.
28. An electronic device comprising a processor and a memory,
wherein the memory is configured to store computer readable instructions; and
the processor, when executing the computer readable instructions, performs the steps of the method according to any one of claims 1-26.
29. A non-transitory computer readable storage medium for non-transitory storage of computer readable instructions which, when executed by a processor, implement the steps of the method of any one of claims 1-26.
CN202110169261.6A 2021-02-07 2021-02-07 Image processing method and device, electronic equipment and storage medium Active CN112906532B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110169261.6A CN112906532B (en) 2021-02-07 2021-02-07 Image processing method and device, electronic equipment and storage medium
PCT/CN2022/073988 WO2022166707A1 (en) 2021-02-07 2022-01-26 Image processing method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110169261.6A CN112906532B (en) 2021-02-07 2021-02-07 Image processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112906532A true CN112906532A (en) 2021-06-04
CN112906532B CN112906532B (en) 2024-01-05

Family

ID=76123794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110169261.6A Active CN112906532B (en) 2021-02-07 2021-02-07 Image processing method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112906532B (en)
WO (1) WO2022166707A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5774584A (en) * 1993-01-07 1998-06-30 Canon Kk Method and apparatus for identifying table areas in documents
CN109948507A (en) * 2019-03-14 2019-06-28 北京百度网讯科技有限公司 Method and apparatus for detecting table
CN110008923A (en) * 2019-04-11 2019-07-12 网易有道信息技术(北京)有限公司 Image processing method and training method and device, calculate equipment at medium
CN111160234A (en) * 2019-12-27 2020-05-15 掌阅科技股份有限公司 Table recognition method, electronic device and computer storage medium
CN111325110A (en) * 2020-01-22 2020-06-23 平安科技(深圳)有限公司 Form format recovery method and device based on OCR and storage medium
CN111368744A (en) * 2020-03-05 2020-07-03 中国工商银行股份有限公司 Method and device for identifying unstructured table in picture
CN111382717A (en) * 2020-03-17 2020-07-07 腾讯科技(深圳)有限公司 Table identification method and device and computer readable storage medium
CN112149561A (en) * 2020-09-23 2020-12-29 杭州睿琪软件有限公司 Image processing method and apparatus, electronic device, and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446264B (en) * 2018-03-26 2022-02-15 阿博茨德(北京)科技有限公司 Method and device for analyzing table vector in PDF document
CN109635268B (en) * 2018-12-29 2023-05-05 南京吾道知信信息技术有限公司 Method for extracting form information in PDF file
CN112906532B (en) * 2021-02-07 2024-01-05 杭州睿胜软件有限公司 Image processing method and device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022166707A1 (en) * 2021-02-07 2022-08-11 杭州睿胜软件有限公司 Image processing method and apparatus, electronic device, and storage medium
CN113657274A (en) * 2021-08-17 2021-11-16 北京百度网讯科技有限公司 Table generation method and device, electronic equipment, storage medium and product
CN113657274B (en) * 2021-08-17 2022-09-20 北京百度网讯科技有限公司 Table generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022166707A1 (en) 2022-08-11
CN112906532B (en) 2024-01-05

Similar Documents

Publication Publication Date Title
CN110390269B (en) PDF document table extraction method, device, equipment and computer readable storage medium
CN110659647B (en) Seal image identification method and device, intelligent invoice identification equipment and storage medium
US20230222631A1 (en) Method and device for removing handwritten content from text image, and storage medium
CN107656922B (en) Translation method, translation device, translation terminal and storage medium
WO2020140698A1 (en) Table data acquisition method and apparatus, and server
CN112926421B (en) Image processing method and device, electronic equipment and storage medium
US9959475B2 (en) Table data recovering in case of image distortion
US12039763B2 (en) Image processing method and device, electronic apparatus and storage medium
CN113486828B (en) Image processing method, device, equipment and storage medium
US11823358B2 (en) Handwritten content removing method and device and storage medium
US8515176B1 (en) Identification of text-block frames
EP3940589B1 (en) Layout analysis method, electronic device and computer program product
WO2022166707A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN109598185B (en) Image recognition translation method, device and equipment and readable storage medium
CN111368638A (en) Spreadsheet creation method and device, computer equipment and storage medium
US20180082456A1 (en) Image viewpoint transformation apparatus and method
US20230101426A1 (en) Method and apparatus for recognizing text, storage medium, and electronic device
CN107679442A (en) Method, apparatus, computer equipment and the storage medium of document Data Enter
US9734132B1 (en) Alignment and reflow of displayed character images
CN112580499A (en) Text recognition method, device, equipment and storage medium
US9047528B1 (en) Identifying characters in grid-based text
CN113436222A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN116402020A (en) Signature imaging processing method, system and storage medium based on OFD document
CN110245570B (en) Scanned text segmentation method and device, computer equipment and storage medium
CN111832551A (en) Text image processing method and device, electronic scanning equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant