CN114417792A - Processing method and device of form image, electronic equipment and medium - Google Patents


Info

Publication number
CN114417792A
CN114417792A
Authority
CN
China
Prior art keywords
target
image
form image
frame
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111668079.1A
Other languages
Chinese (zh)
Inventor
段纪伟
黄旭进
张治强
侯冰基
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Wuhan Kingsoft Office Software Co Ltd
Original Assignee
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Wuhan Kingsoft Office Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Office Software Inc, Zhuhai Kingsoft Office Software Co Ltd, Wuhan Kingsoft Office Software Co Ltd filed Critical Beijing Kingsoft Office Software Inc
Priority to CN202111668079.1A
Publication of CN114417792A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/10: Text processing
    • G06F 40/166: Editing, e.g. inserting or deleting
    • G06F 40/177: Editing of tables; using ruled lines
    • G06F 40/18: Editing of spreadsheets
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Input (AREA)

Abstract

The invention provides a method, an apparatus, an electronic device and a medium for processing a form image. The method includes: determining that a target form image is a borderless form; processing the target form image through a form recognition structure to obtain a first frame; and adding the content of the target form image into the first frame. The method can quickly and effectively obtain the first frame and accurately convert the form image into an editable form, so that more complete and clearer form information is obtained from the form image, which improves the processing speed of form images and the user experience.

Description

Processing method and device of form image, electronic equipment and medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for processing a form image, an electronic device, and a medium.
Background
With the continuous development of office software, users' expectations of it keep rising: beyond meeting normal office needs, they expect it to handle a variety of application scenarios.
In practice, a user may be able to obtain only a form image, such as one found on the internet, rather than an editable form. In that case the form image must be converted into an editable form before it can be processed.
In the prior art, when a form image is converted into an editable form document, the image is segmented to extract the horizontal and vertical lines of the form, from which the editable form is generated. This approach fails for borderless forms, whose cells have no visible lines to segment.
Disclosure of Invention
To address the problems in the prior art, the present invention provides a method, an apparatus, an electronic device and a medium for generating the table lines of a borderless table, which can handle a table image of any form, thereby improving the processing of table images and the user experience.
In a first aspect, the present invention provides a method for processing a form image, including:
determining the target table image as a borderless table;
processing the target form image through a form identification structure to obtain a first frame;
and adding the content in the target form image into the first frame.
Further, according to the processing method of the form image provided by the present invention, the method further includes:
the table identification structure is a line, a sliding window or a graph convolution neural network model.
Further, according to the processing method of the form image provided by the present invention, the method further includes:
in a case that the table identification structure is the line or the sliding window, the processing the target table image through the table identification structure to obtain a first frame includes:
moving the table recognition structure along a first direction of the target table image, and recording a first area which does not cover the content;
and/or moving the table identification structure along a second direction of the target table image, and recording a second area which does not cover the content;
and generating a table line in at least one of the first area and the second area to obtain a first frame.
Further, according to the processing method of the form image provided by the present invention, the method further includes:
in the case that the table identification structure is the line, the table identification structure comprises a first direction line and a second direction line;
the processing the target form image through the form recognition structure to obtain a first frame, including:
translating the first direction line and the second direction line in the target form image based on the content distribution condition in the target form image, and determining a non-intersection area in the target form image;
and determining a target point in the non-intersection area, and connecting the target point to obtain a first frame.
Further, according to the processing method of the form image provided by the present invention, the determining a non-intersection region in the target form image by translating the first direction line and the second direction line in the target form image based on the content distribution in the target form image includes:
performing binarization processing on the target table image to obtain a first binary image; wherein, a first value in the first binary image is a pixel corresponding to a content distribution area in the target form image, and a second value in the first binary image is a pixel corresponding to a non-content distribution area in the target form image;
processing the pixels in the first binary image according to a first direction to obtain a first direction line;
processing the pixels in the first binary image according to a second direction to obtain a second direction line;
and intersecting the first direction lines and the second direction lines, and taking the intersected area which is not covered by the first direction lines or the second direction lines as a non-intersecting area in the target form image.
Further, according to the processing method of the form image provided by the present invention, the determining the target point in the non-intersecting area includes:
determining the contour of the non-intersection region through contour searching;
and taking a coordinate point of the non-intersection area as the target point according to the contour of the non-intersection area.
Further, according to the processing method of the form image provided by the present invention, the method further includes:
in the case that the table identification structure is the sliding window, the table identification structure comprises a first direction sliding window and a second direction sliding window;
the processing the target form image through the form recognition structure to obtain a first frame, including:
translating the first-direction sliding window in the target form image based on the content distribution condition in the target form image to obtain a first-direction frame body; translating the second-direction sliding window in the target form image to obtain a second-direction frame body;
and setting a first direction table line based on the first direction frame body, and setting a second direction table line based on the second direction frame body, to obtain a first frame.
Further, according to the processing method of the form image provided by the present invention, the method further includes:
and under the condition that the table identification structure is the graph convolution neural network model, determining a first frame according to the position relation between a first text box and a second text box in the target table image.
Further, according to the method for processing a table image provided by the present invention, before the processing the target table image by the table recognition structure to obtain the first frame, the method further includes:
carrying out binarization processing on the target form image to obtain a second binary image; wherein a second value in the second binary image corresponds to pixels of text characters in the target form image, and a first value in the second binary image corresponds to pixels other than text characters in the target form image;
and setting a table identification structure for the second binary image.
Further, according to the processing method of the form image provided by the present invention, the adding the content in the target form image into the first border frame includes:
performing text detection on the target form image, and obtaining a text box in the target form image according to a text detection result;
setting a text box in the first frame;
filling the content in the text box in the target form image into the text box of the first frame.
In a second aspect, the present invention also provides a form image processing apparatus, including:
the determining module is used for determining the target form image as a borderless form;
the processing module is used for processing the target form image through the form identification structure to obtain a first frame;
and the adding module is used for adding the content in the target form image into the first frame.
In a third aspect, the present invention also provides an electronic device, including: a processor, a memory, and a bus, wherein,
the processor and the memory are communicated with each other through the bus;
the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the steps of the method for processing a form image as described in any one of the above.
In a fourth aspect, the present invention also provides a computer-readable storage medium storing computer instructions for causing the computer to perform the steps of the method of processing a form image as described in any one of the above.
The invention provides a method, an apparatus, an electronic device and a medium for processing a form image. The method includes: determining that a target form image is a borderless form; processing the target form image through a form recognition structure to obtain a first frame; and adding the content of the target form image into the first frame. The method can quickly and effectively obtain the first frame and accurately convert the form image into an editable form, so that more complete and clearer form information is obtained from the form image, which improves the processing speed of form images and the user experience.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a table line generating method for a borderless table according to the present invention;
FIG. 2 is a diagram of an exemplary borderless table provided by the present invention;
FIG. 3 is an exemplary graph of a first binary map provided by the present invention;
FIG. 4 is an exemplary diagram of a vertical form line area provided by the present invention;
FIG. 5 is an exemplary diagram of a horizontal form line region provided by the present invention;
FIG. 6 is an exemplary diagram of the invention after the horizontal form line regions and vertical form line regions have been superimposed;
FIG. 7 is an exemplary diagram of determining a center point of a non-intersecting region according to the present invention;
FIG. 8 is an exemplary diagram of the present invention in which vertically upper center points are connected;
FIG. 9 is an exemplary view of the present invention in which the lateral center points are connected;
FIG. 10 is an exemplary view of the invention with the center points connected both horizontally and vertically;
FIG. 11 is a diagram illustrating an exemplary text detection result according to the present invention;
FIG. 12 is an exemplary graph of a second binary map provided by the present invention;
FIG. 13 is an exemplary graph after morphological processing provided by the present invention;
FIG. 14 is an exemplary diagram of the present invention after text box splitting;
FIG. 15 is a schematic structural diagram of a table line generating apparatus for borderless tables according to the present invention;
fig. 16 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a method for processing a form image according to an embodiment of the present invention, and as shown in fig. 1, the method for processing a form image according to the present invention includes the following steps:
step 101: and determining the target table image as a borderless table.
In this embodiment, the target form image is first examined; when it is determined to be a borderless form, the form-image processing method provided by the present invention is used to generate the table lines of the borderless form.
The target form image is the form image to be processed by the method of the present invention. As the name implies, a form image is a form stored in an image format, for example jpg or png. A borderless form is one in which none of the cells has table lines; the form shown in fig. 2, where no cell has table lines, is such a borderless form. The borderless form may be a standalone form image or a form image converted from a table in a PDF document; this is not specifically limited here.
It should be noted that in the borderless table shown in fig. 2, each cell is filled with text content, but in other embodiments, some cells in the borderless table may be blank. For such borderless tables, the method of the present invention may also be used to generate the table lines.
Step 102: and processing the target form image through the form recognition structure to obtain a first frame.
In this embodiment, the first frame is a frame formed of the table lines restored for the borderless form, and is obtained by processing the target form image through the form recognition structure. The specific flow may be: the form recognition structure is used to process the target form image, and the table lines corresponding to the borderless form in the target form image are then determined from the processing result, yielding the first frame.
The first frame is a frame formed of plain table lines without any text data; it is recognized and determined on the target form image, so the form structure in the target form image can be accurately restored.
Step 103: and adding the content in the target form image into the first frame.
In this embodiment, the content of the target form image is added into the first frame to complete the conversion from the target form image to an editable form. The first frame is a form frame with table lines: text recognition is performed on each cell of the target form image, and the recognition result is filled into the corresponding cell of the first frame to obtain the editable form.
After the target form image has been converted into an editable form with table lines, the text recognized in each cell of the target form image is filled into the corresponding cell of the first frame, restoring the data of the target form image.
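The cell-filling step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `fill_cells`, the rectangle format, and the OCR result tuples are all hypothetical, and a real pipeline would take the text boxes from an OCR engine.

```python
# Hypothetical sketch: fill recognized text into the cells of the first frame.
# A cell is (x0, y0, x1, y1); an OCR result is ((x0, y0, x1, y1), text).
def fill_cells(cells, ocr_results):
    """Assign each recognized text to the cell containing its center point."""
    table = {cell: "" for cell in cells}
    for (bx0, by0, bx1, by1), text in ocr_results:
        cx, cy = (bx0 + bx1) / 2, (by0 + by1) / 2   # center of the text box
        for (x0, y0, x1, y1) in cells:
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                table[(x0, y0, x1, y1)] = text
                break
    return table

# Two cells of the first frame and two recognized text boxes inside them.
cells = [(0, 0, 10, 5), (10, 0, 20, 5)]
ocr = [((1, 1, 9, 4), "Name"), ((11, 1, 19, 4), "Age")]
filled = fill_cells(cells, ocr)
```

Matching by the text box's center point tolerates small misalignments between the detected text box and the generated cell borders.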
According to the form-image processing method provided by the present invention, the target form image is determined to be a borderless form, the target form image is processed through the form recognition structure to obtain the first frame, and the content of the target form image is then added into the first frame. The method can quickly and effectively obtain the first frame and accurately convert the form image into an editable form, so that more complete and clearer form information is obtained from the form image, which improves the processing speed of form images and the user experience.
Based on any one of the above embodiments, in this embodiment, the method further includes: the table identification structure is a line, a sliding window or a graph convolution neural network model.
In this embodiment, the form recognition structure is a line, a sliding window, or a graph convolutional neural network model. A line processes the target form image using horizontal or vertical lines; a sliding window processes it using closed shapes of various forms; and a graph convolutional neural network model recognizes the target form image using a model trained on sample data.
It should be noted that the input of the graph convolutional neural network model is the position information of each text box (e.g. the coordinates of its four corner points) together with the target form image, and the output indicates whether a given text box and its adjacent text boxes lie in the same row or the same column; details are given in the following embodiments.
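A crude geometric stand-in can illustrate the relation the model is said to output (same row / same column for adjacent text boxes). The function below is not a graph convolutional network; it is a hypothetical overlap heuristic used only to show the input format (four corner points per box) and the output format.

```python
# Hypothetical illustration of the model's I/O: boxes as four corner points,
# output as same-row / same-column flags for a pair of boxes.
boxes = {
    "A": [(0, 0), (40, 0), (40, 10), (0, 10)],
    "B": [(50, 0), (90, 0), (90, 10), (50, 10)],   # right of A, same row
    "C": [(0, 20), (40, 20), (40, 30), (0, 30)],   # below A, same column
}

def bounds(corners):
    """Axis-aligned bounding box of a four-point text box."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return min(xs), min(ys), max(xs), max(ys)

def relation(b1, b2):
    """Geometric stand-in for the model's prediction: boxes that overlap
    vertically share a row; boxes that overlap horizontally share a column."""
    x0a, y0a, x1a, y1a = bounds(b1)
    x0b, y0b, x1b, y1b = bounds(b2)
    same_row = min(y1a, y1b) > max(y0a, y0b)
    same_col = min(x1a, x1b) > max(x0a, x0b)
    return {"same_row": same_row, "same_col": same_col}
```

The trained model would learn this relation from sample data rather than from overlap alone, which is why it can handle skewed or irregularly spaced boxes.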
According to the processing method of the form image, provided by the invention, the form identification structure can be set as a line, a sliding window or a graph convolution neural network model, so that various processing on the target form image can be realized, the processing effect of the form image is ensured, and the user experience is improved.
Based on any one of the above embodiments, in this embodiment, the method further includes: under the condition that the table identification structure is a line or a sliding window, the table identification structure is used for processing the target table image to obtain a first frame, and the method comprises the following steps:
moving the table recognition structure along a first direction of the target table image, and recording a first area of uncovered content;
and/or moving the table identification structure along a second direction of the target table image, and recording a second area of the uncovered content;
and generating a table line in at least one of the first area and the second area to obtain a first frame.
In this embodiment, the form recognition structure is moved along the first direction of the target form image and a first area covering no content is recorded; the structure is moved along the second direction of the target form image and a second area covering no content is recorded; table lines are then generated in the recorded areas to obtain the first frame. It should be noted that the first direction is the longitudinal direction of the target form image, i.e. moving from its top to its bottom or from bottom to top, and the second direction is the transverse direction, i.e. moving from its left to its right or from right to left. Accordingly, the first area is the area recorded while moving longitudinally and the second area the area recorded while moving transversely.
It should be noted that, in other embodiments, the first direction and the second direction may also be a circular arc direction or a diameter direction in a sector, may also be a direction of 45 degrees, and the like, and are not limited to the transverse direction and the longitudinal direction, and may implement movement in any direction according to the actual needs of the user, and are not limited specifically herein.
In this embodiment, the form recognition structure moved along the longitudinal or transverse direction of the target form image may be a line or a sliding window; the recorded areas are those covering no content, i.e. blank areas in which no text data exists.
According to the form-image processing method provided by the present invention, the form recognition structure is moved along the first direction of the target form image to record a first area covering no content, and/or along the second direction to record a second area covering no content, and table lines are generated in at least one of the first area and the second area to obtain the first frame. Table lines can thus be generated in areas of the target form image along different directions, which improves the processing speed of form images.
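The scanning idea above can be sketched on a toy 0/1 content mask (all names hypothetical): a full-width line moved down the image records the rows in which it covers no content (candidate first areas), and a full-height line moved across records the blank columns (candidate second areas); table lines are then drawn in those bands.

```python
# Hypothetical sketch: 1 marks a pixel covered by content, 0 marks blank.
grid = [
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],   # blank row -> a horizontal table line can go here
    [1, 1, 0, 1, 1],
]

# Move a full-width line along the first (longitudinal) direction:
# record every row position where the line covers no content.
blank_rows = [r for r, row in enumerate(grid) if not any(row)]

# Move a full-height line along the second (transverse) direction:
# record every column position where the line covers no content.
blank_cols = [c for c in range(len(grid[0]))
              if not any(row[c] for row in grid)]

print(blank_rows, blank_cols)   # → [1] [2]
```

Drawing one horizontal line per blank row band and one vertical line per blank column band yields the first frame for this toy mask.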
Based on any one of the above embodiments, in this embodiment, the method further includes: under the condition that the table identification structure is a line, the table identification structure comprises a first direction line and a second direction line;
processing the target form image through the form recognition structure to obtain a first frame, comprising:
translating the first direction line and the second direction line in the target form image based on the content distribution condition in the target form image, and determining a non-intersection area in the target form image;
and determining target points in each non-intersecting area, and connecting the target points to obtain a first frame.
In this embodiment, the first direction lines are transverse lines and the second direction lines are longitudinal lines. The transverse and longitudinal lines are translated according to the content distribution in the target form image, the non-intersecting areas of the target form image are determined, a target point is then determined in each non-intersecting area, and connecting all the target points yields the first frame. The target point may be the center point or another fixed point such as a two-thirds point; the center point is preferred in this embodiment. A non-intersecting area is a blank area in which no text content is distributed and which is covered by neither the transverse lines nor the longitudinal lines.
It should be noted that, when the form recognition structure is a line, the text content in the target form image is first detected to obtain a number of text boxes, which are then processed to determine the target points and hence the first frame.
Specifically, in this embodiment, text detection is performed on the target form image to obtain a number of text boxes; the longitudinal and transverse table-line areas of the target form image are determined from these text boxes; the two areas are translated to obtain the translated table-line areas; a number of non-intersecting areas are determined from these areas; the center points of the non-intersecting areas are determined; and the first frame corresponding to the target form image is generated from these center points.
It should be noted that, in this embodiment, the longitudinal table-line area is the area formed by moving the form recognition structure along the longitudinal direction of the target form image, i.e. the second area in the above embodiment, and the transverse table-line area is the area formed by moving it along the transverse direction, i.e. the first area in the above embodiment. The non-intersecting areas are the areas covered by neither the transverse nor the longitudinal table-line areas.
According to the form-image processing method provided by the present invention, the first direction lines and second direction lines are translated in the target form image based on its text distribution, the non-intersecting areas are determined, target points are determined within the non-intersecting areas, and the target points are connected to obtain the first frame.
Based on any one of the above embodiments, in this embodiment, translating the first direction line and the second direction line in the target form image based on the content distribution in the target form image, and determining the non-intersection region in the target form image includes:
performing binarization processing on the target table image to obtain a first binary image; the first value in the first binary image is a pixel corresponding to a content distribution area in the target table image, and the second value in the first binary image is a pixel corresponding to a non-content distribution area in the target table image;
processing pixels in the first binary image according to a first direction to obtain a first direction line;
processing the pixels in the first binary image according to a second direction to obtain a second direction line;
and intersecting the first direction lines and the second direction lines, and taking the intersected area which is not covered by the first direction lines or the second direction lines as a non-intersected area in the target form image.
In this embodiment, after the text boxes have been determined, binarization is performed on the target form image to obtain a first binary map in which the pixels corresponding to the text boxes have the first value and all other pixels have the second value, as in the first binary map shown in fig. 3. The first value is the pixel value 0 and the second value is the pixel value 255; the first binary map is the binary map generated by binarizing the text-box areas.
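A minimal sketch of producing such a first binary map from detected text boxes, assuming boxes are given as `(x0, y0, x1, y1)` rectangles; the function name and the plain-list image representation are hypothetical.

```python
# Hypothetical sketch: pixels inside any detected text box get the first
# value (0); all other pixels get the second value (255).
def first_binary_map(height, width, text_boxes):
    img = [[255] * width for _ in range(height)]   # start all-background
    for x0, y0, x1, y1 in text_boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                img[y][x] = 0                      # text-box pixel
    return img

# A 4x6 image with one text box covering pixels (1,1)..(2,2).
bmap = first_binary_map(4, 6, [(1, 1, 3, 3)])
```

In practice this would be done on an array image (e.g. with NumPy slicing or OpenCV drawing), but the mapping of box pixels to 0 and the rest to 255 is the same.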
In this embodiment, the first direction may be the longitudinal direction of the target form image. To determine a longitudinal line, a pixel column of a certain width is selected and a closing operation is applied to the pixel column and the text boxes, connecting the pixels of the text boxes in the longitudinal direction into a longitudinal line; the longitudinal lines together form the longitudinal table-line area, such as the one shown in fig. 4, in which the pixel value of the longitudinal table lines is 0. The closing operation is a morphological operation consisting of dilation followed by erosion. Dilation merges all background points in contact with an object into the object, expanding its boundary outward and filling holes in the object; erosion eliminates boundary points, shrinking the boundary inward, and can be used to remove small, meaningless objects.
In this embodiment, the second direction may be the transverse direction of the target form image. To determine the transverse lines, pixel rows of a certain width are selected and a closing operation is applied to each pixel row and the text boxes; all the transverse lines together form the transverse table-line area, such as the one shown in fig. 5.
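The closing operation can be illustrated in one dimension on a single pixel row, where 1 marks a text pixel. This is a hypothetical sketch with a fixed structuring-element half-width of 1, not the patent's implementation; in practice a library routine such as OpenCV's `morphologyEx` with `MORPH_CLOSE` would be applied to the whole image.

```python
# Hypothetical 1-D closing (dilate, then erode) on a binary pixel row.
def dilate(bits, k=1):
    """A pixel becomes 1 if any pixel within distance k is 1."""
    n = len(bits)
    return [1 if any(bits[max(0, i - k):i + k + 1]) else 0 for i in range(n)]

def erode(bits, k=1):
    """A pixel stays 1 only if all pixels within distance k are 1."""
    n = len(bits)
    return [1 if all(bits[max(0, i - k):min(n, i + k + 1)]) else 0
            for i in range(n)]

row = [1, 1, 0, 1, 1, 0, 0, 0, 1]   # small gaps between characters
closed = erode(dilate(row))
print(closed)   # → [1, 1, 1, 1, 1, 0, 0, 0, 1]
```

The single-pixel gap is filled, joining the characters into one continuous line segment, while the wide blank run between cells survives; this is what lets closing turn a row of characters into a table-line region without bridging cell boundaries.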
It should be noted that, in this embodiment, the obtained transverse and longitudinal table-line areas are translated to obtain the translated table-line areas, which may be as shown in fig. 6. From fig. 6, the areas covered by neither the longitudinal nor the transverse table-line areas are determined as the non-intersecting areas, i.e. the blank areas shown in fig. 6.
According to the processing method of the form image, binarization processing is performed on the target form image to obtain the first binary image; the pixels in the first binary image are then processed along the first direction to obtain the first-direction lines and along the second direction to obtain the second-direction lines; and the first-direction lines and the second-direction lines are translated, with the regions not covered by either the first-direction lines or the second-direction lines taken as the non-intersecting regions in the target form image, so that the processing speed of the form image is improved.
Based on any one of the above embodiments, in the present embodiment, determining the target point in each non-intersecting area includes:
determining the outline of each non-intersection area through outline searching;
and determining target points in the non-intersection areas according to the outlines of the non-intersection areas.
In this embodiment, the coordinates of the target point of each non-intersecting region need to be determined by means of contour finding: the contours of the plurality of non-intersecting regions are determined, the coordinates of the target point of each non-intersecting region are then determined from its contour, and the first frame is obtained from the coordinate information of the target points. It should be noted that the target point in this embodiment is set as the center point; since a text box is generally a quadrilateral, the center point of the text box may be determined from the coordinates of its four vertices, as shown in fig. 7, where each small circle is the determined center point of a text box.
For example, for a given non-intersecting region, the coordinates of its four vertices, namely (2, 6), (2, 9), (8, 6) and (8, 9), are found by means of contour finding, the coordinates of the center point, (5, 7.5), are obtained by calculation, and the center points determined in this way are then connected to obtain the table lines corresponding to the target form image.
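The center-point calculation in the example above amounts to averaging the four vertex coordinates. A minimal sketch (the function name is an assumption for illustration):

```python
def box_center(vertices):
    """Center point of a quadrilateral text box or region,
    taken as the mean of its four vertex coordinates."""
    xs = [p[0] for p in vertices]
    ys = [p[1] for p in vertices]
    return (sum(xs) / 4.0, sum(ys) / 4.0)
```

With the vertices of the example, `box_center` reproduces the center point (5, 7.5).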
It should be noted that, connecting the central points in the longitudinal direction may obtain the area formed by each vertical line as shown in fig. 8, connecting a plurality of central points in the transverse direction may obtain the area formed by each transverse line as shown in fig. 9, and translating the obtained longitudinal connecting line area and transverse connecting line area to obtain the table line as shown in fig. 10, that is, the first frame of the borderless table is obtained.
In the present embodiment, the first frame is generated on the original target form image. In other embodiments, the mapping points of the determined center points in a preset area may instead be connected, according to the coordinates of the center points in the longitudinal and/or transverse direction, to generate the table lines corresponding to the target form image.
According to the processing method of the table image provided by the invention, the contour of each non-intersection area is determined through contour searching, and then the target point in each non-intersection area is determined according to the contour of each non-intersection area. The first frame of the target form image can be generated in a central point mode, and the accuracy of generating the first frame is improved.
Based on any of the above embodiments, in this embodiment, in the case that the table identification structure is a sliding window, the table identification structure includes a first direction sliding window and a second direction sliding window;
processing the target form image through the form recognition structure to obtain a first frame, comprising:
translating the first-direction sliding window in the target form image based on the content distribution in the target form image to obtain a first-direction frame body; and translating the second-direction sliding window in the target form image to obtain a second-direction frame body;
and setting a first direction table line based on the first direction frame body, and setting a second direction table line based on the second direction frame body to obtain a first frame.
In this embodiment, it is necessary to translate the first sliding window and the second sliding window based on the distribution of the text content in the target form image to obtain a first direction frame and a second direction frame, then set the first direction form line based on the first direction frame, set the second direction form line based on the second direction frame, and determine the first frame according to the form line obtained after translation.
In this embodiment, the first direction frame may be a frame along the longitudinal direction of the target form image, and may be a frame having any shape, such as a rectangle, a square, or the like; the second direction frame is a frame in the direction along the horizontal direction of the target form image, and can be a frame in any shape; in other embodiments, the first direction frame and the second direction frame may also be frames in any direction on the target form image, such as 45 degree direction frames, and may be specifically set according to the actual needs of the user, and are not specifically limited herein.
The first-direction frame body is obtained by sliding the first-direction sliding window along the first direction of the target form image, and the second-direction frame body is obtained by sliding the second-direction sliding window along the second direction of the target form image. In this embodiment, the first-direction frame body and the second-direction frame body refer to blank areas in which no text data content exists; that is, the first frame obtained after translation according to the first-direction frame body and the second-direction frame body lies in areas where no text data content exists.
It should be noted that the first direction frame and the second direction frame obtained in this embodiment are capable of generating table lines, and the first direction table lines are provided based on the first direction frame, and the second direction table lines are provided based on the second direction frame, so that the first frame can be accurately obtained after the table lines of the first direction frame and the second direction frame are generated.
According to the processing method of the form image provided by the invention, based on the content distribution in the target form image, the first-direction sliding window is translated in the target form image to obtain the first-direction frame body, and the second-direction sliding window is translated in the target form image to obtain the second-direction frame body; the first-direction table lines are then set based on the first-direction frame body and the second-direction table lines based on the second-direction frame body to obtain the first frame. In this way the first frame can be obtained accurately, and the processing speed of the form image can be improved.
Based on any one of the above embodiments, in this embodiment, based on the content distribution in the target form image, the first-direction sliding window is translated in the target form image, so as to obtain a first-direction frame; translating the second-direction sliding window in the target form image to obtain a second-direction frame body, wherein the second-direction frame body comprises:
in the process that the first-direction sliding window slides along the direction perpendicular to the first direction, determining the sliding range within which the first-direction sliding window does not intersect with the content according to the content distribution in the target form image, and determining the first-direction frame body according to the shape of the first-direction sliding window and the sliding range within which it does not intersect with the content;
and in the process that the second-direction sliding window slides along the direction perpendicular to the second direction, determining the sliding range within which the second-direction sliding window does not intersect with the content according to the text distribution in the target form image, and determining the second-direction frame body according to the shape of the second-direction sliding window and the sliding range within which it does not intersect with the content.
In this embodiment, it is necessary to set a first-direction sliding window on the target form image, determine a sliding range where the first-direction sliding window does not intersect with the text content according to the distribution of the text content in the target form image when the first-direction sliding window slides in a direction perpendicular to the first direction, and then determine a first-direction frame according to the shape of the first-direction sliding window and the sliding range. The first-direction sliding window is a sliding window set in the longitudinal direction of the target form image, and may be a rectangular sliding window or a square sliding window, which is not specifically limited herein.
It should be noted that, in this embodiment, the first direction is the longitudinal direction of the target form image, and when the first direction sliding window is set to slide along the direction perpendicular to the first direction, the determined sliding range that does not intersect with the text is the range area in the lateral direction, that is, the obtained first direction frame is the frame in the lateral direction. In other embodiments, the first-direction sliding window may slide along the first direction, and the obtained first-direction frame is a longitudinal frame, which may be specifically set according to the actual needs of the user, and is not specifically limited herein.
In this embodiment, it is further necessary to set a second-direction sliding window on the target form image, determine the sliding range within which the second-direction sliding window does not intersect with the text content according to the distribution of the text content in the target form image when it slides in a direction perpendicular to the second direction, and determine the second-direction frame body according to the shape of the second-direction sliding window and that sliding range. The second-direction sliding window is a sliding window set in the transverse direction of the target form image, and may be a rectangular sliding window or a square sliding window, which is not specifically limited herein.
It should be noted that, in this embodiment, the second direction is the horizontal direction of the target form image, and when the second direction sliding window is set to slide along the direction perpendicular to the second direction, the determined sliding range that does not intersect with the text content is the range area in the vertical direction, that is, the obtained second direction frame is the frame in the vertical direction. In other embodiments, the second-direction sliding window may slide along the second direction, and the obtained second-direction frame is a horizontal frame, which may be specifically set according to the actual needs of the user, and is not specifically limited herein.
It should be noted that the setting of the first direction sliding window and the second direction sliding window may be calculated and designed by using a relatively mature sliding window algorithm in the prior art, and a specific setting method is not described in detail herein.
According to the processing method of the form image, provided by the invention, the sliding window is arranged on the target form image, then the sliding range which is not intersected with the text content is determined in the sliding process of the sliding window, and the first direction frame body and the second direction frame body are determined according to the shape and the sliding range of the sliding window, so that the operation is simple, and the processing speed of the form image is improved.
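As a hedged sketch of the sliding-window idea described above (the function names and the full-image-width window shape are illustrative assumptions, not the claimed procedure), the ranges within which a transverse window slides without intersecting any text pixel can be collected as follows; each returned row band corresponds to a blank frame body where a transverse table line may be placed.

```python
import numpy as np

def blank_row_bands(img, win_h=3):
    """Slide a full-width, win_h-tall window down a binary image (255 = text)
    and merge the window positions containing no text into (start, end)
    row bands: the blank bands between rows of content."""
    h = img.shape[0]
    blank = [y for y in range(h - win_h + 1) if not img[y:y + win_h].any()]
    bands, start, prev = [], None, None
    for y in blank:
        if start is None:
            start, prev = y, y
        elif y == prev + 1:
            prev = y
        else:
            bands.append((start, prev + win_h))
            start, prev = y, y
    if start is not None:
        bands.append((start, prev + win_h))
    return bands
```

Running the same scan on the transposed image gives the blank column bands, i.e. the candidate frame bodies in the other direction.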
Based on any one of the embodiments, in this embodiment, in the case that the table identification structure is a graph convolutional neural network model, the first frame is determined according to the text boxes in the target form image and the positional relationships between the text boxes.
In this embodiment, when the table identification structure is a graph convolutional neural network model, the first frame needs to be determined according to the positional relationships between the text boxes in the target form image; it should be noted that the positional relationship between text boxes can be determined according to the coordinate information of the center points of the text boxes.
It should be noted that, according to the output of the graph convolutional neural network model provided in this embodiment, the text boxes in the same row and the text boxes in the same column can be determined; the height of a cell can then be determined according to the height values of the text boxes in the same row, and the width of a cell according to the width values of the text boxes in the same column, so that the table lines of the borderless table are drawn, as can be seen in the following embodiments.
According to the processing method of the table image, provided by the invention, under the condition that the table identification structure is the graph convolution neural network model, the first frame is determined according to the position relation between the text box and the text box in the target table image, and the position relation of each text box in the target table image can be accurately and quickly determined through the trained graph convolution neural network model, so that the first frame is obtained.
Based on any one of the above embodiments, in this embodiment, in the case that the table identification structure is a graph convolutional neural network model, determining the first frame according to the text boxes in the target form image and the positional relationships between the text boxes includes:
inputting the pre-acquired position information of a plurality of text boxes in the target form image and the target form image into a pre-trained graph convolutional neural network model to obtain the positional relationship between any one of the plurality of text boxes and its adjacent text boxes; the graph convolutional neural network model is obtained by training based on the position information of a plurality of text boxes in a sample table, the sample table image and the label information of the sample table;
determining text boxes of all lines and text boxes of all columns in the target form image according to the position relation between any text box in the text boxes and the adjacent text box;
and generating a first frame of the target form image according to the text boxes of all the lines and the text boxes of all the columns in the target form image.
In this embodiment, the obtained position information of the plurality of text boxes in the target form image and the target form image need to be input into a pre-trained graph convolutional neural network model to obtain the positional relationship between any text box and its adjacent text boxes; the text boxes of each row and each column in the target form image are then determined according to the obtained positional relationships, and the first frame corresponding to the target form image is generated. The graph convolutional neural network model is an artificial intelligence model obtained by training based on the position information of a plurality of text boxes in a sample table, the sample table image and the label information of the sample table.
In this embodiment, the input of the graph convolutional neural network model is the position information of each text box (e.g. the coordinate information of the four points of the text box) and the target form image, and the output is information on whether a given text box and each of its adjacent text boxes are in the same row or the same column.
For example, if the determined position information of text box a is (1, 1), the position information of its adjacent text boxes is calculated and analyzed as follows: a1(0, 1), a2(0, 1), a3(1, 0), a4(1, 0). The first number in the position information indicates whether the box is in the same row as text box a, where 1 represents the same row and 0 a different row; the second number indicates whether it is in the same column as text box a, where 1 represents the same column and 0 a different column. Therefore the adjacent text boxes a1 and a2 are in the same column as text box a, and the adjacent text boxes a3 and a4 are in the same row as text box a. The encoding may be set according to the actual needs of the user and is not specifically limited herein.
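One way to turn such pairwise same-row/same-column flags into complete rows and columns is a union-find pass over the predicted relations. This is an illustrative sketch, not the patent's claimed procedure; the function names and the relation encoding (first flag = same row, second flag = same column) are assumptions taken from the example above.

```python
def group_boxes(n, relations):
    """Group n text boxes into rows and columns from pairwise relations.

    relations maps a pair (i, j) of adjacent box indices to a
    (same_row, same_col) flag pair, mirroring the model output
    described above. Returns (row_groups, col_groups)."""
    def find(parent, x):
        # find the root of x with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    rows, cols = list(range(n)), list(range(n))
    for (i, j), (same_row, same_col) in relations.items():
        if same_row:
            rows[find(rows, i)] = find(rows, j)
        if same_col:
            cols[find(cols, i)] = find(cols, j)

    def groups(parent):
        buckets = {}
        for i in range(n):
            buckets.setdefault(find(parent, i), []).append(i)
        return sorted(buckets.values())

    return groups(rows), groups(cols)
```

For a 2x2 grid of boxes, the four pairwise relations are enough to recover both rows and both columns.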
It should be noted that, according to the output of the graph convolutional neural network model provided in this embodiment, the text boxes in the same row and the text boxes in the same column can be determined; the height of a cell can then be determined according to the height values of the text boxes in the same row, and the width of a cell according to the width values of the text boxes in the same column, so that the table lines of the borderless table are obtained by drawing.
According to the processing method of the form image, the position information of a plurality of text boxes and the target form image are input into a graph convolution neural network model which is trained well in advance, the position relation between any text box and the adjacent text box is obtained, then the text box of each line and each column is determined according to the position relation, and the first frame corresponding to the target form image is generated according to the text box of each line and each column. The first frame corresponding to the target form image is generated according to the position relation of the text boxes, and the accuracy of the form line generation can be improved.
Based on any one of the above embodiments, in this embodiment, generating a first frame of the target form image according to the text box of each row and the text box of each column in the target form image includes:
setting the height value of a target line according to the height target value of each text box in the target line of the target form image; wherein the target row is any row of the target form image;
setting a width value of a target column according to a width target value of each text box in the target column of the target table image; wherein the target column is any column of the target form image;
drawing a table line of the target row according to the height value of the target row and the position of the target row in the target table image; and/or drawing the table line of the target column according to the width value of the target column and the position of the target column in the target table image to obtain a first frame of the target table image.
In this embodiment, it is necessary to set the height value of a target row according to the height target value of each text box in the target row, and the width value of a target column according to the width target value of each text box in the target column; the table line of the row corresponding to the target row is then drawn according to the height value of the target row and the position information of the target row in the target form image, the table line of the column corresponding to the target column is drawn according to the width value of the target column and the position information of the target column in the target form image, and the first frame is obtained from the table lines generated for each row and each column.
In this embodiment, the maximum height value among the text boxes in the target row is determined as the height value of the target row, and the maximum width among the text boxes in the target column is set as the width value of the target column. In other embodiments, the average height value of the text boxes in the target row may be selected as the height value of the target row, and the average width value of the text boxes in the target column as the width value of the target column. The user may also set this according to actual needs, and it is not specifically limited herein.
The position information of the target row or the target column may be coordinate information thereof, or may refer to a positional relationship with any other adjacent row or any adjacent column, and may be specifically set according to actual needs, and is not particularly limited herein.
For example, if the height values of the text boxes in the target line are 4cm, 3cm and 5cm respectively, determining the maximum height value of the text boxes in the target line as the height value of the target line, that is, determining 5cm as the height value of the target line; similarly, the maximum width value in the target column is determined as the width value of the target column, and the height value or the width value is sequentially determined for other rows or columns by the above-mentioned method, which is not described in detail herein.
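The row-height rule in the example above, together with the averaging alternative mentioned in the preceding paragraph, can be sketched as a small helper; the function name and `mode` parameter are illustrative assumptions.

```python
def row_height(box_heights, mode="max"):
    """Height of a table row from the heights of its text boxes:
    either the maximum box height or the average box height."""
    if mode == "max":
        return max(box_heights)
    return sum(box_heights) / len(box_heights)
```

For the example heights of 4 cm, 3 cm and 5 cm, the maximum rule gives 5 cm; the same helper applies unchanged to column widths.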
It should be noted that the height value of the target row or the width value of the target column may also be determined with the pixels of each text box as the unit of measurement: if the maximum pixel height of a text box in a certain target row is 1024 pixels, and calculation and analysis show that this pixel count corresponds to 8.67 cm, then 8.67 cm is determined as the height value of the target row. In other embodiments, other numerical determination methods may be used; the determination method may be selected according to the specific needs of the user and is not limited in detail herein.
According to the processing method of the table image provided by the invention, the height target value in each text box in the target row is determined as the height value of the target row, the width target value of each text box in the target column is determined as the width value of the target column, then the table line corresponding to the target row or the target column is drawn according to the height value of the target row, the position of the target row in the target table image, the width value of the target column and the position of the target column in the target table image, and the first frame is obtained, so that the accuracy of generating the table line can be ensured, and the generated table line is complete and clear.
Based on any of the above embodiments, in this embodiment, before the target form image is processed by the form recognition structure to obtain the first frame, the method further includes:
carrying out binarization processing on the target form image to obtain a second binary image, wherein the second value in the second binary image corresponds to the pixels of the text characters in the target form image, and the first value in the second binary image corresponds to the pixels other than the text characters in the target form image.
And setting a table identification structure for the second binary image.
In this embodiment, binarization processing needs to be performed on the target form image containing the text detection result to obtain a second binary image in which the pixels corresponding to the text characters take the second value and the pixels other than the text characters take the first value, where the first value refers to a pixel value of 0 and the second value refers to a pixel value of 255. Morphological processing is then performed on the obtained second binary image to obtain a morphological processing result; if it is determined that a text box with an interval in the middle exists in the processing result, the text box is split, so that the plurality of text boxes in the target form image are obtained.
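A minimal sketch of the binarization step described above, assuming dark pixels are text strokes (the threshold value 127 and the function name are illustrative assumptions): text pixels take the second value 255 and all other pixels the first value 0.

```python
import numpy as np

def binarize(gray, thresh=127):
    """Map a grayscale image to a binary image: pixels darker than the
    threshold (assumed to be text strokes) become 255, the rest 0."""
    out = np.zeros_like(gray, dtype=np.uint8)
    out[gray < thresh] = 255
    return out
```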
It should be noted that in this embodiment, a table identification structure needs to be further set for the obtained second binary image, where the table identification structure may be a line, a frame, or a graph convolution neural network model, and may be specifically set according to the needs of a user, and is not limited specifically herein.
According to the processing method of the form image provided by the invention, binarization processing is performed on the target form image containing the text detection result to obtain a second binary image in which the pixels corresponding to the text characters take the second value and the other pixels in the target form image take the first value; a table identification structure is then set for the second binary image, which provides support for determining the first frame according to the table identification structure.
Based on any one of the above embodiments, in this embodiment, adding the content in the target form image to the first frame includes:
performing text detection on the target form image, and obtaining a text box in the target form image according to a text detection result;
setting a text box in the first frame;
filling the content in the text box in the target form image into the text box of the first frame.
In this embodiment, text detection needs to be performed on the target form image, then the text boxes in the target form image are obtained according to the text detection result, a plurality of text boxes are arranged in the first frame, and data content existing in the plurality of text boxes in the target form image is filled in corresponding text boxes in the first frame, so that data restoration in the target form image is realized.
In this embodiment, as shown in fig. 11, a text detection unit performs text detection processing on the borderless table, identifies a plurality of pieces of text content, and determines a plurality of text boxes in the target form image according to the text detection result, where each rectangular box in fig. 11 represents one text box and a plurality of text boxes exist.
Specifically, after the text in the target form image is detected, binarization and morphological processing are performed on the text detection result to obtain the text boxes in the target form image. Image binarization is the process of setting the gray value of each pixel in an image to 0 or 255, so that the whole image shows an obvious black-and-white effect; fig. 12 is a schematic diagram of the plurality of text boxes obtained by the text detection after binarization processing. Morphology, i.e. mathematical morphology, is one of the most widely applied techniques in image processing and is mainly used to extract from images the components that are meaningful for expressing and describing region shapes, so that subsequent recognition work can capture the most essential shape features of a target object, where "most essential" means most discriminative, such as boundaries and connected regions; fig. 13 is a schematic diagram obtained by performing morphological processing on the binarized table image. The specific processing procedures are described in the following embodiments and are not detailed here.
It should be noted that, in this embodiment, after the plurality of text boxes in the target form image are determined, any text box with an interval in the middle is split. Fig. 14 is a schematic diagram obtained by splitting text boxes with intervals in the middle; for example, the single text box "Zhang San General Manager 123456937" is split into three text boxes according to the existing intervals, namely "Zhang San", "General Manager" and "123456937".
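The splitting of a text box at its internal intervals can be sketched as a column-projection pass over the binarized crop: runs of at least a minimum number of empty columns separate the sub-boxes. The function name and the `min_gap` parameter are illustrative assumptions, not the patent's claimed procedure.

```python
import numpy as np

def split_on_gaps(box_img, min_gap=2):
    """Split a binarized text-box crop (255 = ink) into sub-boxes wherever
    a run of at least min_gap empty columns separates two pieces of text.
    Returns the (start, end) column ranges of the resulting sub-boxes."""
    ink = box_img.any(axis=0)  # True for columns containing any ink
    pieces, start, gap = [], None, 0
    for x, has_ink in enumerate(ink):
        if has_ink:
            if start is None:
                start = x
            gap = 0
            end = x + 1
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                pieces.append((start, end))
                start = None
    if start is not None:
        pieces.append((start, end))
    return pieces
```

A crop with three ink clusters separated by two-column and three-column gaps is split into three column ranges, one per sub-box.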
In this embodiment, it is necessary to perform binarization processing on the multiple text boxes obtained after splitting, and confirm the horizontal table line regions and the vertical table line regions, and then determine the center points of the non-intersecting regions according to the determined table line regions, and connect the center points to determine the first frame corresponding to the target table image.
It should be noted that, in this embodiment, the horizontal table area and the vertical table area are determined, then the central point of the translated table line area is determined, and the table line corresponding to the target table image is generated by connecting the central points, which may be other generation manners in other embodiments, such as generating the table line based on the position relationship of the text box, or other generation manners, and is not limited specifically herein.
It should be noted that, in this embodiment, a plurality of text boxes need to be set in the obtained first frame, and the text content existing in the plurality of text boxes in the target form image is then added into the corresponding text boxes in the first frame. A mature text recognition method in the prior art is adopted to recognize the text content in the plurality of text boxes in the target form image, and the recognition results are filled into the corresponding text boxes in the first frame; the specific process of the text recognition is not described in detail here.
According to the processing method of the form image provided by the invention, text detection is performed on the target form image, the text boxes in the target form image are obtained according to the text detection result, text boxes are then set in the first frame, and the content of the text boxes in the target form image is filled into the text boxes of the first frame. The method and the device can realize accurate restoration of the text data in the form image and improve the processing speed of the form image.
Based on any of the above embodiments, in this embodiment, determining that the target form image is a borderless form includes:
inputting the target form image into a pre-trained form classification model;
determining the target form image as a borderless form according to the output result of the form classification model;
the table classification model is obtained by training based on the sample table and the class labels of the sample table.
In this embodiment, the target form image needs to be input into a pre-trained form classification model, and the target form image is then determined to be a borderless form according to the output result of the form classification model, where the form classification model is obtained by training based on sample forms and the class labels of the sample forms. The output results are of two types, one being a borderless form and the other a framed form; the target form image is determined to be a borderless form only when no table line exists for any cell in the target form image.
The form classification model may be a VGG model or a ResNet model, obtained by training the selected model, and is used to identify the type of the target form image. The VGG model can perform recognition and confirmation in multiple transfer learning tasks and is a preferred algorithm for extracting CNN features from images. The ResNet model is a deep residual network model, which achieves high-accuracy image recognition, speech recognition and other capabilities through a deep network.
For example, if the acquired target form image 1 and target form image 2 are input into the pre-trained form classification model, and the model output indicates that no table line exists around any cell in target form image 1 while table lines exist in only part of target form image 2, then target form image 1 is determined to be a borderless form and target form image 2 is determined to be a semi-framed form.
According to the processing method of the table image, the type of the target table image can be determined through the pre-trained table classification model, the target table image is determined to be a borderless table according to the output result of the table classification model, the type of the target table image can be accurately identified, and the processing speed of the subsequent borderless table is improved.
Based on any of the above embodiments, in this embodiment, the content in the first frame is in an editable state.
In this embodiment, the content in the first frame obtained by converting the target form image is in an editable state. According to the form image processing method, a target form image whose type is a borderless form is converted to obtain a first frame with table lines; text recognition is then performed on each cell in the target form image, and the recognition result is filled into the corresponding cell of the first frame to obtain an editable form.
According to the processing method of the form image, the obtained content in the first frame is in an editable state, the editing requirements of users are met, and the user experience is improved.
Based on any of the above embodiments, in this embodiment, a form classification model is first used to confirm that the target form image is a borderless form. A text detection unit then performs text detection on the borderless form to obtain a plurality of text boxes. Binarization and morphological processing are performed on the target form image containing the plurality of text boxes, and, according to the processing result, text boxes with internal gaps are split to obtain the target form image with the split plurality of text boxes.
In this embodiment, table line generation is then performed from the plurality of text boxes of the target form image. Binarization is performed first to obtain a first binary image; a longitudinal table line area is extracted from the first binary image in the vertical direction, and a transverse table line area is extracted in the horizontal direction, the longitudinal and transverse table lines being extracted with given templates. The two resulting areas are then translated, and the white area remaining after translation is taken as the non-intersection area.
In this embodiment, the contour of each non-intersection area is determined from the translated areas by contour finding, the center point of each contour is calculated, and the coordinate information of the non-intersection area is thereby determined; the coordinates of a center point are taken as the average of the vertex coordinates in each direction. The center points are then connected in the horizontal and vertical directions to generate the table lines of the target form image and obtain an editable first frame.
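The translation-and-contour procedure above is described only in prose. As a minimal illustrative stand-in (not the patented implementation), it can be approximated with projection profiles on a small 0/1 grid: rows and columns containing no text pixels are the candidate gaps, and the center of each gap run plays the role of the contour center point through which a table line is drawn. All names are hypothetical.

```python
def gap_centers(profile):
    """Return the centre index of each maximal run of empty (0) cells.

    `profile` is a projection of the binary image onto one axis:
    profile[i] == 0 means no text pixel falls at coordinate i.
    """
    centers, start = [], None
    for i, v in enumerate(profile + [1]):          # sentinel closes a final run
        if v == 0 and start is None:
            start = i
        elif v != 0 and start is not None:
            centers.append((start + i - 1) // 2)   # centre of the run
            start = None
    return centers

def grid_lines(binary):
    """Derive horizontal/vertical table-line positions from a 0/1 image."""
    rows = [1 if any(r) else 0 for r in binary]                          # row profile
    cols = [1 if any(r[c] for r in binary) else 0
            for c in range(len(binary[0]))]                              # column profile
    return gap_centers(rows), gap_centers(cols)

# Two text blobs separated by an empty row and an empty column:
img = [
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
]
print(grid_lines(img))  # ([1], [2])
```

Drawing a full-width horizontal line at each returned row index and a full-height vertical line at each column index yields a grid that encloses every text blob, which is the role the connected center points play in the embodiment.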
It should be noted that, in this embodiment, the table line may also be determined by using the position relationship between the text boxes. For example, a graph convolution network model is constructed according to the position relationship among the text boxes, the trained graph convolution network model is utilized to determine the text boxes in the same column and the text boxes in the same row, and after the determination, the table line completely surrounding all the text boxes in the target table image is determined based on the position relationship of each row and each column.
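The graph convolution model's "same row"/"same column" relation is not specified in detail. A purely geometric stand-in (vertical-overlap clustering, with hypothetical names and a chosen tolerance) can sketch how text boxes might be grouped into rows before a surrounding table line is determined:

```python
def same_row(box_a, box_b, tol=0.5):
    """Heuristic stand-in for the graph model's 'same row' relation:
    two boxes share a row when their vertical overlap exceeds `tol`
    of the shorter box. Boxes are (x0, y0, x1, y1)."""
    overlap = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    shorter = min(box_a[3] - box_a[1], box_b[3] - box_b[1])
    return overlap > tol * shorter

def group_rows(boxes):
    """Cluster text boxes into rows using the pairwise relation."""
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):   # scan top to bottom
        for row in rows:
            if same_row(row[0], box):
                row.append(box)
                break
        else:
            rows.append([box])
    return rows

boxes = [(0, 0, 10, 10), (20, 1, 30, 11), (0, 20, 10, 30)]
print(len(group_rows(boxes)))  # 2 rows
```

In the embodiment this relation would come from the trained graph convolution network rather than a fixed overlap threshold; the clustering step afterwards is the same in spirit.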
Fig. 15 is a schematic diagram of a form image processing apparatus provided by the present invention. As shown in Fig. 15, the form image processing apparatus provided by the present invention includes:
a determining module 1501, configured to determine that the target form image is a borderless form;
a processing module 1502, configured to process the target form image through a form identification structure to obtain a first frame;
an adding module 1503, configured to add content in the target form image to the first frame.
According to the form image processing apparatus provided by the invention, the target form image is determined to be a borderless form, the target form image is processed through the form identification structure to obtain the first frame, and the content in the target form image is then added into the first frame. The apparatus can quickly and effectively obtain the first frame, accurately convert the form image into an editable form, and obtain more comprehensive and clearer form information from the form image, thereby improving the processing speed of the form image and improving the user experience.
Further, the table identification structure is a line, a sliding window or a graph convolution neural network model.
According to the processing device of the form image, provided by the invention, the form identification structure can be set as a line, a sliding window or a graph convolution neural network model, so that various processing on the target form image can be realized, the processing effect of the form image is ensured, and the user experience is improved.
Further, the processing module 1502 is further configured to:
moving the table recognition structure along a first direction of a target table image, and recording a first area of uncovered content;
and/or moving the table identification structure along a second direction of the target table image, and recording a second area of the uncovered content;
and generating a table line in at least one of the first area and the second area to obtain a first frame.
According to the form image processing apparatus provided by the present invention, the first area of uncovered content is recorded by moving the form identification structure in the first direction of the target form image, and/or the second area of uncovered content is recorded by moving the form identification structure in the second direction of the target form image; a table line is then generated in at least one of the first area and the second area to obtain the first frame. Table lines can thus be generated in areas along different directions of the target form image to obtain the first frame, and the speed of form image processing can be increased.
Further, in the case that the form identification structure is a line, the form identification structure includes a first direction line and a second direction line, and the processing module 1502 is further configured to:
translating the first direction line and the second direction line in the target form image based on the content distribution condition in the target form image, and determining a non-intersection area in the target form image;
and determining a target point in the non-intersection area, and connecting the target point to obtain a first frame.
According to the form image processing apparatus provided by the invention, the first direction line and the second direction line are translated in the target form image based on the content distribution in the target form image to determine the non-intersection areas in the target form image; a target point is then determined in each non-intersection area, and the target points are connected to obtain the first frame.
Further, the processing module 1502 is further configured to:
performing binarization processing on the target table image to obtain a first binary image; wherein, the first value in the first binary image is the pixel corresponding to the content distribution area in the target table image, and the second value in the first binary image is the pixel corresponding to the non-content distribution area in the target table image;
processing the pixels in the first binary image according to a first direction to obtain a first direction line;
processing the pixels in the first binary image according to a second direction to obtain a second direction line;
and intersecting the first direction lines and the second direction lines, and taking the intersected area which is not covered by the first direction lines or the second direction lines as a non-intersecting area in the target form image.
According to the form image processing apparatus provided by the invention, binarization is performed on the target form image to obtain the first binary image; the pixels in the first binary image are processed along the first direction to obtain first direction lines and along the second direction to obtain second direction lines; the first direction lines and the second direction lines are then intersected, and the area not covered by either the first direction lines or the second direction lines is taken as the non-intersection area in the target form image, thereby improving the processing speed of the form image.
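As a minimal illustrative sketch of the step above (not the disclosed implementation), the first and second direction lines can be modeled as full-row and full-column bands wherever text occurs, with the non-intersection area being the cells covered by neither band. The grid, function name, and band construction are all simplifying assumptions.

```python
def non_intersection_mask(binary):
    """Mark cells covered by neither the horizontal nor the vertical
    text-line bands. `binary` is a 0/1 grid; 1 = text pixel."""
    h, w = len(binary), len(binary[0])
    # first-direction lines: every row that contains any text pixel
    row_band = [any(binary[r]) for r in range(h)]
    # second-direction lines: every column that contains any text pixel
    col_band = [any(binary[r][c] for r in range(h)) for c in range(w)]
    return [[0 if (row_band[r] or col_band[c]) else 1
             for c in range(w)] for r in range(h)]

img = [
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
]
print(non_intersection_mask(img))  # only the centre cell is uncovered
```

The single `1` in the result corresponds to the gap between the two text regions, i.e. the candidate location for a table line.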
Further, the processing module 1502 is further configured to:
determining the contour of the non-intersection region through contour searching;
and taking a coordinate point of the non-intersection area as the target point according to the contour of the non-intersection area.
According to the form image processing apparatus provided by the invention, the contour of each non-intersection area is determined through contour finding, and the target point in each non-intersection area is then determined according to its contour. The first frame of the target form image can thus be generated from the center points, improving the accuracy of first frame generation.
Further, in the case that the form identification structure is a sliding window, the form identification structure includes a first direction sliding window and a second direction sliding window;
accordingly, the processing module 1502 is further configured to:
translating the first-direction sliding window in the target form image based on the content distribution condition in the target form image to obtain a first-direction frame body; translating the sliding window in the second direction in the target form image to obtain a frame in the second direction;
and setting a first direction table line based on the first direction frame body, and setting a second direction table line based on the second direction frame body to obtain a first frame.
According to the form image processing apparatus provided by the invention, the first direction sliding window is translated in the target form image based on the content distribution in the target form image to obtain a first direction frame body, and the second direction sliding window is translated in the target form image to obtain a second direction frame body; first direction table lines are set based on the first direction frame body, and second direction table lines are set based on the second direction frame body, to obtain the first frame. The first frame can thus be obtained accurately, and the processing speed of the form image is improved.
Further, the processing module 1502 is further configured to:
in the process that the first direction sliding window slides along the direction perpendicular to the first direction, determining, according to the content distribution in the target form image, the sliding range over which the first direction sliding window does not intersect the text, and determining the first direction frame body according to the shape of the first direction sliding window and that sliding range;
and in the process that the second direction sliding window slides along the direction perpendicular to the second direction, determining, according to the text distribution in the target form image, the sliding range over which the second direction sliding window does not intersect the text, and determining the second direction frame body according to the shape of the second direction sliding window and that sliding range.
According to the processing device of the form image, provided by the invention, the sliding window is arranged on the target form image, then the sliding range which is not intersected with the text content is determined in the sliding process of the sliding window, and the first direction frame body and the second direction frame body are determined according to the shape and the sliding range of the sliding window, so that the operation is simple, and the processing speed of the form image is improved.
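The sliding-window step can be sketched as follows, under simplifying assumptions not stated in the patent: a full-width window of fixed height slides down the binary image, and every contiguous range of positions where the window touches no text pixel is recorded as a candidate frame-body band. Names and the grid representation are hypothetical.

```python
def clear_ranges(binary, win_h=1):
    """Slide a full-width window of height `win_h` down a 0/1 image and
    return (top, bottom) row ranges where the window never touches text."""
    h = len(binary)
    clear = [not any(any(binary[r + k]) for k in range(win_h))
             for r in range(h - win_h + 1)]
    ranges, start = [], None
    for r, ok in enumerate(clear + [False]):        # sentinel closes a final run
        if ok and start is None:
            start = r
        elif not ok and start is not None:
            ranges.append((start, r - 1 + win_h - 1))
            start = None
    return ranges

img = [[1, 1], [0, 0], [0, 0], [1, 1]]
print(clear_ranges(img))  # [(1, 2)]
```

Repeating the same scan with a full-height window moving horizontally would give the second direction frame body; a table line can then be placed inside each returned band.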
Further, under the condition that the table identification structure is a graph convolution neural network model, determining a first frame according to the position relation between the text box and the text box in the target table image.
According to the processing device of the table image provided by the invention, under the condition that the table identification structure is the graph convolution neural network model, the first frame is determined according to the position relation between the text box and the text box in the target table image, and the position relation of each text box in the target table image can be accurately and quickly determined through the trained graph convolution neural network model, so that the first frame is obtained.
Further, the processing module 1502 is further configured to:
inputting position information of a plurality of text boxes in a target form image acquired in advance and the target form image into a graph convolution neural network model trained in advance to obtain a position relation between any one text box in the plurality of text boxes and an adjacent text box; the graph convolution neural network model is obtained by training based on position information of a plurality of text boxes in a sample table, a sample table image and label information of the sample table;
determining a text box of each line and a text box of each column in the target form image according to the position relation between any text box in the plurality of text boxes and an adjacent text box thereof;
and generating a first frame of the target form image according to the text boxes of all the lines and the text boxes of all the columns in the target form image.
According to the form image processing apparatus provided by the invention, the position information of the plurality of text boxes and the target form image are input into a graph convolution neural network model trained in advance to obtain the position relationship between any text box and its adjacent text boxes; the text boxes of each row and the text boxes of each column are then determined according to these position relationships, and the first frame corresponding to the target form image is generated from the text boxes of each row and each column. Generating the first frame from the position relationships of the text boxes can improve the accuracy of table line generation.
Further, the processing module 1502 is further configured to:
setting a height value of the target row according to a height target value of each text box in the target row of the target form image; wherein the target row is any row of the target form image;
setting a width value of a target column according to a width target value of each text box in the target column of the target form image; wherein the target column is any column of the target form image;
drawing a table line of the target row according to the height value of the target row and the position of the target row in the target table image; and/or drawing a table line of the target column according to the width value of the target column and the position of the target column in the target table image to obtain a first frame of the target table image.
According to the form image processing apparatus provided by the invention, the height target value of the text boxes in the target row is determined as the height value of the target row, and the width target value of the text boxes in the target column is determined as the width value of the target column; the table line corresponding to the target row is then drawn according to the height value of the target row and the position of the target row in the target form image, and/or the table line corresponding to the target column is drawn according to the width value of the target column and the position of the target column in the target form image, to obtain the first frame. This ensures the accuracy of table line generation, and the generated table lines are complete and clear.
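The row-drawing step above can be sketched with one concrete (assumed) choice of "height target value", namely the tallest box in each row. The function name and the choice of target value are illustrative; the patent leaves the target value unspecified.

```python
def row_line_ys(rows):
    """Given rows of text boxes (x0, y0, x1, y1), place a horizontal
    table line at the top of each row and one below the last row,
    using each row's tallest box as the row height."""
    ys, cursor = [], None
    for row in rows:
        top = min(b[1] for b in row)                # row position in the image
        height = max(b[3] - b[1] for b in row)      # assumed 'height target value'
        ys.append(top)
        cursor = top + height
    ys.append(cursor)                               # closing line under the last row
    return ys

rows = [
    [(0, 0, 10, 8), (12, 0, 20, 10)],   # tallest box in this row: height 10
    [(0, 15, 10, 24)],
]
print(row_line_ys(rows))  # [0, 15, 24]
```

Column lines would be computed symmetrically from the widest box in each column and the column's horizontal position.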
Further, the processing device of the form image is further configured to:
carrying out binarization processing on the target form image to obtain a second binary image; the second value in the second binary image is a pixel corresponding to a text character in the target form image, and the first value in the second binary image is a pixel except the text character in the target form image.
And setting a table identification structure for the second binary image.
According to the form image processing apparatus provided by the invention, binarization is performed on the target form image containing the text detection result to obtain a second binary image, in which the pixels corresponding to text characters take the second value and the pixels other than text characters take the first value; a form identification structure is then set for the second binary image, providing support for subsequently determining the first frame according to the form identification structure.
Further, the adding module 1503 is further configured to:
performing text detection on the target form image, and obtaining a text box in the target form image according to a text detection result;
setting a text box in the first frame;
filling the content in the text box in the target form image into the text box of the first frame.
According to the form image processing apparatus provided by the invention, text detection is performed on the target form image, a plurality of text boxes in the target form image are obtained from the text detection result, the plurality of text boxes are set in the first frame, and the content of the plurality of text boxes in the target form image is filled into the corresponding text boxes in the first frame. The apparatus can accurately restore the text data in the form image and improve the processing speed of the form image.
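The filling step can be sketched with one common (assumed) assignment rule: each detected text box is placed into the first-frame cell that contains its center. The rule, the data shapes, and all names are illustrative assumptions, not the disclosed method.

```python
def fill_cells(cells, text_boxes):
    """Assign each detected text box to the first-frame cell that
    contains its centre. `cells` maps a cell id to (x0, y0, x1, y1);
    `text_boxes` maps a box rectangle to its recognised text."""
    filled = {}
    for (x0, y0, x1, y1), text in text_boxes.items():
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2       # centre of the text box
        for cell_id, (cx0, cy0, cx1, cy1) in cells.items():
            if cx0 <= cx < cx1 and cy0 <= cy < cy1:
                filled[cell_id] = text
                break
    return filled

cells = {"A1": (0, 0, 50, 20), "A2": (0, 20, 50, 40)}
texts = {(5, 2, 30, 12): "Name", (5, 22, 30, 32): "Alice"}
print(fill_cells(cells, texts))  # {'A1': 'Name', 'A2': 'Alice'}
```

Center containment is robust to small misalignments between the detected box and the generated cell; a production system might instead use maximum-overlap assignment.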
Further, determining module 1501 is further configured to:
inputting the target form image into a pre-trained form classification model;
determining the target form image as a frame-free form according to the output result of the form classification model;
the table classification model is obtained by training based on the sample table and the class labels of the sample table.
According to the processing device of the table image, the type of the target table image can be determined through the pre-trained table classification model, the target table image is determined to be the borderless table according to the output result of the table classification model, the type of the target table image can be accurately identified, and the processing speed of the subsequent borderless table is improved.
Further, the content in the first bezel frame is in an editable state.
According to the processing device of the form image, the obtained content in the first frame is in an editable state, the editing requirement of a user is met, and the user experience is improved.
Since the principle of the apparatus according to the embodiment of the present invention is the same as that of the method according to the above embodiments, the details are not repeated herein.
Fig. 16 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention, and as shown in fig. 16, the present invention provides an electronic device, including: a processor (processor)1601, a memory (memory)1602, and a bus 1603;
the processor 1601 and the memory 1602 communicate with each other through the bus 1603;
processor 1601 is configured to call program instructions in memory 1602 to perform the methods provided in the above-described method embodiments, including, for example: determining the target table image as a borderless table; processing the target form image through a form identification structure to obtain a first frame; and adding the content in the target form image into the first frame.
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer instructions that cause the computer to perform the methods provided in the above-described method embodiments, for example, including: determining the target table image as a borderless table; processing the target form image through a form identification structure to obtain a first frame; and adding the content in the target form image into the first frame.
The present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method provided by the above method embodiments, the method comprising: determining the target table image as a borderless table; processing the target table image through a table identification structure to obtain a first frame; and adding the content in the target table image into the first frame.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (13)

1. A method of processing a form image, comprising:
determining the target table image as a borderless table;
processing the target form image through a form identification structure to obtain a first frame;
and adding the content in the target form image into the first frame.
2. A method of processing form images according to claim 1, the method further comprising:
the table identification structure is a line, a sliding window or a graph convolution neural network model.
3. A method of processing form images according to claim 2, the method further comprising:
in a case that the table identification structure is the line or the sliding window, the processing the target table image through the table identification structure to obtain a first frame includes:
moving the table recognition structure along a first direction of the target table image, and recording a first area which does not cover the content;
and/or moving the table identification structure along a second direction of the target table image, and recording a second area which does not cover the content;
and generating a table line in at least one of the first area and the second area to obtain a first frame.
4. A method of processing form images according to claim 2, the method further comprising:
in the case that the table identification structure is the line, the table identification structure comprises a first direction line and a second direction line;
the processing the target form image through the form recognition structure to obtain a first frame, including:
translating the first direction line and the second direction line in the target form image based on the content distribution condition in the target form image, and determining a non-intersection area in the target form image;
and determining a target point in the non-intersection area, and connecting the target point to obtain a first frame.
5. The method of processing a form image according to claim 4, wherein the determining a non-intersection region in the target form image by translating the first direction line and the second direction line in the target form image based on the content distribution in the target form image comprises:
performing binarization processing on the target table image to obtain a first binary image; wherein, a first value in the first binary image is a pixel corresponding to a content distribution area in the target form image, and a second value in the first binary image is a pixel corresponding to a non-content distribution area in the target form image;
processing the pixels in the first binary image according to a first direction to obtain a first direction line;
processing the pixels in the first binary image according to a second direction to obtain a second direction line;
and intersecting the first direction lines and the second direction lines, and taking the intersected area which is not covered by the first direction lines or the second direction lines as a non-intersecting area in the target form image.
6. The method of processing a form image of claim 4, wherein the determining target points in non-intersecting regions comprises:
determining the contour of the non-intersection region through contour searching;
and taking a coordinate point of the non-intersection area as the target point according to the contour of the non-intersection area.
7. A method of processing form images according to claim 2, the method further comprising:
in the case that the table identification structure is the sliding window, the table identification structure comprises a first direction sliding window and a second direction sliding window;
the processing the target form image through the form recognition structure to obtain a first frame, including:
translating the first-direction sliding window in the target form image based on the content distribution condition in the target form image to obtain a first-direction frame body; translating the second-direction sliding window in the target form image to obtain a second-direction frame body;
and setting a first direction table line based on the first direction frame body, and setting a second direction table line based on the second direction frame body to obtain a first frame.
8. A method of processing form images according to claim 2, the method further comprising:
and under the condition that the table identification structure is the graph convolution neural network model, determining a first frame according to the position relation between a first text box and a second text box in the target table image.
9. The method of processing a form image of claim 1, wherein prior to said processing the target form image by the form recognition structure resulting in a first frame, the method further comprises:
carrying out binarization processing on the target table image to obtain a second binary image; wherein, the second value in the second two-value image is the pixel corresponding to the text character in the target form image, and the first value in the second two-value image is the pixel except the text character in the target form image;
and setting a table identification structure for the second binary image.
10. The method of processing a form image of claim 1, wherein the adding content in the target form image into the first bounding box frame comprises:
performing text detection on the target form image, and obtaining a text box in the target form image according to a text detection result;
setting a text box in the first frame;
filling the content in the text box in the target form image into the text box of the first frame.
11. A form image processing apparatus, comprising:
the determining module is used for determining the target form image as a borderless form;
the processing module is used for processing the target form image through the form identification structure to obtain a first frame;
and the adding module is used for adding the content in the target form image into the first frame.
12. An electronic device, comprising: a processor, a memory, and a bus, wherein,
the processor and the memory are communicated with each other through the bus;
the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the steps of the method of processing a form image according to any one of claims 1 to 10.
13. A computer-readable storage medium, characterized in that it stores computer instructions that cause a computer to perform the steps of the method of processing a form image according to any one of claims 1 to 10.
CN202111668079.1A 2021-12-31 2021-12-31 Processing method and device of form image, electronic equipment and medium Pending CN114417792A (en)

Published as CN114417792A on 2022-04-29.

Similar Documents

Publication Publication Date Title
JP7206309B2 (en) Image question answering method, device, computer device, medium and program
CN110796031B (en) Table identification method and device based on artificial intelligence and electronic equipment
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN109359538B (en) Training method of convolutional neural network, gesture recognition method, device and equipment
CN113221743B (en) Table analysis method, apparatus, electronic device and storage medium
CN113111871B (en) Training method and device of text recognition model, text recognition method and device
CN110390269A (en) PDF document table extracting method, device, equipment and computer readable storage medium
CN111832403B (en) Document structure recognition method, document structure recognition model training method and device
CN111488826A (en) Text recognition method and device, electronic equipment and storage medium
KR20160132842A (en) Detecting and extracting image document components to create flow document
CN113239818B (en) Table cross-modal information extraction method based on segmentation and graph convolution neural network
KR102399508B1 (en) Layout analysis method, reading assisting device, circuit and medium
WO2024041032A1 (en) Method and device for generating editable document based on non-editable graphics-text image
CN113869017B (en) Table image reconstruction method, device, equipment and medium based on artificial intelligence
CN112257665A (en) Image content recognition method, image recognition model training method, and medium
CN113591746B (en) Document table structure detection method and device
CN111666937A (en) Method and system for recognizing text in image
CN110879972A (en) Face detection method and device
CN114330234A (en) Layout structure analysis method and device, electronic equipment and storage medium
CN113591827B (en) Text image processing method and device, electronic equipment and readable storage medium
CN113537187A (en) Text recognition method and device, electronic equipment and readable storage medium
CN116259064B (en) Table structure identification method, training method and training device for table structure identification model
CN114417792A (en) Processing method and device of form image, electronic equipment and medium
CN112926569B (en) Method for detecting natural scene image text in social network
CN114399782A (en) Text image processing method, device, equipment, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination