CN115082941A - Form information acquisition method and device for form document image - Google Patents


Info

Publication number
CN115082941A
CN115082941A (application CN202211009514.4A)
Authority
CN
China
Prior art keywords
document image
target
form document
text line
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211009514.4A
Other languages
Chinese (zh)
Inventor
孙铁
苏志锋
周博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd filed Critical Ping An Bank Co Ltd
Priority to CN202211009514.4A
Publication of CN115082941A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
        • G06V30/40 Document-oriented image-based pattern recognition
            • G06V30/41 Analysis of document content
                • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
        • G06V30/10 Character recognition
            • G06V30/14 Image acquisition
                • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
                • G06V30/146 Aligning or centring of the image pick-up or image-field
                    • G06V30/1475 Inclination or skew detection or correction of characters or of image to be recognised
            • G06V30/19 Recognition using electronic means
                • G06V30/19007 Matching; Proximity measures
                • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
                    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
                    • G06V30/19173 Classification techniques

Abstract

The application relates to the technical field of image processing, and provides a form information acquisition method and device for a form document image. The method comprises the following steps: inputting an acquired target form document image into a trained prediction model to obtain the table structure of the table in the image and a first coordinate area for each cell of the table; matching the second coordinate area of each text line, detected in the target form document image by a trained text line detection model, against the first coordinate areas to determine the cell to which each text line belongs; and, after text recognition, writing each text line into its corresponding cell and generating the table information from the table structure and the cells in which the text lines are recorded. The form information acquisition method provided by the embodiments of the application can acquire table information from a form document image of any format.

Description

Form information acquisition method and device for form document image
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for obtaining form information of a form document image.
Background
Currently, information about bank transaction flows is acquired mainly by parsing excel-style form images in which the bank form information is recorded. In the related art, form information is extracted from a form document image by collecting the different flow-sheet templates in use and matching each form document image position-by-position against the corresponding template.
However, the prior art depends on templates to obtain the form information in the form document image, and the form document images of different banks do not all record flows in a consistent format, so the prior art cannot accurately obtain form information from a form document image of an arbitrary format.
Disclosure of Invention
The present application is directed to solving at least one of the technical problems occurring in the related art. Therefore, the present application provides a form information acquisition method for a form document image, which can acquire form information for any form of form document image.
The application also provides a form information acquisition device of the form document image.
The application also provides an electronic device.
The present application also provides a computer-readable storage medium.
According to a first aspect of the present application, a form information obtaining method for a form document image includes:
inputting the obtained target form document image into a trained prediction model, and obtaining a form structure of a form in the target form document image and a first coordinate area of each cell in the form;
matching a second coordinate area of each text line, detected by the trained text line detection model in the target form document image, with each first coordinate area to determine the cell to which each text line belongs;
writing the text lines into corresponding cells after performing character recognition, and generating table information according to the table structure and the cells recorded with the text lines;
the prediction model is obtained by training a plurality of table image training samples, and the text line detection model is obtained by training a plurality of text line training samples.
According to the form information acquisition method of the form document image, the target form document image is input into the trained prediction model to obtain the table structure of the table in the image and the first coordinate area of each cell; the second coordinate area of each text line is then matched against the first coordinate areas so that each text line is written into the cell to which it belongs, and the table information is generated from the table structure and the cells in which the text lines are recorded. Form information can thus be extracted from a form document image without depending on a predetermined form template, and the method is not limited to a particular bank flow format, so form information can be acquired from a form document image of any format.
According to an embodiment of the present application, the inputting the obtained target form document image into a trained prediction model, obtaining a form structure of a form in the target form document image, and a first coordinate area of each cell in the form, includes:
identifying attribute information of cells of the table in the target table document image;
determining that the attribute information does not meet a preset condition, inputting the obtained target form document image into a trained prediction model, and obtaining a form structure of a form in the target form document image and a first coordinate area of each cell in the form;
wherein the attribute information includes the number of the cells and the height of the cells.
According to an embodiment of the present application, the inputting the obtained target form document image into a trained prediction model, obtaining a form structure of a form in the target form document image, and a first coordinate area of each cell in the form, includes:
identifying the boundary of the target form document image, and acquiring a form boundary frame in the target form document image;
and inputting the image in the boundary box into a trained prediction model, and acquiring a table structure of the table in the target table document image and a first coordinate area of each cell in the table.
According to an embodiment of the present application, the generating table information according to the table structure and the cell in which each text line is recorded includes:
generating an html character string for constructing table information according to the table structure and the cells recorded with the text lines;
and converting the html character string into an excel file according to a tablepyxl library, and acquiring the table information from the excel file.
According to an embodiment of the present application, further comprising:
acquiring an initial form document image;
inputting the initial form document image into a trained classification model, and determining a target inclination angle corresponding to the initial form document image from all preset inclination angles of the classification model;
correcting the initial form document image according to the target inclination angle to obtain a target form document image;
the classification model is obtained by training each form document image sample marked with the estimated inclination angle.
According to an embodiment of the present application, the correcting the initial form document image according to the target tilt angle to obtain the target form document image includes:
correcting the initial form document image according to the target inclination angle to obtain a corrected form document image;
acquiring each text box corresponding to each text line from the corrected form document image;
determining the slope of the corresponding text line according to the two-dimensional coordinates of each vertex of the text box;
and performing rotation correction on the corrected form document image according to the slopes to obtain the target form document image.
According to an embodiment of the present application, the determining a slope of the corresponding text line according to the two-dimensional coordinates of each vertex of the text box includes:
acquiring a long edge and a wide edge of the text box;
and determining that the long edge is longer than the wide edge and that the length difference between them exceeds a preset value, and then determining the slope of the corresponding text line from the vertex coordinates of the two ends of the long edge.
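A minimal sketch of this slope rule, assuming the detector returns the four box vertices in top-left, top-right, bottom-right, bottom-left order and using an illustrative 5-pixel margin for the preset value:

```python
import math

def text_line_slope(box):
    """Estimate the slope of a text line from its four box vertices.

    `box` is [(x, y), ...] in the order top-left, top-right,
    bottom-right, bottom-left (a common detector convention; an
    assumption here). The slope is taken along the longer edge,
    provided it is clearly longer than the shorter edge.
    """
    tl, tr, br, bl = box
    long_len = math.dist(tl, tr)   # edge along the text direction
    wide_len = math.dist(tr, br)   # edge across the text direction
    # Only trust the long edge when it dominates by a margin
    # (the preset value from the text; 5 px is illustrative).
    if long_len > wide_len and long_len - wide_len > 5:
        (x1, y1), (x2, y2) = tl, tr
        if x2 == x1:
            return None  # vertical edge, slope undefined
        return (y2 - y1) / (x2 - x1)
    return None
```

When the box is nearly square the rule declines to answer, which is what the preset-value condition in the text guards against.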
The form information acquiring apparatus of the form document image according to the embodiment of the second aspect of the present application includes:
the table structure identification module is used for inputting the obtained target table document image into a trained prediction model, and obtaining a table structure of a table in the target table document image and a first coordinate area of each cell in the table;
a text region determining module, configured to match a second coordinate region of each text line detected in the target form document image according to the trained text line detection model with each first coordinate region, and determine the cell to which each text line belongs;
the table information acquisition module is used for writing the text lines into the corresponding cells after performing character recognition, and generating table information according to the table structure and the cells recorded with the text lines;
the prediction model is obtained by training a plurality of table image training samples, and the text line detection model is obtained by training a plurality of text line training samples.
The electronic device according to the third aspect of the present application includes a processor and a memory storing a computer program, and the processor implements the table information acquiring method for a table document image according to any one of the embodiments when executing the computer program.
The computer-readable storage medium according to a fourth aspect of the present application stores thereon a computer program, which when executed by a processor implements the method for acquiring form information of a form document image according to any one of the embodiments described above.
The computer program product according to an embodiment of the fifth aspect of the application comprises a computer program which, when executed by a processor, implements the form information acquisition method for a form document image according to any of the embodiments described above.
One or more technical solutions in the embodiments of the present application have at least one of the following technical effects:
After the target form document image is input into the trained prediction model and the table structure of the table and the first coordinate area of each cell in the table are obtained, the second coordinate area of each text line is matched with each first coordinate area so that each text line is written into the cell to which it belongs, and table information is generated according to the table structure and the cells in which the text lines are recorded. Form information can be extracted from the form document image without depending on a predetermined form template, and the method is not limited by the format of the bank flow, so form information can be acquired from a form document image of any format.
Furthermore, the table structure and the first coordinate areas of the cells are acquired only when the attribute information of the table cells in the target form document image is determined not to meet the preset condition. This reduces the number of target form document images for which the table structure and cell coordinate areas must be acquired, and improves the processing efficiency of form document images.
Furthermore, boundary recognition is performed on the target form document image to obtain the bounding box of the table, and the image inside the bounding box is input into the trained prediction model to acquire the table structure and the first coordinate areas. This avoids interference from information in the image unrelated to the table, improves the accuracy of the acquired table structure and cell coordinate areas, and in turn improves the accuracy of the subsequently acquired form information.
Further, a bank form document image is input into a classification model trained on image samples labelled with estimated inclination angles; the classification model classifies the inclination angle of the image, determines the corresponding target inclination angle, and the image is corrected according to that angle. Detecting the inclination angle of a form document image is thereby converted into a classification problem solvable by a trained model, and because the model is trained on form document image samples at a variety of inclination angles, a form document image tilted in any direction over 360° can be corrected, improving the accuracy of the tilt-correction result.
Furthermore, after it is determined that several text lines exist in the corrected form document image, the slope of each text line is obtained and used to apply a rotation correction to the corrected image, yielding the target form document image. This realises small-angle tilt correction and further improves the form information acquisition result for the form document image.
Furthermore, the long edge and the wide edge of a text box are compared to judge the direction of the corresponding text line, and the slope is then obtained in the manner appropriate to that direction. The slope of a text line in any direction can therefore be determined accurately, which improves the accuracy of the subsequent fine correction using that slope.
Drawings
In order to more clearly illustrate the technical solutions in the present application or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart illustrating a table information obtaining method of a table document image according to an embodiment of the present application;
fig. 2 is a flowchart illustrating further details of the table structure and the acquisition of the first coordinate area of each cell in the table information acquisition method of the table document image in fig. 1 according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating further details of the table information acquisition in the table information acquisition method of the table document image of FIG. 1 according to an embodiment of the present application;
fig. 4 is a flowchart illustrating a table information obtaining method of a table document image according to another embodiment of the present application;
fig. 5 is a schematic structural diagram of a form information acquiring apparatus of a form document image according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The following describes and explains the form information acquisition method and apparatus of the form document image according to the embodiments of the present application in detail by using several specific embodiments.
In an embodiment, a form information acquisition method of a form document image is provided. The method is applied to a server and used for acquiring form information from the form document image. The server can be an independent server or a server cluster formed by a plurality of servers, and can also be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (content delivery network), and big data and artificial intelligence platforms.
As shown in fig. 1, the method for acquiring form information of a form document image according to the present embodiment includes:
step 101, inputting an obtained target form document image into a trained prediction model, and obtaining a form structure of a form in the target form document image and a first coordinate area of each cell in the form;
102, matching a second coordinate area of each text line detected in the target table document image according to the trained text line detection model with each first coordinate area, and determining the cell to which each text line belongs;
103, writing the text lines into corresponding cells after performing character recognition, and generating table information according to the table structure and the cells recorded with the text lines;
the prediction model is obtained by training a plurality of table image training samples, and the text line detection model is obtained by training a plurality of text line training samples.
After the target form document image is input into the trained prediction model and the table structure of the table and the first coordinate area of each cell in the table are obtained, the second coordinate area of each text line is matched with each first coordinate area so that each text line is written into the cell to which it belongs, and table information is generated according to the table structure and the cells in which the text lines are recorded. Form information can be extracted from the form document image without depending on a predetermined form template, and the method is not limited by the format of the bank flow, so form information can be acquired from a form image of any format.
In one embodiment, the target form document image may be an image of a flow sheet produced by any bank, the flow sheet being presented in tabular form. The prediction model may be a RARE (Robust text recognizer with Automatic REctification) model. After the target form document image is obtained, it is input into the trained RARE model to obtain the table structure of the table in the image and the four vertex coordinates of each cell in the table. The table structure is represented by html tags. Once the four vertex coordinates of a cell are determined, the first coordinate area of that cell can be determined as the area enclosed by those four vertices.
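The step of turning four predicted vertex coordinates into a cell's first coordinate area can be sketched as follows; representing the area as an axis-aligned rectangle is a simplifying assumption:

```python
def cell_region(vertices):
    """Axis-aligned coordinate area enclosing a cell's four predicted
    vertices, each an (x, y) pair. Returns (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))
```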
In order to improve the processing efficiency of the table document image, in an embodiment, the inputting the acquired target table document image into the trained prediction model, and acquiring the table structure of the table in the target table document image and the first coordinate region of each cell in the table includes:
identifying attribute information of cells of the table in the target table document image;
determining that the attribute information does not meet a preset condition, inputting the obtained target form document image into a trained prediction model, and obtaining a form structure of a form in the target form document image and a first coordinate area of each cell in the form;
wherein the attribute information includes the number of the cells and the height of the cells.
In one embodiment, a distinction is drawn between standard and non-standard tables in a flow sheet. A table whose solid ruling lines form closed cells is called a standard table; a table is non-standard when individual ruling lines fail to close a cell, when a cell contains very many lines of text, or when the table has no ruling lines at all. A standard table can be recognised quickly by template matching and similar means, which improves the efficiency of form information acquisition. Therefore, after the target form document image is obtained, the lines of the table can be predicted with a UNET network, the horizontal and vertical lines combined by coordinate position calculation into cell slices, and the attribute information of the cells, such as the number of cells and the cell height, obtained from these slices. If the number of cells is detected to be greater than a first preset value, such as 5, and the cell height is less than a second preset value, such as 3 cm, the table is defined as standard; otherwise it is defined as non-standard.
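The standard-table check described above can be sketched as a simple heuristic; the thresholds mirror the example values in the text and are illustrative, not fixed by the method:

```python
def is_standard_table(cells, min_count=5, max_height=3.0):
    """Heuristic from the text: treat a table as standard when it has
    more than `min_count` detected cells and every cell is shorter
    than `max_height` (threshold values and units are illustrative).

    `cells` is a list of cell heights reconstructed from the predicted
    horizontal and vertical lines.
    """
    return len(cells) > min_count and all(h < max_height for h in cells)
```

Non-standard tables fail this check and are routed to the prediction-model branch instead.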
If the table is determined to be a standard table, the form information can be obtained by template matching, or by using a YOLOv4 model to detect the outer frame of the table, cropping the cell slices whose text content is very small, enlarging them with a Real-ESRGAN super-resolution model to obtain clearer slices at a suitable scale, and then writing the cell information into excel through the xlwt library to obtain the form information.
And for the non-standard form, the form information can not be acquired in a conventional mode, and at the moment, the acquired target form document image is input into a trained prediction model to acquire the form structure of the form in the target form document image and the first coordinate area of each cell in the form.
When the attribute information of the table cells in the target table document image is determined not to meet the preset condition, the table structure and the first coordinate area of the cells are obtained, so that the processing amount of the target table document image needing to be obtained by the table structure and the first coordinate area of the cells is reduced, and the processing efficiency of the table document image is improved.
In an embodiment, the obtaining of the first coordinate area of the cell and the table structure, as shown in fig. 2, includes:
step 201, identifying the boundary of the target form document image, and acquiring a form boundary frame in the target form document image;
step 202, inputting the image in the bounding box into a trained prediction model, and obtaining a table structure of the table in the target table document image and a first coordinate area of each cell in the table.
In an embodiment, boundary recognition is performed on the target form document image by a trained DETR (DEtection TRansformer) model to obtain the table bounding box in the image. The training data of the DETR model can be derived from the public PubTables-1M data set. After the bounding box is obtained, the image inside it is input into a trained prediction model, such as a RARE (Robust text recognizer with Automatic REctification) model, to predict the table structure and cell coordinates, giving the table structure of the table in the target form document image and the first coordinate area of each cell in the table. The prediction model is trained on a large number of form document image samples, for example images from the public PubTabNet data set together with form document images generated by an automation program from the various bank flow formats collected inside the company. The table structure is represented by html tags, so predicting the table structure means predicting these html tags.
The boundary recognition is carried out on the target form document image to obtain the boundary box of the form in the target form document image, and the image in the boundary box is input into the trained prediction model to obtain the form structure and the first coordinate area, so that the interference of other information irrelevant to the form in the target form document image on the recognition result is avoided, the accuracy of the obtained form structure and the first coordinate area of the cell is improved, and the accuracy of the subsequently obtained form information is further improved.
In an embodiment, after the table structure of the table and the first coordinate regions of the cells in the table are obtained, the second coordinate regions of the text lines detected according to the trained text line detection model in the target table document image are matched with the first coordinate regions. In particular, the text line detection model may be a DBNet model. And acquiring four vertex coordinates of any text line by using the trained DBNet model, so that a region surrounded by the four vertex coordinates is determined as a second coordinate region of the text line. Then, the second coordinate regions of the text line are matched with the respective first coordinate regions. And when the second coordinate area of the text line is detected to be positioned in a certain first coordinate area, determining the cell corresponding to the first coordinate area as the cell to which the text line belongs.
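The matching step above can be sketched as follows; the rectangle representation and the full-containment rule ("second coordinate area located inside a first coordinate area") are simplifying assumptions:

```python
def contains(cell, line):
    """True when the text line's region sits inside the cell's region.
    Both regions are (x_min, y_min, x_max, y_max) rectangles."""
    return (cell[0] <= line[0] and cell[1] <= line[1]
            and line[2] <= cell[2] and line[3] <= cell[3])

def assign_lines_to_cells(cell_regions, line_regions):
    """Map each text-line index to the index of the cell containing it
    (None when no cell contains the line)."""
    mapping = {}
    for i, line in enumerate(line_regions):
        mapping[i] = next(
            (j for j, cell in enumerate(cell_regions) if contains(cell, line)),
            None,
        )
    return mapping
```

A production version might use overlap ratio instead of strict containment to tolerate slightly loose detection boxes.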
In one embodiment, after the cell to which each text line belongs is determined, the text of each text line identified by a text recognition model, such as SVTR (Scene Text Recognition with a Single Visual Model), is written into the corresponding cell, so that the recognition result of a cell combines the text line coordinates, the text recognition result, and the cell coordinates. After the text of a text line has been recognised by the text recognition model, it can be matched against a preset database, such as one storing bank names, account information, dial numbers, punctuation symbols, and payment channels, to verify the recognition. If, for example, the recognised text matches a bank name in the preset database, the recognition is considered correct.
After the identification result of the cell is obtained, an excel table file can be formed based on the identification result of the cell and the table structure, so that table information is obtained. Specifically, as shown in fig. 3, the generating table information according to the table structure and the cell in which each text line is recorded includes:
step 301, generating an html character string for constructing table information according to the table structure and the cells recorded with the text lines;
step 302, converting the html character string into an excel file according to a tablepyxl library, and acquiring the table information from the excel file.
In one embodiment, after the identification result of the cell in which each text line is recorded, that is, the identification result of the cell, is obtained, the identification result of the cell and the html tag representing the table structure are combined to construct an html character string of the table, the obtained html character string is converted through a tablepyxl library, and after the html character string is written into an excel file, table information is read from the excel file.
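The html-construction and excel-conversion steps above can be sketched as follows. `build_table_html` is a hypothetical helper that flattens the recognised cells into rows (the real method combines the predicted html tags with the cell recognition results), and the tablepyxl call is left commented out because it needs the third-party package:

```python
def build_table_html(structure_rows):
    """Assemble an html string for the table from recognised cells.
    `structure_rows` is a list of rows, each a list of cell texts
    (a simplified stand-in for the predicted html tag sequence)."""
    body = "".join(
        "<tr>" + "".join(f"<td>{text}</td>" for text in row) + "</tr>"
        for row in structure_rows
    )
    return f"<table>{body}</table>"

# Conversion to excel as described in the text (requires the
# third-party tablepyxl package, so it is only sketched here):
# from tablepyxl import tablepyxl
# tablepyxl.document_to_xl(build_table_html(rows), "table.xlsx")
```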
In one embodiment, since the acquired form document image is obtained electronically, such as by shooting or scanning, it may be tilted at an arbitrary angle in any direction. In this case, the text lines in the form document image are also inclined, which may reduce the accuracy of the recognition result of the text lines and further affect the accuracy of the generated table information. To this end, in one embodiment, as shown in fig. 4, the acquiring of the target form document image includes:
step 1001, acquiring an initial form document image;
step 1002, inputting the initial form document image into a trained classification model, and determining a target inclination angle corresponding to the initial form document image from each preset inclination angle of the classification model;
step 1003, correcting the initial form document image according to the target inclination angle to obtain the target form document image;
the classification model is obtained by training each form document image sample marked with the estimated inclination angle.
In one embodiment, the classification model is a deep network structure with five convolutional segments, each of which contains 2 or 3 convolutional layers, and the end of each segment is connected to a max pooling layer for reducing the size of the feature map; the convolution kernels within each segment have the same size, and segments closer to the fully connected layers have more convolution kernels. The classification model is created with parameter initialization functions, such as at least one of a convolution operation conv_op function, a fully connected layer operation fc_op function, and a pooling operation mpool_op function.
Illustratively, the first segment of the classification model has a convolution output size of 112 × 112 × 64, the second segment 56 × 56 × 128, the third segment 28 × 28 × 256, the fourth segment 14 × 14 × 512, and the fifth segment 7 × 7 × 512. The result of pool5 is flattened with tf.reshape into a one-dimensional vector of length 7 × 7 × 512. The first fully connected layer of the classification model is created using the fc_op function, with 4096 hidden nodes and the ReLU activation function. A dropout layer is created with the tf.nn.dropout function; the node retention rate is 0.5 when the classification model is trained and 1 when the model is used for inclination angle prediction. The second fully connected layer of the classification model is consistent with the first and is likewise followed by a dropout layer; the output layer has 1000 nodes, softmax processing is used to obtain the classification probability output, and tf.argmax is used to obtain the class with the maximum probability. Finally fc8, softmax, predictions, and the parameter list p are returned as the function results.
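The spatial sizes quoted for the five segments follow from each segment ending in a 2 × 2 max pool that halves the feature map. A quick check, assuming the standard 224 × 224 input of VGG-style networks (the input size is not stated in the embodiment):

```python
# Sketch verifying the per-segment output sizes and the length of the
# flattened pool5 vector. The 224x224 input is an assumption.
def segment_output_sizes(input_size=224, segments=5):
    sizes, s = [], input_size
    for _ in range(segments):
        s //= 2            # each segment's 2x2 max pool halves the size
        sizes.append(s)
    return sizes

sizes = segment_output_sizes()            # [112, 56, 28, 14, 7]
flattened = sizes[-1] * sizes[-1] * 512   # pool5 flattened: 7 * 7 * 512
```

This confirms the fifth-segment output of 7 × 7 × 512, giving a flattened vector of length 25088 feeding the 4096-node fully connected layer.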
In an embodiment, a plurality of categories of inclination angles, i.e., a plurality of preset inclination angles, are preset in the classification model. Observation of a large number of images shows that, if slight inclinations are ignored, the inclination angle of an image falls into one of four directions, namely a 0-degree, 90-degree, 180-degree, or 270-degree inclination, so the plurality of preset inclination angles can be 0°, 90°, 180°, and 270°, respectively. It can be understood that, besides the above preset inclination angles, other preset inclination angles can be set according to actual conditions. The preset inclination angles can thus each be regarded as representing a corresponding category.
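Because the preset angles are multiples of 90 degrees, correcting a classified image reduces to rotating it by quarter turns. A minimal sketch, using a nested list in place of real pixel data and assuming the predicted angle means degrees of clockwise inclination (the direction convention is not fixed by the embodiment):

```python
# Sketch of undoing one of the four preset inclinations by quarter-turn
# rotations. The nested-list "grid" stands in for real image data, and
# "clockwise inclination" is an assumed convention.
def rotate90_cw(grid):
    return [list(row) for row in zip(*grid[::-1])]

def correct_inclination(grid, detected_angle):
    """detected_angle in {0, 90, 180, 270}: degrees of clockwise inclination."""
    turns = (360 - detected_angle) % 360 // 90  # clockwise turns to undo it
    for _ in range(turns):
        grid = rotate90_cw(grid)
    return grid
```

In practice the same quarter-turn correction would be applied with an image library rather than list manipulation.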
In one embodiment, the classification model is obtained by training on form document image samples each labeled with an estimated inclination angle. Specifically, the form document image samples are sequentially input into a pre-constructed classification model for training. After each sample is input, the parameters of the classification model are adjusted according to the angle difference between the preset inclination angle output by the classification model for the sample and the estimated inclination angle labeled on the sample, until the angle difference for every form document image sample meets a preset condition, at which point the training of the classification model is complete.
In an embodiment, for training of the classification model, massive form document image samples from various banks can be collected first, and each sample is then labeled with an estimated inclination angle. The estimated inclination angle of a form document image sample may be the actual inclination angle measured from the sample; for example, if the measured actual inclination angle is 20 degrees, the estimated inclination angle is 20 degrees. Alternatively, the preset inclination angle closest to the measured actual inclination angle may be selected from the preset inclination angles of the classification model as the estimated inclination angle; for example, if the measured actual inclination angle is 20 degrees and the preset inclination angles are 0, 90, 180, and 270 degrees, the estimated inclination angle is 0 degrees.
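Snapping a measured angle to the nearest preset angle can be sketched as below; the circular difference is an illustrative detail (not spelled out in the text) that makes angles near 360 degrees snap to 0 rather than 270.

```python
# Sketch of labeling a sample with the preset angle nearest its
# measured inclination, matching the example above (20 -> 0 degrees).
PRESET_ANGLES = (0, 90, 180, 270)

def nearest_preset_angle(actual, presets=PRESET_ANGLES):
    def circular_diff(a, b):
        d = abs(a - b) % 360
        return min(d, 360 - d)   # wraparound: 350 is 10 degrees from 0
    return min(presets, key=lambda p: circular_diff(actual, p))
```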
After the form document image samples labeled with estimated inclination angles are collected, they are sequentially input into the classification model for training. During training, for any form document image sample, the preset inclination angle output by the classification model for that sample is acquired and compared with the estimated inclination angle of the sample. If the angle difference between the two is smaller than a preset value, no processing is performed; if the angle difference is greater than the preset value, the parameters of the classification model are adjusted accordingly. After the parameters are adjusted, the next form document image sample is input for training. When all the form document image samples have been input into the parameter-adjusted classification model and all the resulting angle differences are smaller than the preset value, the training of the classification model is judged to be complete.
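The stopping rule described above can be stated compactly; treating the model outputs and labels as plain angle lists is an assumption made for illustration.

```python
# Minimal sketch of the training stopping rule: training is complete
# only when every sample's angle difference is below the preset value.
def training_complete(model_outputs, labels, preset_value):
    return all(abs(o - l) < preset_value for o, l in zip(model_outputs, labels))
```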
A large number of form document image samples are input into a pre-constructed classification model for training, and the parameters of the classification model are adjusted according to the angle difference between the preset inclination angle output by the classification model and the estimated inclination angle of each sample, until the angle difference for every sample meets the preset condition and training is complete. This improves the accuracy with which the classification model classifies the preset inclination angle of an image, and thereby improves the accuracy of the subsequent inclination angle detection of bank form document images.
In an embodiment, after the training of the classification model is completed, the acquired initial form document image is input into the trained classification model, which classifies the image, determines the preset inclination angle to which the initial form document image belongs from among the preset inclination angles, and takes that preset inclination angle as the target inclination angle.
The bank form document image is input into a classification model trained on image samples labeled with estimated inclination angles; the classification model classifies the inclination angle of the form document image and determines the target inclination angle corresponding to it, and the form document image is corrected according to the determined target inclination angle. Detection of the inclination angle of the form document image is thus converted into a classification problem solvable by the trained classification model. Further, because the classification model is trained on form document image samples with various inclination angles, inclination correction of a form document image in any direction over the full 360 degrees can be realized, improving the accuracy of the inclination correction result.
After the target inclination angle is determined, rotation correction is performed about the center point of the initial form document image based on the target inclination angle, thereby obtaining the target form document image.
It is considered that the target form document image may still have a slight inclination angle after this correction. Therefore, to further improve the table information acquisition effect for the form document image, in an embodiment, the correcting the initial form document image according to the target inclination angle to obtain the target form document image includes:
correcting the initial form document image according to the target inclination angle to obtain a corrected form document image;
acquiring each text box corresponding to each text line from the corrected form document image;
determining the slope of the corresponding text line according to the two-dimensional coordinates of each vertex of the text box;
and performing rotation correction on the corrected form document image according to the slopes to obtain the target form document image.
In one embodiment, when only a single line of text exists in the form document image, a slight inclination does not affect the accuracy of OCR recognition. Therefore, to improve processing efficiency, the initial form document image can be corrected according to the target inclination angle, and text line detection can then be performed on the corrected form document image. If the corrected form document image contains only one text line, it is taken directly as the target form document image; otherwise, each text line of the corrected form document image is extracted and the slope of each text line is acquired. After the slopes of the text lines are obtained, the most frequently occurring slope value among them can be taken as the target slope, and the corrected form document image is subjected to rotation correction according to the target slope.
When a plurality of text lines exist in the corrected form document image, the slope of each text line is obtained, and the corrected form document image is rotation-corrected using these slopes to obtain the target form document image. Correction of slight inclination angles is thus realized, further improving the table information acquisition effect for the form document image.
In order to make the obtained slope more accurate, in an embodiment, after the corrected form document image is obtained, each text line of the corrected form document image is detected and the text box corresponding to each text line is acquired. The four vertices of each text box are identified as its top-left, top-right, bottom-right, and bottom-left vertices, and the four vertex coordinates of any text box are then acquired in a two-dimensional coordinate system established at the center point of the corrected form document image, namely the top-left vertex (x1, y1), the top-right vertex (x2, y2), the bottom-right vertex (x3, y3), and the bottom-left vertex (x4, y4). After the four vertex coordinates are obtained, the slope of the text line may be determined as K = (y2-y1)/(x2-x1) from the coordinates of the top-left and top-right vertices, or as K = (y4-y3)/(x4-x3) from the coordinates of the bottom-right and bottom-left vertices.
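The slope computation above can be sketched directly; the vertex ordering (top-left, top-right, bottom-right, bottom-left) follows the text, and the infinity return for a perfectly vertical edge is an added guard.

```python
# Sketch of K = (y2 - y1) / (x2 - x1) from the top edge of a text box.
# Vertices are assumed ordered top-left, top-right, bottom-right,
# bottom-left, as described in the embodiment.
def text_line_slope(tl, tr, br, bl):
    (x1, y1), (x2, y2) = tl, tr
    if x2 == x1:                      # vertical edge: slope is undefined
        return float("inf")
    return (y2 - y1) / (x2 - x1)
```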
The slope of the corresponding text line is determined by acquiring each text box corresponding to a text line and using the two-dimensional coordinates of the vertices of the text box, so that the slope of the text line can be determined quickly and accurately, further improving the slight inclination correction effect on the form document image.
Within the corrected form document image, the directions of the text lines may differ; for example, one text line may run horizontally while another runs vertically. In this case, if the corrected form document image is finely corrected only for inclination in the horizontal direction, the correction in the vertical direction may not be accurate enough. To this end, in an embodiment, determining the slope of the corresponding text line according to the two-dimensional coordinates of each vertex of the text box includes:
acquiring a long edge and a wide edge of the text box;
and when it is determined that the length of the long edge is greater than that of the wide edge and the length difference between them is greater than a preset value, determining the slope of the corresponding text line according to the vertex coordinates at the two ends of the long edge.
In one embodiment, since the text box is rectangular, after the coordinates of its four vertices are determined, the long side of the text box can be determined according to the coordinates of the top-left and top-right vertices, or of the bottom-right and bottom-left vertices. If the coordinates of the top-left vertex are (x1, y1) and the coordinates of the top-right vertex are (x2, y2), the distance between the two vertices is the long side of the text box. It will be appreciated that the long side of the text box may also be determined from the coordinate distance between the bottom-left and bottom-right vertices. Similarly, the wide side of the text box may be determined from the coordinate distance between the top-right and bottom-right vertices, or between the top-left and bottom-left vertices.
After the long side and the wide side of the text box are determined, if the long side is longer than the wide side and the length difference between them is greater than the preset value, the direction of the text box can be determined to be horizontal. The slope of the text line can then be determined as K = (y2-y1)/(x2-x1) from the vertex coordinates at the two ends of the long side, such as the top-left vertex (x1, y1) and the top-right vertex (x2, y2). The preset value can be set according to actual conditions, such as 30 mm.
Similarly, in an embodiment, if it is determined that the wide side is longer than the long side and the length difference between them is greater than the preset value, the direction of the text box can be determined to be vertical. The slope of the corresponding text line can then be determined as K = (y3-y2)/(x3-x2) from the vertex coordinates at the two ends of the wide side, such as the top-right vertex (x2, y2) and the bottom-right vertex (x3, y3).
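The orientation-dependent choice of slope formula can be sketched as follows. The None return for a near-square box is an added guard not discussed in the text, and the preset length difference of 30 mirrors the example above.

```python
# Sketch of choosing the slope formula by text-box orientation.
# Vertices are assumed ordered top-left, top-right, bottom-right,
# bottom-left; the preset value 30 mirrors the example in the text.
import math

def oriented_slope(tl, tr, br, bl, preset=30):
    long_edge = math.dist(tl, tr)    # top edge length
    wide_edge = math.dist(tr, br)    # right edge length
    if long_edge > wide_edge and long_edge - wide_edge > preset:
        (x1, y1), (x2, y2) = tl, tr          # horizontal text line
    elif wide_edge > long_edge and wide_edge - long_edge > preset:
        (x1, y1), (x2, y2) = tr, br          # vertical text line
    else:
        return None                          # near-square: direction unclear
    return float("inf") if x2 == x1 else (y2 - y1) / (x2 - x1)
```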
By comparing the long side and the wide side of each text box, the direction of the corresponding text line is judged from the comparison result, and the corresponding slope acquisition method is selected according to that direction. The slope of a text line in any direction can thus be accurately determined, improving the accuracy of the slight correction applied to text lines in any direction.
After the slope of each text line is obtained, in order to make the result of the fine correction more accurate, in an embodiment, the slopes of the text lines may be directly averaged and the result taken as the average slope. Alternatively, to make the average slope more accurate, all calculated slopes are first averaged and the result is taken as a baseline. The baseline is then extended upward and downward by a preset threshold to obtain a target interval. The slopes of the text lines are filtered against this target interval: slopes outside the interval are discarded, and the slopes within the interval are taken as the target slopes. The target slopes are then averaged to obtain the average slope.
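The baseline filtering described above can be sketched as below. The threshold value and the fallback when every slope falls outside the interval are illustrative assumptions.

```python
# Sketch of the outlier-filtered average slope: average all slopes to
# get a baseline, widen it by a preset threshold into a target
# interval, drop slopes outside it, and average the survivors. The
# fallback to the unfiltered list is an added guard.
def average_slope(slopes, threshold=0.05):
    baseline = sum(slopes) / len(slopes)
    lo, hi = baseline - threshold, baseline + threshold
    targets = [k for k in slopes if lo <= k <= hi] or slopes
    return sum(targets) / len(targets)
```

Filtering against the baseline interval keeps one badly detected text box (for example, a misdetected vertical fragment) from skewing the final rotation angle.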
After the average slope is obtained, the rotation angle is calculated from the average slope, the center position of the corrected form document image is acquired, and the corrected form document image is rotated by an OpenCV method such as warpAffine, finally obtaining the target form document image with the slight inclination angle corrected.
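The final fine correction can be sketched without OpenCV itself: the rotation angle follows from the average slope via an arctangent, and the 2 × 3 affine matrix below mirrors what cv2.getRotationMatrix2D would produce (with scale 1) before cv2.warpAffine applies it. Treating the slope as rise over run in image coordinates is an assumption.

```python
# Sketch of the rotation parameters for the fine correction. The matrix
# layout follows OpenCV's getRotationMatrix2D (scale = 1); applying it
# to the image would be done with cv2.warpAffine.
import math

def rotation_matrix(center, avg_slope):
    angle = math.degrees(math.atan(avg_slope))   # inclination in degrees
    a = math.cos(math.radians(angle))
    b = math.sin(math.radians(angle))
    cx, cy = center
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]
```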
The following describes a form information acquiring apparatus of a form document image provided in the present application, and the form information acquiring apparatus of a form document image described below and the form information acquiring method of a form document image described above may be referred to in correspondence with each other.
In an embodiment, as shown in fig. 5, there is provided a form information acquiring apparatus of a form document image, including:
the table structure identification module 210 is configured to input the obtained target table document image into a trained prediction model, and obtain a table structure of a table in the target table document image and a first coordinate area of each cell in the table;
a text region determining module 220, configured to match a second coordinate region of each text line detected in the target form document image according to the trained text line detection model with each first coordinate region, and determine the cell to which each text line belongs;
a table information obtaining module 230, configured to perform text recognition on each text line, write the text line into each corresponding cell, and generate table information according to the table structure and the cell in which each text line is recorded;
the prediction model is obtained by training a plurality of table image training samples, and the text line detection model is obtained by training a plurality of text line training samples.
After the target form document image is input into the trained prediction model to obtain the table structure of the table in the target form document image and the first coordinate region of each cell in the table, the second coordinate region of each text line is matched with each first coordinate region so that each text line is written into the cell to which it belongs, and the table information is generated according to the table structure and the cells in which the text lines are recorded. Table information can thus be extracted from a form document image without relying on a predetermined form template, and without being limited by the format of a bank statement, so that table information can be acquired from a form image of any format.
In an embodiment, the table structure identifying module 210 is specifically configured to:
identifying attribute information of cells of the table in the target table document image;
determining that the attribute information does not meet a preset condition, inputting the obtained target form document image into a trained prediction model, and obtaining a form structure of a form in the target form document image and a first coordinate area of each cell in the form;
wherein the attribute information includes the number of the cells and the height of the cells.
In an embodiment, the table structure identifying module 210 is specifically configured to:
identifying the boundary of the target form document image, and acquiring a form boundary frame in the target form document image;
and inputting the image in the boundary box into a trained prediction model, and acquiring a table structure of the table in the target table document image and a first coordinate area of each cell in the table.
In an embodiment, the table information obtaining module 230 is specifically configured to:
generating an html character string for constructing table information according to the table structure and the cells recorded with the text lines;
and converting the html character string into an excel file according to a tablepyxl library, and acquiring the table information from the excel file.
In one embodiment, the table structure identification module 210 is further configured to:
acquiring an initial form document image;
inputting the initial form document image into a trained classification model, and determining a target inclination angle corresponding to the initial form document image from all preset inclination angles of the classification model;
correcting the initial form document image according to the target inclination angle to obtain a target form document image;
the classification model is obtained by training each form document image sample marked with the estimated inclination angle.
In an embodiment, the table structure identifying module 210 is specifically configured to:
correcting the initial form document image according to the target inclination angle to obtain a corrected form document image;
acquiring each text box corresponding to each text line from the corrected form document image;
determining the slope of the corresponding text line according to the two-dimensional coordinates of each vertex of the text box;
and performing rotation correction on the corrected form document image according to the slopes to obtain the target form document image.
In an embodiment, the table structure identifying module 210 is specifically configured to:
acquiring a long edge and a wide edge of the text box;
and determining that the length of the long edge is greater than that of the wide edge, the length difference between the long edge and the wide edge is greater than a preset value, and determining the slope of the corresponding text line according to the vertex coordinates of the two ends of the long edge.
Fig. 6 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 6: a processor (processor) 810, a Communication Interface 820, a memory 830 and a Communication bus 840, wherein the processor 810, the Communication Interface 820 and the memory 830 communicate with each other via the Communication bus 840. The processor 810 may call the computer program in the memory 830 to execute a table information obtaining method of a table document image, for example, including:
inputting the obtained target form document image into a trained prediction model, and obtaining a form structure of a form in the target form document image and a first coordinate area of each cell in the form;
matching a second coordinate region of each text line detected according to the trained text line detection model in the target table document image with each first coordinate region to determine the cell to which each text line belongs;
writing the text lines into corresponding cells after performing character recognition, and generating table information according to the table structure and the cells recorded with the text lines;
the prediction model is obtained by training a plurality of table image training samples, and the text line detection model is obtained by training a plurality of text line training samples.
In addition, the logic instructions in the memory 830 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
On the other hand, an embodiment of the present application further provides a storage medium, where the storage medium includes a computer program, where the computer program is stored on a non-transitory computer-readable storage medium, and when the computer program is executed by a processor, the computer is capable of executing the table information obtaining method for a table document image provided in the foregoing embodiments, for example, the method includes:
inputting the obtained target form document image into a trained prediction model, and obtaining a form structure of a form in the target form document image and a first coordinate area of each cell in the form;
matching a second coordinate region of each text line detected according to the trained text line detection model in the target table document image with each first coordinate region to determine the cell to which each text line belongs;
writing the text lines into corresponding cells after performing character recognition, and generating table information according to the table structure and the cells recorded with the text lines;
the prediction model is obtained by training a plurality of table image training samples, and the text line detection model is obtained by training a plurality of text line training samples.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A form information acquisition method of a form document image is characterized by comprising the following steps:
inputting the obtained target form document image into a trained prediction model, and obtaining a form structure of a form in the target form document image and a first coordinate area of each cell in the form;
matching a second coordinate region of each text line detected according to the trained text line detection model in the target table document image with each first coordinate region to determine the cell to which each text line belongs;
writing the text lines into corresponding cells after performing character recognition, and generating table information according to the table structure and the cells recorded with the text lines;
the prediction model is obtained by training a plurality of table image training samples, and the text line detection model is obtained by training a plurality of text line training samples.
2. The form information obtaining method of a form document image according to claim 1, wherein the step of inputting the obtained target form document image into a trained prediction model to obtain a form structure of a form in the target form document image and a first coordinate region of each cell in the form comprises:
identifying attribute information of cells of the table in the target table document image;
determining that the attribute information does not meet a preset condition, inputting the obtained target form document image into a trained prediction model, and obtaining a form structure of a form in the target form document image and a first coordinate area of each cell in the form;
wherein the attribute information includes the number of the cells and the height of the cells.
3. The form information obtaining method of a form document image according to claim 1 or 2, wherein the step of inputting the obtained target form document image into a trained prediction model to obtain a form structure of a form in the target form document image and a first coordinate area of each cell in the form comprises:
identifying the boundary of the target form document image, and acquiring a form boundary frame in the target form document image;
and inputting the image in the boundary box into a trained prediction model, and acquiring a table structure of the table in the target table document image and a first coordinate area of each cell in the table.
4. The form information acquiring method of a form document image according to claim 1, wherein the generating form information based on the form structure and the cells in which each of the text lines is recorded, includes:
generating an html character string for constructing table information according to the table structure and the cells recorded with the text lines;
and converting the html character string into an excel file according to a tablepyxl library, and acquiring the table information from the excel file.
5. The form information acquisition method of a form document image according to claim 1, further comprising:
acquiring an initial form document image;
inputting the initial form document image into a trained classification model, and determining a target inclination angle corresponding to the initial form document image from all preset inclination angles of the classification model;
correcting the initial form document image according to the target inclination angle to obtain a target form document image;
the classification model is obtained by training each form document image sample marked with the estimated inclination angle.
6. The form information acquisition method for a form document image according to claim 5, wherein the step of correcting the initial form document image according to the target tilt angle to acquire the target form document image comprises:
correcting the initial form document image according to the target tilt angle to acquire a corrected form document image;
acquiring each text box corresponding to each text line from the corrected form document image;
determining the slope of the corresponding text line according to the two-dimensional coordinates of each vertex of each text box;
and performing rotation correction on the corrected form document image according to the slopes to acquire the target form document image.
7. The form information acquisition method for a form document image according to claim 6, wherein determining the slope of the corresponding text line according to the two-dimensional coordinates of each vertex of the text box comprises:
acquiring the long edge and the wide edge of the text box;
and, upon determining that the long edge is longer than the wide edge and that the difference between their lengths exceeds a preset value, determining the slope of the corresponding text line according to the coordinates of the vertices at the two ends of the long edge.
8. A form information acquisition device for a form document image, comprising:
a table structure identification module, configured to input an acquired target form document image into a trained prediction model, and acquire a table structure of a table in the target form document image and a first coordinate area of each cell in the table;
a text region determination module, configured to match the second coordinate area of each text line detected in the target form document image by a trained text line detection model against each first coordinate area, and determine the cell to which each text line belongs;
and a table information acquisition module, configured to perform character recognition on each text line, write the recognition result into the corresponding cell, and generate table information according to the table structure and the cells in which the text lines are recorded;
wherein the prediction model is obtained by training on a plurality of table image training samples, and the text line detection model is obtained by training on a plurality of text line training samples.
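The coordinate-area matching performed by the text region determination module could look like the following. Assigning each text line to the cell with the greatest rectangle overlap is an illustrative choice; the patent does not specify the matching metric, and the rectangle format `(x1, y1, x2, y2)` is an assumption.

```python
def overlap_area(a, b):
    """Intersection area of two axis-aligned rectangles (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def assign_lines_to_cells(line_boxes, cell_boxes):
    """Map each text-line index to the index of the best-overlapping cell
    (first coordinate area), or None when a line overlaps no cell."""
    mapping = {}
    for i, line in enumerate(line_boxes):
        areas = [overlap_area(line, cell) for cell in cell_boxes]
        best = max(range(len(cell_boxes)), key=lambda j: areas[j])
        mapping[i] = best if areas[best] > 0 else None
    return mapping

cells = [(0, 0, 50, 20), (50, 0, 100, 20)]        # two cells side by side
lines = [(5, 5, 45, 15), (55, 5, 95, 15)]         # one text line per cell
assignment = assign_lines_to_cells(lines, cells)  # {0: 0, 1: 1}
```

Once each line is assigned, its recognized text is written into that cell and the table information is generated from the structure plus the filled cells, as in claim 8.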
9. An electronic device comprising a processor and a memory storing a computer program, wherein the processor, when executing the computer program, implements the form information acquisition method for a form document image according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the form information acquisition method for a form document image according to any one of claims 1 to 7.
CN202211009514.4A 2022-08-23 2022-08-23 Form information acquisition method and device for form document image Pending CN115082941A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211009514.4A CN115082941A (en) 2022-08-23 2022-08-23 Form information acquisition method and device for form document image

Publications (1)

Publication Number Publication Date
CN115082941A true CN115082941A (en) 2022-09-20

Family

ID=83244577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211009514.4A Pending CN115082941A (en) 2022-08-23 2022-08-23 Form information acquisition method and device for form document image

Country Status (1)

Country Link
CN (1) CN115082941A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115273113A (en) * 2022-09-27 2022-11-01 深圳擎盾信息科技有限公司 Table text semantic recognition method and device
CN116861865A (en) * 2023-06-26 2023-10-10 江苏常熟农村商业银行股份有限公司 EXCEL data processing method, device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948507A (en) * 2019-03-14 2019-06-28 北京百度网讯科技有限公司 Method and apparatus for detecting table
CN110390269A (en) * 2019-06-26 2019-10-29 平安科技(深圳)有限公司 PDF document table extracting method, device, equipment and computer readable storage medium
CN111259873A (en) * 2020-04-26 2020-06-09 江苏联著实业股份有限公司 Table data extraction method and device
CN111914805A (en) * 2020-08-18 2020-11-10 科大讯飞股份有限公司 Table structuring method and device, electronic equipment and storage medium
CN114005123A (en) * 2021-10-11 2022-02-01 北京大学 System and method for digitally reconstructing layout of print form text
CN114283435A (en) * 2021-12-02 2022-04-05 上海浦东发展银行股份有限公司 Table extraction method and device, computer equipment and storage medium
CN114359939A (en) * 2021-12-16 2022-04-15 华南理工大学 Table structure identification method, system and equipment based on cell detection

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
TRAN H T et al.: "Cell decomposition for the table in document image based on analysis of texts and lines distribution", 2016 Eighth International Conference on Ubiquitous and Future Networks (ICUFN) *
KONG CHUIXIN: "Research on Digitization Technology for Table Document Images Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology *
JIANG DONGYU et al.: "Research and Implementation of Converting Tables in Images into HTML Tables", Heilongjiang Science and Technology Information *
飞桨PaddlePaddle: "PaddleOCR release v2.2: open-source layout analysis and lightweight table recognition", CSDN: HTTPS://BLOG.CSDN.NET/PADDLEPADDLE/ARTICLE/DETAILS/119362481 *

Similar Documents

Publication Publication Date Title
CN109492643B (en) Certificate identification method and device based on OCR, computer equipment and storage medium
CN115082941A (en) Form information acquisition method and device for form document image
CN109766778A (en) Invoice information input method, device, equipment and storage medium based on OCR technique
CN109559344B (en) Frame detection method, device and storage medium
CN113011144A (en) Form information acquisition method and device and server
CN114092938B (en) Image recognition processing method and device, electronic equipment and storage medium
CN112464925A (en) Mobile terminal account opening data bank information automatic extraction method based on machine learning
CN112949455B (en) Value-added tax invoice recognition system and method
CN111680690A (en) Character recognition method and device
CN112101386B (en) Text detection method, device, computer equipment and storage medium
CN110795714A (en) Identity authentication method and device, computer equipment and storage medium
US9117132B2 (en) System and method facilitating designing of classifier while recognizing characters in a video
CN111222585A (en) Data processing method, device, equipment and medium
CN113688821B (en) OCR text recognition method based on deep learning
CN113673528B (en) Text processing method, text processing device, electronic equipment and readable storage medium
CN113628181A (en) Image processing method, image processing device, electronic equipment and storage medium
CN111199240A (en) Training method of bank card identification model, and bank card identification method and device
CN115100660A (en) Method and device for correcting inclination of document image
CN115131590B (en) Training method of target detection model, target detection method and related equipment
CN114445716B (en) Key point detection method, key point detection device, computer device, medium, and program product
CN112149523B (en) Method and device for identifying and extracting pictures based on deep learning and parallel-searching algorithm
CN114332602A (en) Commodity identification method of intelligent container
CN114565749A (en) Method and system for identifying key content of visa document of power construction site
CN114758340A (en) Intelligent identification method, device and equipment for logistics address and storage medium
CN113392455A (en) House type graph scale detection method and device based on deep learning and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220920