WO2020164281A1 - Form parsing method based on character location and recognition, and medium and computer device - Google Patents

Form parsing method based on character location and recognition, and medium and computer device Download PDF

Info

Publication number
WO2020164281A1
Authority
WO
WIPO (PCT)
Prior art keywords
picture
layout
position information
recognition
table layout
Prior art date
Application number
PCT/CN2019/118422
Other languages
French (fr)
Chinese (zh)
Inventor
周罡
卢波
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Priority claimed from CN201910115364.7A external-priority patent/CN109961008B/en
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020164281A1 publication Critical patent/WO2020164281A1/en

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 - Document-oriented image-based pattern recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 - Document-oriented image-based pattern recognition
    • G06V 30/41 - Analysis of document content
    • G06V 30/412 - Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables

Definitions

  • This application relates to the field of computer processing technology, and in particular to a table analysis method, medium, and computer device based on text positioning and recognition.
  • Deep learning is developing rapidly in the field of image recognition. It has surpassed traditional methods in both accuracy and efficiency, and has attracted wide attention in the field. Deep learning is a new area of machine learning research; its motivation lies in building and simulating a neural network that mimics the human brain for analysis and learning, imitating the mechanism by which the human brain interprets data such as images, sounds, and text.
  • The recognition of a table refers to converting the table in a table picture into editable table text, a process that requires both text recognition and image recognition.
  • The existing technical solution performs table analysis based on the presence of table lines; when there are no table lines, the table cannot be extracted from the table picture.
  • The present application provides a form analysis method and corresponding device based on text positioning and recognition, which locate and recognize the text in form pictures using an established deep learning model, improving the efficiency and accuracy of form picture recognition.
  • This application also provides a computer device and a readable storage medium for executing the table analysis method based on text positioning and recognition of this application.
  • the present application provides a method for analyzing table images based on text positioning and recognition, the method including:
  • Inputting the form picture into a pre-trained text positioning network to obtain the position information of the characters in the form picture includes:
  • a rectangular coordinate system is established, and the coordinates of each vertex of the rectangular frame are obtained as the position information.
  • This application provides a form analysis method based on text positioning and recognition: a form picture is input into a pre-trained text positioning network to obtain the position information of the characters in the form picture; the form picture is segmented according to the position information, and the resulting cell pictures are recognized to obtain the cell character content; a first table layout is extracted according to the position information; and a table file of the form picture is generated according to the first table layout and the cell character content.
  • the established deep learning model can be used to locate and recognize the text in the table image, which improves the efficiency and accuracy of the table image recognition.
  • This application can detect whether the table picture contains grid lines; if it does, a second table layout of the table picture is extracted and compared with the first table layout, and when the comparison shows that the first table layout is consistent with the second table layout, the first table layout is verified as valid.
  • This application can additionally detect whether there are table lines in the table picture. When the table picture has table lines, the table lines are extracted directly, and the obtained first table layout is compared with the second table layout formed from the extracted table lines to verify whether the first table layout is valid.
  • This application uses the text positioning network and the text recognition network to parse table pictures, so it is compatible with cases where table lines are absent, present, or incomplete, and its scope of application is wide.
  • The present application may further calculate a comparison result between the second table layout and the first table layout, expressed as the points of difference between the two layouts. When the number of points of difference is greater than a preset value, the text positioning network is retrained. Through this mechanism, the application can learn flexibly and intelligently adjust the pre-trained text positioning network, so that the analysis results for table pictures become increasingly accurate.
  • FIG. 1 is a flowchart of a table parsing method based on text positioning recognition in an embodiment
  • Figure 2 shows a prior-art text positioning network based on scene text detection
  • FIG. 3 is a schematic diagram of obtaining position information of characters in the table picture in an embodiment
  • FIG. 4 is a structural block diagram of a table analysis device based on text positioning recognition in an embodiment
  • Fig. 5 is a block diagram of the internal structure of a computer device in an embodiment.
  • An embodiment of the present application provides a table analysis method based on text positioning and recognition. As shown in FIG. 1, the method includes the following steps:
  • Deep network training is performed in advance on multiple input target samples to obtain a text positioning network capable of locating the text in table pictures and a text recognition network capable of recognizing that text. Specifically, feature point extraction and feature fusion are performed on the sample pictures, and the text positioning network and the text recognition network are finally output.
  • the target sample includes at least a picture sample and the coordinates of a marked rectangular frame with text.
  • Deep network training is a new area of machine learning research. Its motivation is to build and simulate a neural network that mimics the human brain for analysis and learning, imitating the mechanism by which the human brain interprets data such as images, sounds, and text.
  • The general idea of this application is a text detection and recognition process based on deep network training: positioning networks such as Faster R-CNN (deep-learning-based object detection) and CTPN (natural scene text detection) locate the text in the picture to obtain its location information, and the region indicated by the location information is then input into an RNN-based text recognition network, such as CRNN, to obtain the character string corresponding to the location information.
  • Figure 2 is a text positioning network based on EAST (scene text detection).
  • the text positioning network used in this application is an improvement based on the EAST text positioning network.
  • The text positioning network used in this application connects an LSTM (Long Short-Term Memory network) after the score map in the network structure shown in FIG. 2, which makes the score map sharper and more uniform; during training, dice loss is used in place of focal loss.
  • LSTM is a recurrent neural network, suitable for processing and predicting important events with relatively long intervals and delays in a time series.
  • Inputting the form picture described in this application into the pre-trained text positioning network to obtain the position information of the characters in the form picture specifically includes: inputting the form picture into the pre-trained text positioning network; taking several character strings as a character string combination; obtaining the smallest rectangular frame surrounding the character string combination; and establishing a rectangular coordinate system and taking the coordinates of each vertex of the rectangular frame as the position information.
  • FIG. 3 is a schematic diagram of obtaining position information of characters in the table picture.
  • the table picture contains several character string combinations. After the text positioning network is used, the smallest rectangular frame wrapping each character string combination is output.
  • the position information of the characters in the table picture is expressed as the coordinate value of the smallest rectangular frame that wraps the combination of character strings.
  • the coordinates of the four vertices of the rectangular frame surrounding the character string combination can be directly obtained through the character positioning network.
  • the position information is expressed as the coordinate values of the upper left corner and the lower right corner of the rectangular frame.
  • the minimum and maximum values of the X axis and the minimum and maximum values of the Y axis constitute the coordinates of the upper left corner and the lower right corner of the rectangular frame, thereby obtaining a standard rectangular frame.
  • The coordinates of the four vertices of the smallest rectangular frame surrounding a certain string combination, obtained through the text positioning network, are A(X1, Y1), B(X1, Y2), C(X2, Y1), and D(X2, Y2); according to the magnitudes of X1, X2, Y1, and Y2, the coordinates of the upper-left and lower-right corners of the rectangle are selected.
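To make the corner selection concrete, here is a minimal Python sketch (the helper name is an assumption for illustration, not from the patent) that derives the standard rectangle's upper-left and lower-right corners from the four located vertices:

```python
# Derive a standard rectangle from four detected vertices: the upper-left
# corner is (min X, min Y) and the lower-right corner is (max X, max Y),
# assuming image coordinates with Y increasing downward.
def standard_rect(vertices):
    """vertices: list of (x, y) tuples for the four detected corners."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys)), (max(xs), max(ys))

top_left, bottom_right = standard_rect([(10, 5), (10, 40), (90, 5), (90, 40)])
```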
  • a rectangular frame is determined according to the position information, and a cell picture is determined according to the rectangular frame.
  • the present application performs image segmentation on the form picture according to the rectangular frame, and cuts out the cell picture corresponding to the rectangular frame from the form picture, wherein each cell picture contains a character string combination.
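As a rough illustration of this segmentation step (a toy model; a real implementation would slice an OpenCV or PIL image array), each cell picture can be cut out of the form picture by its rectangle's corner coordinates:

```python
# Hypothetical sketch, not the patent's code: the "picture" is modeled as a
# plain row-major list of pixel rows; cropping is simple row/column slicing.
def crop_cell(image, top_left, bottom_right):
    (x1, y1), (x2, y2) = top_left, bottom_right
    return [row[x1:x2] for row in image[y1:y2]]

form = [[(r, c) for c in range(6)] for r in range(4)]  # toy 6x4 "picture"
cell = crop_cell(form, (1, 1), (4, 3))  # 3 columns wide, 2 rows tall
```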
  • the present application inputs the cell picture to the text recognition network to recognize the content of the character string combination in the cell picture to obtain the cell character content.
  • The character recognition network is the classic CRNN character recognition network, through which editable cell character content is obtained.
  • Extracting the first table layout of the table picture according to the position information specifically includes: extracting from the position information the coordinate values of the upper-left and lower-right corner points of each rectangular frame; according to those coordinate values, dividing the rectangular frames whose points share the same abscissa into the same column and those whose points share the same ordinate into the same row; and counting the total number of rows and the total number of columns as the first table layout.
  • the rectangular frame wrapping each character string combination is divided into the positions of the rows and columns corresponding to the table pictures according to the overlap ratio of the position information in the horizontal direction and the vertical direction.
  • the ordinates of the vertices of the rectangular boxes in the same row are the same or similar
  • the abscissas of the rectangular boxes in the same column are the same or similar.
  • This application can determine that two points are located in the same row when their ordinates are the same or the difference between their ordinates is within a preset range, and that two points are located in the same column when their abscissas are the same or the difference between their abscissas is within the preset range.
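The row/column assignment described above can be sketched as follows; the clustering helper, box representation, and tolerance value are assumptions for illustration:

```python
# Cluster 1-D coordinates into groups: a value joins the current group when
# it is within `tol` of the group's seed, mirroring the "same or similar
# ordinate/abscissa" rule described in the text.
def group_by_axis(values, tol):
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][0] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups

# Boxes as (x1, y1, x2, y2); the first table layout = (total rows, total cols).
boxes = [(10, 5, 40, 20), (50, 6, 90, 21), (10, 35, 40, 50)]
rows = group_by_axis([b[1] for b in boxes], tol=3)   # cluster top edges
cols = group_by_axis([b[0] for b in boxes], tol=3)   # cluster left edges
first_table_layout = (len(rows), len(cols))
```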
  • This application divides the vertices of the rectangular frames with the same or similar ordinates into the same row, and those with the same or similar abscissas into the same column.
  • the first table layout includes at least the number of rows and columns of the table.
  • As for the table's title content, its text length spans multiple columns, so it can be removed first.
  • Generating a table file of the table picture according to the first table layout and the cell character content specifically includes: drawing a table according to the first table layout; and filling the cell character content into the corresponding cells of the drawn table to generate a table file of the table picture.
  • The table corresponding to the table picture is drawn, and the table contains the same number of cells as there are character string combinations. Further, this application fills the recognized cell character content into the cells of the table to generate a table file, whose content can be saved in csv or json format for data analysis and processing by a program, thereby realizing the parsing of the table picture.
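The final step can be sketched in a few lines; the helper name, layout structure, and cell dictionary are assumed for illustration (the csv format mirrors the one mentioned above):

```python
import csv
import io

# Draw an empty rows x cols grid, fill recognized strings by (row, col),
# and serialize the result as CSV text.
def build_table_file(layout, cells):
    """layout: (rows, cols); cells: dict {(row, col): text}."""
    rows, cols = layout
    grid = [[cells.get((r, c), "") for c in range(cols)] for r in range(rows)]
    buf = io.StringIO()
    csv.writer(buf).writerows(grid)
    return buf.getvalue()

csv_text = build_table_file((2, 2), {(0, 0): "Name", (0, 1): "Age",
                                     (1, 0): "Zhou", (1, 1): "30"})
```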
  • Before the form picture is input into the pre-trained text positioning network and the position information of the characters in the form picture is obtained, the method further includes: detecting whether the form picture contains grid lines; if the form picture contains grid lines, extracting a second table layout of the form picture; and comparing the second table layout with the first table layout, and verifying that the first table layout is valid when the comparison result shows that the first table layout is consistent with the second table layout.
  • The second table layout can be extracted through morphological opening and closing operations in image processing.
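A toy illustration of how morphological opening can isolate table lines: eroding and then dilating a binary image with a long horizontal kernel keeps long horizontal runs (table lines) and removes isolated marks (characters). A real system would use a library such as OpenCV (`cv2.morphologyEx`); this pure-Python version is only a sketch under that assumption:

```python
# Binary image as lists of 0/1 rows; horizontal structuring element of width k.
def erode_h(img, k):
    w = len(img[0])
    return [[1 if c + k <= w and all(row[c:c + k]) else 0
             for c in range(w)] for row in img]

def dilate_h(img, k):
    w = len(img[0])
    return [[1 if any(row[max(0, c - k + 1):c + 1]) else 0
             for c in range(w)] for row in img]

def open_h(img, k):  # opening = erosion followed by dilation
    return dilate_h(erode_h(img, k), k)

img = [[1, 1, 1, 1, 1],   # a horizontal line survives opening
       [0, 1, 0, 0, 0]]   # an isolated dot is removed
lines = open_h(img, k=3)
```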
  • the present application can verify the reliability of the first table layout and the second table layout by comparing the first table layout with the second table layout.
  • The present application may also calculate a comparison result between the second table layout and the first table layout, expressed as the difference between the first table layout and the second table layout.
  • When the comparison result shows that the number of points of difference between the first table layout and the second table layout is greater than a preset value, the text positioning network is retrained to improve the recognition accuracy of the solution.
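A hedged sketch of this verification-and-retraining check, with the layout representation (sets of occupied cell positions) and the threshold assumed for illustration:

```python
# Count the points of difference between two layouts; when the count exceeds
# a preset value, flag the text positioning network for retraining.
def compare_layouts(first, second):
    """Each layout: set of (row, col) occupied-cell positions."""
    return len(first ^ second)  # symmetric difference = points of difference

first = {(0, 0), (0, 1), (1, 0), (1, 1)}
second = {(0, 0), (0, 1), (1, 0)}  # one cell missing in the line-based layout
PRESET = 2
needs_retraining = compare_layouts(first, second) > PRESET
```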
  • the present application provides a form image analysis device based on text positioning recognition, including:
  • the input module 11 is used to input form pictures to a pre-trained text positioning network to obtain position information of characters in the form pictures.
  • Deep network training is performed in advance on multiple input target samples to obtain a text positioning network capable of locating the text in table pictures and a text recognition network capable of recognizing that text. Specifically, feature point extraction and feature fusion are performed on the sample pictures, and the text positioning network and the text recognition network are finally output.
  • the target sample includes at least a picture sample and the coordinates of a marked rectangular frame with text.
  • Deep network training is a new area of machine learning research. Its motivation is to build and simulate a neural network that mimics the human brain for analysis and learning, imitating the mechanism by which the human brain interprets data such as images, sounds, and text.
  • The general idea of this application is a text detection and recognition process based on deep network training: positioning networks such as Faster R-CNN (deep-learning-based object detection) and CTPN (natural scene text detection) locate the text in the picture to obtain its location information, and the region indicated by the location information is then input into an RNN-based text recognition network, such as CRNN, to obtain the character string corresponding to the location information.
  • Figure 2 is a text positioning network based on EAST (scene text detection).
  • the text positioning network used in this application is an improvement based on the EAST text positioning network.
  • The text positioning network used in this application connects an LSTM (Long Short-Term Memory network) after the score map in the network structure shown in FIG. 2, which makes the score map sharper and more uniform; during training, dice loss is used in place of focal loss.
  • LSTM is a recurrent neural network, suitable for processing and predicting important events with relatively long intervals and delays in a time series.
  • Inputting the form picture described in this application into the pre-trained text positioning network to obtain the position information of the characters in the form picture specifically includes: inputting the form picture into the pre-trained text positioning network; taking several character strings as a character string combination; obtaining the smallest rectangular frame surrounding the character string combination; and establishing a rectangular coordinate system and taking the coordinates of each vertex of the rectangular frame as the position information.
  • FIG. 3 is a schematic diagram of obtaining position information of characters in the table picture.
  • the table picture contains several character string combinations. After the text positioning network is used, the smallest rectangular frame wrapping each character string combination is output.
  • the position information of the characters in the table picture is expressed as the coordinate value of the smallest rectangular frame that wraps the combination of character strings.
  • the coordinates of the four vertices of the rectangular frame surrounding the character string combination can be directly obtained through the character positioning network.
  • the position information is expressed as the coordinate values of the upper left corner and the lower right corner of the rectangular frame.
  • the minimum and maximum values of the X axis and the minimum and maximum values of the Y axis constitute the coordinates of the upper left corner and the lower right corner of the rectangular frame, thereby obtaining a standard rectangular frame.
  • The coordinates of the four vertices of the smallest rectangular frame surrounding a certain string combination, obtained through the text positioning network, are A(X1, Y1), B(X1, Y2), C(X2, Y1), and D(X2, Y2); according to the magnitudes of X1, X2, Y1, and Y2, the coordinates of the upper-left and lower-right corners of the rectangle are selected.
  • The segmentation module 12 is configured to perform graphic segmentation on the table picture according to the position information, segment out the cell picture corresponding to the position information, and input the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content.
  • a rectangular frame is determined according to the position information, and a cell picture is determined according to the rectangular frame.
  • the present application performs image segmentation on the form picture according to the rectangular frame, and cuts out the cell picture corresponding to the rectangular frame from the form picture, wherein each cell picture contains a character string combination.
  • the present application inputs the cell picture to the text recognition network to recognize the content of the character string combination in the cell picture to obtain the cell character content.
  • The character recognition network is the classic CRNN character recognition network, through which editable cell character content is obtained.
  • the extraction module 13 is configured to extract the first table layout of the table picture according to the position information.
  • Extracting the first table layout of the table picture according to the position information specifically includes: extracting from the position information the coordinate values of the upper-left and lower-right corner points of each rectangular frame; according to those coordinate values, dividing the rectangular frames whose points share the same abscissa into the same column and those whose points share the same ordinate into the same row; and counting the total number of rows and the total number of columns as the first table layout.
  • the rectangular frame wrapping each character string combination is divided into the positions of the rows and columns corresponding to the table pictures according to the overlap ratio of the position information in the horizontal direction and the vertical direction.
  • the ordinates of the vertices of the rectangular boxes in the same row are the same or similar
  • the abscissas of the rectangular boxes in the same column are the same or similar.
  • This application can determine that two points are located in the same row when their ordinates are the same or the difference between their ordinates is within a preset range, and that two points are located in the same column when their abscissas are the same or the difference between their abscissas is within the preset range.
  • This application divides the vertices of the rectangular frames with the same or similar ordinates into the same row, and those with the same or similar abscissas into the same column.
  • the first table layout includes at least the number of rows and columns of the table.
  • As for the table's title content, its text length spans multiple columns, so it can be removed first.
  • the generating module 14 is configured to generate a table file of the table picture according to the first table layout and the cell character content.
  • Generating a table file of the table picture according to the first table layout and the cell character content specifically includes: drawing a table according to the first table layout; and filling the cell character content into the corresponding cells of the drawn table to generate a table file of the table picture.
  • The table corresponding to the table picture is drawn, and the table contains the same number of cells as there are character string combinations. Further, this application fills the recognized cell character content into the cells of the table to generate a table file, whose content can be saved in csv or json format for data analysis and processing by a program, thereby realizing the parsing of the table picture.
  • Before the form picture is input into the pre-trained text positioning network and the position information of the characters in the form picture is obtained, the method further includes: detecting whether the form picture contains grid lines; if the form picture contains grid lines, extracting a second table layout of the form picture; and comparing the second table layout with the first table layout, and verifying that the first table layout is valid when the comparison result shows that the first table layout is consistent with the second table layout.
  • The second table layout can be extracted through morphological opening and closing operations in image processing.
  • the present application can verify the reliability of the first table layout and the second table layout by comparing the first table layout with the second table layout.
  • The present application may also calculate a comparison result between the second table layout and the first table layout, expressed as the difference between the first table layout and the second table layout.
  • When the comparison result shows that the number of points of difference between the first table layout and the second table layout is greater than a preset value, the text positioning network is retrained to improve the recognition accuracy of the solution.
  • an embodiment of the present application provides a computer-readable storage medium, and the computer-readable storage medium may be a non-volatile readable storage medium.
  • The computer-readable storage medium stores computer-readable instructions, and when the instructions are executed by a processor, the table analysis method based on text positioning and recognition according to any one of the technical solutions is implemented.
  • The computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards.
  • a storage device includes any medium that stores or transmits information in a readable form by a device (for example, a computer, a mobile phone), and may be a read-only memory, a magnetic disk, or an optical disk.
  • The computer-readable storage medium provided by the embodiment of the application can realize: inputting form pictures into a pre-trained text positioning network to obtain the position information of the characters in the form pictures; performing graphic segmentation on the form pictures according to the position information to segment out the cell pictures corresponding to the position information; inputting the cell pictures into a pre-trained text recognition network for character recognition to obtain the cell character content; extracting the first table layout of the table picture according to the position information; and generating a table file of the table picture according to the first table layout and the cell character content.
  • the established deep learning model can be used to locate and recognize the text in the table image, which improves the efficiency and accuracy of the table image recognition.
  • the present application provides a computer device.
  • the computer device includes a processor 303, a memory 305, an input unit 307, and a display unit 309.
  • the memory 305 may be used to store the application program 301 and various functional modules, and the processor 303 runs the application program 301 stored in the memory 305 to execute various functional applications and data processing of the device.
  • the memory 305 may be internal memory or external memory, or include both internal memory and external memory.
  • the internal memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, or random access memory.
  • External storage can include hard disks, floppy disks, ZIP disks, USB flash drives, magnetic tapes, etc.
  • the memory disclosed in this application includes but is not limited to these types of memory.
  • the memory 305 disclosed in this application is merely an example and not a limitation.
  • the input unit 307 is used for receiving input of signals and receiving keywords input by the user.
  • the input unit 307 may include a touch panel and other input devices.
  • The touch panel can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel with a finger, stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control keys and power keys), a trackball, a mouse, and a joystick.
  • the display unit 309 may be used to display information input by the user or information provided to the user and various menus of the computer device.
  • the display unit 309 may take the form of a liquid crystal display, an organic light emitting diode, or the like.
  • The processor 303 is the control center of the computer device. It connects the various parts of the entire computer through various interfaces and lines, and executes various functions and processes data by running or executing the software programs and/or modules stored in the memory 305 and calling the data stored in the memory.
  • The one or more processors 303 shown in FIG. 5 can execute and realize the functions of the input module 11, the segmentation module 12, the extraction module 13, and the generation module 14 shown in FIG. 4.
  • the computer device includes a memory 305 and a processor 303.
  • the memory 305 stores computer-readable instructions.
  • the processor 303 executes the steps of a table analysis method based on character positioning recognition described in the above embodiment.
  • The computer device provided by the embodiment of the application can input form pictures into a pre-trained text positioning network to obtain the position information of the characters in the form pictures; perform graphic segmentation on the form pictures according to the position information to segment out the cell pictures corresponding to the position information; input the cell pictures into a pre-trained text recognition network for character recognition to obtain the cell character content; extract the first table layout of the table picture according to the position information; and generate a table file of the table picture according to the first table layout and the cell character content.
  • the established deep learning model can be used to locate and recognize the text in the table image, which improves the efficiency and accuracy of the table image recognition.
  • the present application can also detect whether the table picture contains grid lines; if the table picture contains grid lines, extract the second table layout of the table picture; compare the second table layout with the first table layout, and when the comparison result is that the first table layout is consistent with the second table layout, verify that the first table layout is valid.
  • this application can additionally detect whether there are table lines in the table picture; where table lines are present, the table lines are extracted directly, and the second table layout formed by the extracted table lines is compared with the obtained first table layout to verify whether the first table layout is valid.
  • this application parses table pictures with the text positioning network and the text recognition network, so it is compatible with pictures that have no table lines as well as pictures with complete or incomplete table lines, and has a wide scope of application.
  • the computer-readable storage medium provided in the embodiment of the present application can implement the above-mentioned embodiment of the table analysis method based on text positioning and recognition.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).


Abstract

A form parsing method based on character location and recognition. The method comprises: inputting a form picture into a pre-trained character location network to obtain location information of characters in the form picture (S11); performing graph segmentation on the form picture according to the location information to obtain a cell picture corresponding to the location information, and inputting the cell picture into a pre-trained character recognition network to perform character recognition so as to obtain the cell character content (S12); extracting a first form layout of the form picture according to the location information (S13); and generating a form file of the form picture according to the first form layout and the cell character content (S14). An established deep learning model can be used for locating and recognizing characters in a form picture, thereby improving the efficiency and accuracy of form picture recognition.

Description

Form parsing method based on character location and recognition, and medium and computer device

This application claims priority to Chinese patent application No. 201910115364.7, filed with the China National Intellectual Property Administration on February 13, 2019 and entitled "Form parsing method based on character location and recognition, and medium and computer device", the entire contents of which are incorporated herein by reference.
Technical Field

This application relates to the field of computer processing technology, and in particular to a form parsing method, medium and computer device based on character location and recognition.
Background

At present, deep learning is developing rapidly in the field of image recognition; it has surpassed traditional methods in both accuracy and efficiency and has attracted wide attention in the field. Deep learning is a new area of machine learning research whose motivation is to build neural networks that simulate the way the human brain analyzes and learns, mimicking the mechanisms of the brain to interpret data such as images, sound and text. Table recognition refers to converting the table in a table picture into editable table text, a process that requires both text recognition and image recognition.

In the prior art, deep learning has also been applied to parse the tables in table pictures, but the existing solutions detect and recognize the table lines in the picture by deep learning, which has at least the following defect:

The existing solutions perform table parsing on the assumption that table lines are present; when a table-format picture has no table lines, the table cannot be extracted.
Summary of the Invention

This application provides a form parsing method based on character location and recognition and a corresponding device, which use an established deep learning model to locate and recognize the text in a table picture, improving the efficiency and accuracy of table picture recognition.

This application further provides a computer device and a readable storage medium for executing the form parsing method based on character location and recognition of this application.

To solve the above problems, this application adopts the following technical solutions:
In a first aspect, this application provides a table picture parsing method based on character location and recognition, the method comprising:

inputting a table picture into a pre-trained text location network to obtain position information of the characters in the table picture;

performing graphic segmentation on the table picture according to the position information to segment out the cell picture corresponding to the position information, and inputting the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content;

extracting a first table layout of the table picture according to the position information; and

generating a table file of the table picture according to the first table layout and the cell character content;

wherein inputting the table picture into the pre-trained text location network to obtain the position information of the characters in the table picture comprises:

inputting the table picture into the pre-trained text location network;

taking several consecutive character strings in the table picture as one character string combination;

obtaining the smallest rectangular box enclosing the character string combination; and

establishing a rectangular coordinate system and taking the coordinates of the vertices of the rectangular box as the position information.
Compared with the prior art, the technical solution of this application has at least the following advantages:

1. This application provides a form parsing method based on character location and recognition: a table picture is input into a pre-trained text location network to obtain the position information of the characters in the table picture; the table picture is graphically segmented according to the position information to segment out the cell picture corresponding to the position information, and the cell picture is input into a pre-trained text recognition network for character recognition to obtain the cell character content; a first table layout of the table picture is extracted according to the position information; and a table file of the table picture is generated according to the first table layout and the cell character content. This application uses an established deep learning model to locate and recognize the text in a table picture, improving the efficiency and accuracy of table picture recognition.

2. This application inputs the table picture into the pre-trained text location network; takes several consecutive character strings in the table picture as one character string combination; obtains the smallest rectangular box enclosing the character string combination; and establishes a rectangular coordinate system, taking the coordinates of the vertices of the rectangular box as the position information. Through this mechanism, the position information of the text in the table picture is obtained, improving the accuracy and efficiency of text location.

3. This application can detect whether the table picture contains grid lines; if the table picture contains grid lines, extract a second table layout of the table picture; and compare the second table layout with the first table layout, and when the comparison result is that the first table layout is consistent with the second table layout, verify that the first table layout is valid. Where the table picture has table lines, the table lines are extracted directly, and the second table layout formed by the extracted table lines is compared with the obtained first table layout to verify whether the first table layout is valid. Because this application parses table pictures with a text location network and a text recognition network, it is compatible with pictures that have no table lines as well as pictures with complete or incomplete table lines, giving it a wide scope of application.

4. This application can further compute the comparison result of the second table layout and the first table layout, the comparison result being expressed as the points of difference between the first table layout and the second table layout; when the number of points of difference is greater than a preset value, the text location network is retrained. Through this mechanism, the pre-trained text location network can be adjusted by flexible, intelligent learning, so that the parsing results of table pictures become increasingly accurate.
Brief Description of the Drawings

FIG. 1 is a flowchart of a form parsing method based on character location and recognition in an embodiment;

FIG. 2 shows a prior-art text location network based on scene text detection;

FIG. 3 is a schematic diagram of the position information of characters obtained from a table picture in an embodiment;

FIG. 4 is a structural block diagram of a form parsing device based on character location and recognition in an embodiment;

FIG. 5 is a block diagram of the internal structure of a computer device in an embodiment.

The realization of the objectives, functional features and advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description

To enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application will be described clearly and completely below in conjunction with the accompanying drawings.

Some of the flows described in the specification, the claims and the above drawings contain operations that appear in a specific order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein, or in parallel. Operation numbers such as S11 and S12 merely distinguish the different operations and do not themselves imply any execution order. In addition, these flows may include more or fewer operations, which may be executed sequentially or in parallel. It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules, etc.; they neither indicate a sequence nor require that the "first" and "second" be of different types.

Those of ordinary skill in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural. It should be further understood that the word "comprising" used in this specification refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. When an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present; moreover, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes all or any units and all combinations of one or more of the associated listed items.

Those of ordinary skill in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which this application belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have meanings consistent with their meanings in the context of the prior art and, unless specifically defined as herein, are not to be interpreted in an idealized or overly formal sense.

The technical solutions in the embodiments of this application will be described clearly and completely below in conjunction with the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative work fall within the protection scope of this application.
Referring to FIG. 1, an embodiment of this application provides a form parsing method based on character location and recognition. As shown in FIG. 1, the method includes the following steps:

S11. Input a table picture into a pre-trained text location network to obtain the position information of the characters in the table picture.

In this embodiment, a deep network is trained in advance by inputting multiple target samples, producing the text location network capable of locating text in table pictures and a text recognition network capable of recognizing text in table pictures. Specifically, feature point extraction and feature fusion are performed on the sample pictures, and the text location network and the text recognition network are finally output. The target samples include at least picture samples and the annotated coordinates of rectangular boxes containing text.

Deep network training is a new area of machine learning research whose motivation is to build neural networks that simulate the way the human brain analyzes and learns, mimicking the mechanisms of the brain to interpret data such as images, sound and text.

The general idea of this application is a text detection and recognition process based on deep network training: location networks such as Faster R-CNN (a deep-learning object detection technique) or CTPN (natural scene text detection) detect and locate the text in the picture to obtain its position information, and the region indicated by that position information is then input into an RNN-based text recognition network (such as CRNN) to recognize the text and obtain the character string corresponding to the position information.

Referring to FIG. 2, FIG. 2 shows a text location network based on EAST (scene text detection). The text location network used in this application is an improvement on the EAST network. Specifically, an LSTM (long short-term memory network) is attached after the score map in the network structure shown in FIG. 2 to brighten and smooth the score map, and dice loss replaces focal loss during training. An LSTM is a recurrent neural network suitable for processing and predicting important events with relatively long intervals and delays in a time series.
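The training change mentioned above is the substitution of dice loss for focal loss on the score map. The patent gives no formula, so the following is only a minimal, framework-free sketch of the standard dice loss it presumably refers to, over score maps flattened to sequences of values in [0, 1]:

```python
def dice_loss(pred, target, eps=1e-6):
    """Dice loss between a predicted score map and a binary target map.

    loss = 1 - (2*|P∩T| + eps) / (|P| + |T| + eps); it is 0 when the maps
    agree exactly and approaches 1 when they do not overlap at all.
    """
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (total + eps)
```

In practice this would be computed per pixel on network tensors; dice loss is often preferred over focal loss when the positive (text) region is a small fraction of the map.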
Further, inputting the table picture into the pre-trained text location network to obtain the position information of the characters in the table picture specifically includes: inputting the table picture into the pre-trained text location network; taking several consecutive character strings in the table picture as one character string combination; obtaining the smallest rectangular box enclosing the character string combination; and establishing a rectangular coordinate system and taking the coordinates of the vertices of the rectangular box as the position information.

Referring to FIG. 3, FIG. 3 is a schematic diagram of the position information of the characters obtained from the table picture. As shown in FIG. 3, the table picture contains several character string combinations, and the text location network outputs the smallest rectangular box wrapping each character string combination. In this embodiment, the position information of the characters in the table picture is expressed as the coordinate values of the smallest rectangular box wrapping the character string combination. The coordinates of the four vertices of that rectangular box can be obtained directly from the text location network; specifically, the position information is expressed as the coordinate values of the upper-left and lower-right corners of the rectangular box. In actual use, because table text is essentially horizontal, the minimum and maximum X-axis values and the minimum and maximum Y-axis values of the four coordinates in the network's Quad Geometry output form the upper-left and lower-right corners of the rectangular box, yielding a standard axis-aligned rectangle.

For example, if the coordinates of the four vertices of the smallest rectangular box enclosing a certain character string combination obtained through the text location network are (X1, Y1), (X1, Y2), (X2, Y1) and (X2, Y2), then the coordinate values of the upper-left and lower-right corner points of the rectangle are selected according to the magnitudes of X1, X2, Y1 and Y2.
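The corner selection just described can be sketched as follows; `quad` is a hypothetical list standing in for the four vertex coordinates read from the Quad Geometry output:

```python
def quad_to_rect(quad):
    """Collapse four predicted text-box vertices into an axis-aligned
    rectangle: the minimum X/Y values give the upper-left corner and the
    maximum values the lower-right corner (table text is nearly horizontal)."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return (min(xs), min(ys)), (max(xs), max(ys))
```

Even when the network's quadrilateral is slightly skewed, this yields a standard rectangle suitable for cropping.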
S12. Perform graphic segmentation on the table picture according to the position information to segment out the cell picture corresponding to the position information, and input the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content.

In this embodiment, a rectangular box is determined from the position information, and a cell picture is determined from the rectangular box. Specifically, the table picture is segmented according to the rectangular box, and the cell picture corresponding to that rectangular box is cut out of the table picture, each cell picture containing one character string combination.

Further, the cell picture is input into the text recognition network to recognize the content of the character string combination in the cell picture and obtain the cell character content. In this embodiment, the text recognition network is the classic CRNN text recognition network, through which the editable cell character content is obtained.
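A minimal sketch of the segmentation step, with the page image represented here as a row-major 2-D list of pixels (a stand-in for real image data; the crop would then be fed to the recognition network):

```python
def crop_cell(image, top_left, bottom_right):
    """Cut out the cell picture bounded by an axis-aligned rectangle.
    Coordinates are (x, y) with the origin at the top-left of the image."""
    (x1, y1), (x2, y2) = top_left, bottom_right
    return [row[x1:x2] for row in image[y1:y2]]
```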
S13. Extract a first table layout of the table picture according to the position information.

In this embodiment, extracting the first table layout of the table picture according to the position information specifically includes: extracting the coordinate values of the upper-left and lower-right corner points of the rectangular boxes from the position information; according to those coordinate values, assigning rectangular boxes whose points share the same abscissa to the same column and rectangular boxes whose points share the same ordinate to the same row; and counting the total number of rows and the total number of columns as the first table layout.

In this embodiment, the rectangular boxes wrapping the character string combinations are assigned to the row and column positions of the table picture according to the overlap ratio of the position information in the horizontal and vertical directions. The ordinates of the vertices of rectangular boxes in the same row are identical or close, and the abscissas of rectangular boxes in the same column are identical or close. Two points may be judged to lie in the same row when their ordinates are identical or their difference is within a preset range, and in the same column when their abscissas are identical or their difference is within a preset range. Following this principle, rectangular boxes whose vertices have identical or close ordinates are placed in the same row, and those with identical or close abscissas in the same column.

Continuing to refer to FIG. 3, the abscissas of the vertices of rectangular boxes in the same column are identical or close, while the abscissa ranges of different columns do not intersect; rectangular boxes in the same row have overlapping ordinate ranges, while the ordinate ranges of different rows do not intersect.

In this embodiment, the first table layout includes at least the number of rows and the number of columns of the table. The table title, whose text spans multiple columns, can be removed first. Through the above rules, the number of rows N and the number of columns M of the table picture can be extracted, and further the N×M layout format of the table picture.
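The row/column grouping above can be sketched by clustering box corners within a tolerance. This is an illustrative reading, not the patent's exact algorithm; `tol` is a hypothetical pixel threshold standing in for the "preset range":

```python
def infer_layout(boxes, tol=5):
    """Infer (rows, columns) -- the first table layout -- from axis-aligned
    text boxes given as ((x1, y1), (x2, y2)) pairs: boxes whose top edges
    lie within `tol` of each other share a row, and boxes whose left edges
    lie within `tol` of each other share a column."""
    def cluster(values):
        groups, prev = 0, None
        for v in sorted(values):
            if prev is None or v - prev > tol:
                groups += 1          # start a new row/column group
            prev = v
        return groups

    rows = cluster(y1 for (_, y1), _ in boxes)
    cols = cluster(x1 for (x1, _), _ in boxes)
    return rows, cols
```

For the 2x3 grid of boxes in the test below this yields (2, 3); a production version would use the full overlap ratio of box intervals rather than corner points alone.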
S14. Generate a table file of the table picture according to the first table layout and the cell character content.

In this embodiment, generating the table file of the table picture according to the first table layout and the cell character content specifically includes: drawing a table according to the first table layout; and filling the cell characters into the corresponding cells of the drawn table to generate the table file of the table picture.

In this embodiment, after the first table layout of the table picture is extracted, the table corresponding to the table picture is drawn, containing the same number of cells as there are character string combinations. Further, the recognized cell character content is filled into the corresponding cells of the table to generate a table file, whose content can be saved in csv or json format for data analysis and processing by a program, thereby realizing the parsing of the table picture.
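Writing out the final table file can be sketched with the standard csv module; `cells`, mapping (row, column) to recognized text, is a hypothetical intermediate structure, not an interface defined by the patent:

```python
import csv

def write_table(layout, cells, out):
    """Fill recognized cell strings into a rows x cols grid and write it
    to `out` (any file-like object) as CSV; missing cells stay blank."""
    n_rows, n_cols = layout
    grid = [[cells.get((r, c), "") for c in range(n_cols)]
            for r in range(n_rows)]
    csv.writer(out).writerows(grid)
```

A json variant would simply `json.dump(grid, out)` with the same grid.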
In this embodiment, before the table picture is input into the pre-trained text location network to obtain the position information of the characters, the method further includes: detecting whether the table picture contains grid lines; if the table picture contains grid lines, extracting a second table layout of the table picture; and comparing the second table layout with the first table layout, and when the comparison result is that the first table layout is consistent with the second table layout, verifying that the first table layout is valid. In one possible design, if the table in the table picture has grid lines, the second table layout can be extracted through morphological opening and closing operations.

In fact, by comparing the first table layout with the second table layout, this application can verify the reliability of both layouts at the same time.

Preferably, this application can also compute the comparison result of the second table layout and the first table layout, the comparison result being expressed as the points of difference between the first table layout and the second table layout; when the number of points of difference is greater than a preset value, the text location network is retrained to improve the recognition accuracy of this solution.
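The grid-line check above relies on morphological opening/closing over the image. As a dependency-free toy stand-in for that step, the ruling lines of a binary image can be found by scanning for long unbroken ink runs; `min_run` is a hypothetical length threshold, and a real implementation would use morphological operations on the actual image:

```python
def line_layout(binary, min_run=20):
    """Count horizontal and vertical ruling lines in a binary image
    (1 = ink) by looking for unbroken runs of at least `min_run` pixels,
    then derive the second table layout as (rows, columns)."""
    def ruled(lines):
        count = 0
        for line in lines:
            run = best = 0
            for px in line:
                run = run + 1 if px else 0
                best = max(best, run)
            if best >= min_run:
                count += 1
        return count

    h = ruled(binary)          # scan image rows for horizontal lines
    v = ruled(zip(*binary))    # scan image columns for vertical lines
    return max(h - 1, 0), max(v - 1, 0)
```

The resulting (rows, columns) can then be compared with the first table layout; a mismatch count above the preset value would trigger retraining of the location network.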
请参考图4,在另一种实施例中,本申请提供了一种基于文字定位识别的表格图片解析装置,包括:Please refer to Fig. 4, in another embodiment, the present application provides a form image analysis device based on text positioning recognition, including:
输入模块11,用于输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息。The input module 11 is used to input form pictures to a pre-trained text positioning network to obtain position information of characters in the form pictures.
本申请实施例中,预先通过输入多个目标样本进行深度网络的训练,训练出能够进行表格图片的文字定位的所述文字定位网络和能够进行表格图片文字识别的文字识别网络。具体的,对所述样本图片进行特征点提取以及特征融合,最终输出所述文字定位网络和所述文字识别网络。其中,所述目标样本至少包括图片样本以及标注的有文字的矩形框坐标。In the embodiment of the present application, the deep network training is performed by inputting multiple target samples in advance, and the text positioning network capable of positioning the text of the table picture and the text recognition network capable of recognizing the text of the table picture are trained. Specifically, feature point extraction and feature fusion are performed on the sample picture, and finally the text positioning network and the text recognition network are output. Wherein, the target sample includes at least a picture sample and the coordinates of a marked rectangular frame with text.
深度网络的训练是机器学习研究中的一个新的领域,其动机在于建立、模拟人脑进行分析学习的神经网络,它模仿人脑的机制来解释数据,例如图像,声音和文本。Deep network training is a new field in machine learning research. Its motivation is to build neural networks that simulate the way the human brain analyzes and learns, interpreting data such as images, sounds, and text by mimicking the brain's mechanisms.
本申请的总体思路为基于深度网络训练的文字检测与识别过程,具体是通过Faster RCNN(基于深度学习的目标检测技术)、CTPN(自然场景文本检测)等定位网络对图片中的文字进行检测和定位,得到文字的位置信息,然后将该位置信息所指向的区域输入到基于RNN的文字识别网络(如CRNN等)进行文字的识别,得到该位置信息对应的字符串。The general idea of the present application is a text detection and recognition process based on deep network training: positioning networks such as Faster RCNN (a deep-learning-based object detection technique) and CTPN (natural scene text detection) detect and locate the text in a picture to obtain the position information of the text; the region indicated by the position information is then fed into an RNN-based text recognition network (such as CRNN) for character recognition, yielding the character string corresponding to that position information.
请参考图2,图2为基于EAST(场景文字检测)的文字定位网络。本申请所应用的文字定位网络是基于EAST文字定位网络改进而成。具体的,本申请所应用的文字定位网络是在图2所示的网络结构中的score map后接入LSTM(长短期记忆网络),将score map提亮抹均匀,训练时使用dice loss替换focal loss。其中,LSTM是一种时间递归神经网络,适合于处理和预测时间序列中间隔和延迟相对较长的重要事件。Please refer to Fig. 2, which shows a text positioning network based on EAST (scene text detection). The text positioning network used in this application is an improvement on the EAST network: an LSTM (long short-term memory network) is attached after the score map in the network structure shown in Fig. 2, the score map is brightened and smoothed, and dice loss is used in place of focal loss during training. LSTM is a recurrent neural network suited to processing and predicting important events with relatively long intervals and delays in a time series.
进一步的,本申请所述输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息,具体包括:输入表格图片至预先训练的文字定位网络;获取所述表格图片中连续的若干个字符串作为一个字符串组合;获取包围所述字符串组合的最小的矩形框;建立直角坐标系,获取所述矩形框的各个顶点的坐标作为所述位置信息。Further, inputting the table picture into the pre-trained text positioning network to obtain the position information of the characters in the table picture specifically includes: inputting the table picture into the pre-trained text positioning network; taking several consecutive character strings in the table picture as one character string combination; obtaining the smallest rectangular box enclosing the character string combination; and establishing a rectangular coordinate system and taking the coordinates of each vertex of the rectangular box as the position information.
请继续参考图3,图3为获取到所述表格图片中字符的位置信息示意图。如图3所示,所述表格图片中包含若干个字符串组合,通过所述文字定位网络后输出包裹各个字符串组合的最小矩形框。本申请实施例中,所述表格图片中字符的位置信息被表达为包裹所述字符串组合的最小矩形框的坐标值。本申请通过所述文字定位网络可以直接得到包裹所述字符串组合的矩形框的四个顶点的坐标。具体的,所述位置信息被表达为该矩形框的左上角以及右下角的坐标值。在实际使用时,因为表格文字基本是水平的,所以取Quad Geometry输出的四个坐标的X轴最小值与最大值、Y轴最小值与最大值,组成所述矩形框的左上角与右下角的坐标,从而得到标准的矩形框。例如,通过所述文字定位网络得到包裹某个字符串组合的最小矩形框的四个顶点的坐标分别为:A(X1,Y1)、B(X1,Y2)、C(X2,Y1)以及D(X2,Y2),依据X1、X2、Y1以及Y2的大小值,选取该矩形的左上角以及右下角的点的坐标值。Please continue to refer to Fig. 3, which is a schematic diagram of the position information obtained for the characters in the table picture. As shown in Fig. 3, the table picture contains several character string combinations, and the text positioning network outputs the smallest rectangular box wrapping each combination. In the embodiment of the present application, the position information of the characters in the table picture is expressed as the coordinate values of the smallest rectangular box wrapping the character string combination; the coordinates of the four vertices of that box are obtained directly through the text positioning network. Specifically, the position information is expressed as the coordinate values of the upper left and lower right corners of the rectangular box. In practice, because table text is essentially horizontal, the minimum and maximum X values and the minimum and maximum Y values of the four coordinates in the Quad Geometry output are taken to form the coordinates of the upper left and lower right corners, thereby obtaining a standard rectangular box.
For example, the coordinates of the four vertices of the smallest rectangular box wrapping a certain character string combination, as obtained through the text positioning network, are A(X1, Y1), B(X1, Y2), C(X2, Y1), and D(X2, Y2); according to the magnitudes of X1, X2, Y1, and Y2, the coordinate values of the upper left and lower right corner points of the rectangle are selected.
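The min/max reduction described above can be sketched as follows; the function name `quad_to_rect` is illustrative and not taken from the application:

```python
def quad_to_rect(vertices):
    """Collapse the four vertices of a (roughly horizontal) text quad
    into an axis-aligned box (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))

# Four vertices as output by the positioning network (order irrelevant):
quad = [(10, 5), (10, 25), (90, 5), (90, 25)]
print(quad_to_rect(quad))  # (10, 5, 90, 25)
```

Because the reduction takes per-axis minima and maxima, it also yields a standard upright box when the detected quad is slightly skewed.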
分割模块12,用于依据所述位置信息对所述表格图片进行图形分割,分割出所述位置信息对应的单元格图片,将所述单元格图片输入预先训练的文字识别网络进行字符识别,得到单元格字符内容。The segmentation module 12 is configured to graphically segment the table picture according to the position information, segment out the cell picture corresponding to the position information, and input the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content.
本申请实施例中,依据所述位置信息确定一个矩形框,依据所述矩形框确定一个单元格图片。具体的,本申请依据所述矩形框对所述表格图片进行图像分割,从所述表格图片中截取出该矩形框对应的单元格图片,其中,每个单元格图片中包含一个字符串组合。In the embodiment of the present application, a rectangular frame is determined according to the position information, and a cell picture is determined according to the rectangular frame. Specifically, the present application performs image segmentation on the form picture according to the rectangular frame, and cuts out the cell picture corresponding to the rectangular frame from the form picture, wherein each cell picture contains a character string combination.
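A minimal sketch of the cropping step, assuming the image is represented as a row-major grid of pixel values and the rectangle uses the (x_min, y_min, x_max, y_max) form above; the function name `crop_cell` is ours, not the application's:

```python
def crop_cell(image, rect):
    """Cut the sub-image covered by rect = (x_min, y_min, x_max, y_max)
    out of `image`, a row-major list of pixel rows."""
    x_min, y_min, x_max, y_max = rect
    return [row[x_min:x_max] for row in image[y_min:y_max]]

# A tiny 6x4 "image" whose pixel value encodes its (y, x) position:
image = [[y * 10 + x for x in range(6)] for y in range(4)]
cell = crop_cell(image, (1, 1, 4, 3))
print(cell)  # [[11, 12, 13], [21, 22, 23]]
```

Each cropped cell picture would then be passed to the recognition network; the cropping itself is just two slice operations.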
进一步的,本申请将所述单元格图片输入至所述文字识别网络,以对所述单元格图片中的字符串组合的内容进行识别得到所述单元格字符内容。本申请实施例中,所述文字识别网络是经典的文字识别CRNN网络,通过该网络后得到可供编辑的所述单元格字符内容。Further, the present application inputs the cell picture to the text recognition network to recognize the content of the character string combination in the cell picture to obtain the cell character content. In the embodiment of the present application, the character recognition network is a classic character recognition CRNN network, and the cell character content that can be edited is obtained through the network.
提取模块13,用于依据所述位置信息,提取所述表格图片的第一表格布局。The extraction module 13 is configured to extract the first table layout of the table picture according to the position information.
本申请实施例中,所述依据所述位置信息,提取所述表格图片的第一表格布局,具体包括:提取所述位置信息中所述矩形框的左上角以及右下角的点的坐标值;依据所述左上角以及右下角的点的坐标值将相同横坐标的点对应的矩形框分为同一列,将相同纵坐标的点对应的矩形框分为同一行;计算总的行数以及总的列数作为所述第一表格布局。In the embodiment of the present application, extracting the first table layout of the table picture according to the position information specifically includes: extracting the coordinate values of the upper left and lower right corner points of the rectangular boxes from the position information; dividing the rectangular boxes corresponding to points with the same abscissa into the same column and the rectangular boxes corresponding to points with the same ordinate into the same row according to those coordinate values; and calculating the total number of rows and the total number of columns as the first table layout.
本申请实施例中,通过所述位置信息在水平方向上和垂直方向上的重叠比例将包裹各个字符串组合的矩形框划分到表格图片对应的行列的位置。其中,相同行中矩形框的顶点的纵坐标相同或者相近,相同列的矩形框的横坐标相同或者相近。本申请可以设定当两个点的纵坐标相同或者两个点的纵坐标的差值在预设范围内时判断该两个点位于同一行,以及设定当两个点的横坐标相同或者两个点的横坐标的差值在预设范围内时判断该两个点位于同一列。本申请依据该原理,将矩形框的顶点的纵坐标相同或相近的划分为同一行,将横坐标相同或相近的划分为同一列。In the embodiment of the present application, the rectangular boxes wrapping the character string combinations are assigned to row and column positions of the table picture according to the overlap ratio of their position information in the horizontal and vertical directions. The vertices of rectangular boxes in the same row have identical or similar ordinates, and those in the same column have identical or similar abscissas. The present application may stipulate that two points lie in the same row when their ordinates are identical or differ by no more than a preset range, and that two points lie in the same column when their abscissas are identical or differ by no more than a preset range. Based on this principle, vertices with identical or similar ordinates are grouped into the same row, and those with identical or similar abscissas into the same column.
请继续参考图3,如图3所示,同一列的矩形框的顶点的横坐标存在相同或相近的,而不同列的横坐标范围没有交集。同一行的矩形框的纵坐标范围相互重叠,而不同行的纵坐标范围不存在交集。Please continue to refer to Fig. 3: the abscissas of the vertices of rectangular boxes in the same column are identical or similar, while the abscissa ranges of different columns do not intersect; likewise, the ordinate ranges of rectangular boxes in the same row overlap, while the ordinate ranges of different rows do not intersect.
本申请实施例中,所述第一表格布局至少包括表格的行数以及列数。对于表格的名称内容,由于其文字长度跨列,可以将其先去除。通过以上规则,可以提取所述表格图片的行的数量N以及列的数量M,进一步的,提取出所述表格图片的N×M布局格式。In the embodiment of the present application, the first table layout includes at least the numbers of rows and columns of the table. The table title, whose text spans multiple columns, can be removed first. Through the above rules, the number of rows N and the number of columns M of the table picture can be extracted, and further the N×M layout format of the table picture is obtained.
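The row/column grouping rule above can be sketched as follows; the tolerance value and the function names `group_axis`/`layout_of` are illustrative assumptions, not values from the application:

```python
def group_axis(values, tol):
    """Cluster sorted 1-D coordinates: values within `tol` of the last
    member of the current cluster share the same row (or column) index."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return {v: i for i, g in enumerate(groups) for v in g}

def layout_of(rects, tol=5):
    """Return (n_rows, n_cols) for boxes given as (x_min, y_min, x_max, y_max):
    similar y_min values form a row, similar x_min values form a column."""
    rows = group_axis([r[1] for r in rects], tol)
    cols = group_axis([r[0] for r in rects], tol)
    return (len(set(rows.values())), len(set(cols.values())))

rects = [(0, 0, 40, 10), (50, 1, 90, 11),    # row 0
         (1, 30, 41, 40), (51, 31, 91, 41)]  # row 1
print(layout_of(rects))  # (2, 2)
```

The tolerance plays the role of the "preset range" above: coordinates that are identical or close are merged into one row or column index.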
生成模块14,用于依据所述第一表格布局以及所述单元格字符内容,生成所述表格图片的表格文件。The generating module 14 is configured to generate a table file of the table picture according to the first table layout and the cell character content.
本申请实施例中,所述依据所述第一表格布局以及所述单元格字符内容,生成所述表格图片的表格文件,具体包括:依据所述第一表格布局绘制表格;将所述单元格字符对应填入绘制的表格的单元格中,生成所述表格图片的表格文件。In the embodiment of the present application, generating the table file of the table picture according to the first table layout and the cell character content specifically includes: drawing a table according to the first table layout; and filling the cell characters into the corresponding cells of the drawn table to generate the table file of the table picture.
本申请实施例中,提取所述表格图片的第一表格布局之后绘制所述表格图片对应的表格,所述表格中包含与所述字符串组合数量相同的单元格。进一步的,本申请将识别出的单元格字符内容对应填入所述表格的单元格中生成表格文件,其内容可保存为csv或者json格式可供程序进行数据分析处理,从而实现表格图片的解析。In the embodiment of the present application, after the first table layout of the table picture is extracted, the table corresponding to the table picture is drawn, and the table contains the same number of cells as there are character string combinations. Further, the present application fills the recognized cell character content into the corresponding cells of the table to generate a table file, whose content can be saved in csv or json format for data analysis and processing by a program, thereby realizing the parsing of the table picture.
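A minimal sketch of this final step, filling the recognized cell contents into an N×M grid and serializing it as csv; the helper name `to_csv` and the (row, col) cell-addressing scheme are our assumptions:

```python
import csv
import io

def to_csv(layout, cells):
    """layout = (n_rows, n_cols); cells maps (row, col) -> recognized text.
    Cells with no recognized content are emitted as empty strings."""
    n_rows, n_cols = layout
    buf = io.StringIO()
    writer = csv.writer(buf)
    for r in range(n_rows):
        writer.writerow([cells.get((r, c), "") for c in range(n_cols)])
    return buf.getvalue()

cells = {(0, 0): "name", (0, 1): "score", (1, 0): "Zhou", (1, 1): "98"}
table = to_csv((2, 2), cells)
print(table.splitlines())  # ['name,score', 'Zhou,98']
```

A json variant would serialize the same grid with `json.dumps`; either format gives downstream programs an editable representation of the original table picture.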
本申请实施例中,所述输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息之前,还包括:检测所述表格图片中是否包含网格线;若所述表格图片包含网格线,则提取所述表格图片的第二表格布局;将所述第二表格布局与所述第一表格布局进行比对,当比对结果为所述第一表格布局与所述第二表格布局一致时,则验证所述第一表格布局有效。一种可能的设计中,如果所述表格图片中表格有网格线,可以通过形态学开闭运算提取出所述第二表格布局。In the embodiment of the present application, before inputting the table picture into the pre-trained text positioning network to obtain the position information of the characters in the table picture, the method further includes: detecting whether the table picture contains grid lines; if it does, extracting the second table layout of the table picture; and comparing the second table layout with the first table layout, where the first table layout is verified as valid when the comparison shows the two layouts to be consistent. In one possible design, if the table in the table picture has grid lines, the second table layout can be extracted through morphological opening and closing operations.
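A toy illustration of the morphological idea: an opening (erosion followed by dilation) with a wide 1×k kernel keeps long horizontal runs such as grid lines and removes short text strokes. A real implementation would use an image-processing library; the pure-Python functions below are only a sketch of the technique on a binary image:

```python
def erode_h(img, k):
    """Horizontal erosion: a pixel survives only if the k pixels
    starting at it in its row are all set."""
    h, w = len(img), len(img[0])
    return [[1 if x + k <= w and all(img[y][x + i] for i in range(k)) else 0
             for x in range(w)] for y in range(h)]

def dilate_h(img, k):
    """Horizontal dilation: re-grow each surviving pixel into a run of k."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for i in range(min(k, w - x)):
                    out[y][x + i] = 1
    return out

def horizontal_lines(img, k):
    """Opening with a 1×k kernel: keeps only horizontal runs of >= k pixels."""
    return dilate_h(erode_h(img, k), k)

img = [[1, 1, 1, 1, 1, 1],   # a grid line: survives
       [0, 1, 1, 0, 0, 0],   # a short text stroke: removed
       [1, 1, 1, 1, 1, 1]]
print(horizontal_lines(img, 5))
```

Running the same opening with a tall 1-pixel-wide kernel would extract vertical lines; intersecting the two line images yields the grid from which the second table layout can be read off.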
实际上,本申请可以通过将所述第一表格布局与所述第二表格布局进行比对,同时验证所述第一表格布局与所述第二表格布局的可靠性。In fact, by comparing the first table layout with the second table layout, the present application can simultaneously verify the reliability of both layouts.
优选的,本申请还可以计算所述第二表格布局与所述第一表格布局的比对结果,所述比对结果被表达为所述第一表格布局与所述第二表格布局的差异点,当比对结果为所述第一表格布局与所述第二表格布局的差异点的数量大于预置值时,则重新训练所述文字定位网络,以提高本方案的识别精度。Preferably, the present application may also calculate a comparison result between the second table layout and the first table layout, the comparison result being expressed as the points of difference between the first table layout and the second table layout. When the number of points of difference between the two layouts is greater than a preset value, the text positioning network is retrained to improve the recognition accuracy of the solution.
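The retraining rule above can be sketched as follows; representing a layout simply as its (rows, columns) pair and using a preset value of 0 are our simplifications for illustration:

```python
def diff_points(layout_a, layout_b):
    """Count the positions at which the two layout descriptions disagree.
    Layouts are given here as (n_rows, n_cols) pairs for simplicity."""
    return sum(1 for a, b in zip(layout_a, layout_b) if a != b)

def needs_retraining(layout_a, layout_b, preset=0):
    """Mirror the rule above: retrain the positioning network when the
    number of difference points exceeds the preset value."""
    return diff_points(layout_a, layout_b) > preset

print(needs_retraining((4, 3), (4, 3)))  # False
print(needs_retraining((4, 3), (5, 2)))  # True
```

A fuller comparison could diff cell-by-cell occupancy rather than just the row/column counts; the threshold logic stays the same.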
在另一种实施例中,本申请实施例提供一种计算机可读存储介质,计算机可读存储介质可以为非易失性可读存储介质。所述计算机可读存储介质上存储有计算机可读指令,该程序被处理器执行时实现任一项技术方案所述的基于文字定位识别的表格解析方法。其中,所述计算机可读存储介质包括但不限于任何类型的盘(包括软盘、硬盘、光盘、CD-ROM、和磁光盘)、ROM(Read-Only Memory,只读存储器)、RAM(Random Access Memory,随机存储器)、EPROM(Erasable Programmable Read-Only Memory,可擦写可编程只读存储器)、EEPROM(Electrically Erasable Programmable Read-Only Memory,电可擦可编程只读存储器)、闪存、磁性卡片或光学卡片。也就是,存储设备包括由设备(例如,计算机、手机)以能够读的形式存储或传输信息的任何介质,可以是只读存储器,磁盘或光盘等。In another embodiment, an embodiment of the present application provides a computer-readable storage medium, which may be a non-volatile readable storage medium. The computer-readable storage medium stores computer-readable instructions, and when the instructions are executed by a processor, the table parsing method based on text positioning and recognition according to any one of the technical solutions is implemented. The computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, the storage device includes any medium that stores or transmits information in a form readable by a device (for example, a computer or a mobile phone), such as a read-only memory, a magnetic disk, or an optical disk.
本申请实施例提供的一种计算机可读存储介质,可实现输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息;依据所述位置信息对所述表格图片进行图形分割,分割出所述位置信息对应的单元格图片,将所述单元格图片输入预先训练的文字识别网络进行字符识别,得到单元格字符内容;依据所述位置信息,提取所述表格图片的第一表格布局;依据所述第一表格布局以及所述单元格字符内容,生成所述表格图片的表格文件。本申请可以利用建立好的深度学习模型进行表格图片中文字的定位与识别,提高了表格图片识别的效率以及准确率。The computer-readable storage medium provided by the embodiment of the present application can realize: inputting a table picture into a pre-trained text positioning network to obtain the position information of the characters in the table picture; graphically segmenting the table picture according to the position information, segmenting out the cell picture corresponding to the position information, and inputting the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content; extracting the first table layout of the table picture according to the position information; and generating a table file of the table picture according to the first table layout and the cell character content. The present application uses established deep learning models to locate and recognize the text in table pictures, improving the efficiency and accuracy of table picture recognition.
此外,在又一种实施例中,本申请提供了一种计算机设备,如图5所示,所述计算机设备包括处理器303、存储器305、输入单元307以及显示单元309等器件。本领域技术人员可以理解,图5示出的结构器件并不构成对所有计算机设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件。存储器305可用于存储应用程序301以及各功能模块,处理器303运行存储在存储器305的应用程序301,从而执行设备的各种功能应用以及数据处理。存储器305可以是内存储器或外存储器,或者包括内存储器和外存储器两者。内存储器可以包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦写可编程ROM(EEPROM)、快闪存储器、或者随机存储器。外存储器可以包括硬盘、软盘、ZIP盘、U盘、磁带等。本申请所公开的存储器包括但不限于这些类型的存储器。本申请所公开的存储器305只作为例子而非作为限定。In addition, in yet another embodiment, the present application provides a computer device. As shown in FIG. 5, the computer device includes a processor 303, a memory 305, an input unit 307, a display unit 309, and other components. Those skilled in the art can understand that the structure shown in FIG. 5 does not constitute a limitation on all computer devices, which may include more or fewer components than shown, or combine certain components. The memory 305 may be used to store the application program 301 and various functional modules; the processor 303 runs the application program 301 stored in the memory 305 to execute the various functional applications and data processing of the device. The memory 305 may be an internal memory or an external memory, or include both. The internal memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, or random access memory. The external memory may include a hard disk, a floppy disk, a ZIP disk, a U disk, a magnetic tape, and the like. The memory disclosed in this application includes, but is not limited to, these types of memory. The memory 305 disclosed in this application is merely an example and not a limitation.
输入单元307用于接收信号的输入,以及接收用户输入的关键字。输入单元307可包括触控面板以及其它输入设备。触控面板可收集用户在其上或附近的触摸操作(比如用户使用手指、触笔等任何适合的物体或附件在触控面板上或在触控面板附近的操作),并根据预先设定的程序驱动相应的连接装置;其它输入设备可以包括但不限于物理键盘、功能键(比如播放控制按键、开关按键等)、轨迹球、鼠标、操作杆等中的一种或多种。显示单元309可用于显示用户输入的信息或提供给用户的信息以及计算机设备的各种菜单。显示单元309可采用液晶显示器、有机发光二极管等形式。处理器303是计算机设备的控制中心,利用各种接口和线路连接整个电脑的各个部分,通过运行或执行存储在存储器305内的软件程序和/或模块,以及调用存储在存储器内的数据,执行各种功能和处理数据。图5中所示的一个或多个处理器303能够执行、实现图4中所示的输入模块11、识别模块12、提取模块13以及生成模块14的功能。The input unit 307 is used to receive signal input and keywords entered by the user, and may include a touch panel and other input devices. The touch panel can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program; other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control keys and switch keys), a trackball, a mouse, and a joystick. The display unit 309 may be used to display information entered by the user, information provided to the user, and the various menus of the computer device, and may take the form of a liquid crystal display, organic light-emitting diodes, or the like. The processor 303 is the control center of the computer device: it connects the various parts of the entire computer through various interfaces and lines, and executes various functions and processes data by running or executing the software programs and/or modules stored in the memory 305 and calling the data stored in the memory. The one or more processors 303 shown in FIG. 5 can execute and realize the functions of the input module 11, the recognition module 12, the extraction module 13, and the generation module 14 shown in FIG. 4.
在一种实施方式中,所述计算机设备包括存储器305和处理器303,所述存储器305中存储有计算机可读指令,所述计算机可读指令被所述处理器执行时,使得所述处理器303执行以上实施例所述的一种基于文字定位识别的表格解析方法的步骤。In one embodiment, the computer device includes a memory 305 and a processor 303. The memory 305 stores computer-readable instructions. When the computer-readable instructions are executed by the processor, the processor 303 executes the steps of a table analysis method based on character positioning recognition described in the above embodiment.
本申请实施例提供的一种计算机设备,可实现输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息;依据所述位置信息对所述表格图片进行图形分割,分割出所述位置信息对应的单元格图片,将所述单元格图片输入预先训练的文字识别网络进行字符识别,得到单元格字符内容;依据所述位置信息,提取所述表格图片的第一表格布局;依据所述第一表格布局以及所述单元格字符内容,生成所述表格图片的表格文件。本申请可以利用建立好的深度学习模型进行表格图片中文字的定位与识别,提高了表格图片识别的效率以及准确率。The computer device provided by the embodiment of the present application can realize: inputting a table picture into a pre-trained text positioning network to obtain the position information of the characters in the table picture; graphically segmenting the table picture according to the position information, segmenting out the cell picture corresponding to the position information, and inputting the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content; extracting the first table layout of the table picture according to the position information; and generating a table file of the table picture according to the first table layout and the cell character content. The present application uses established deep learning models to locate and recognize the text in table pictures, improving the efficiency and accuracy of table picture recognition.
另一种实施例中,本申请还可以实现检测所述表格图片中是否包含网格线;若所述表格图片包含网格线,则提取所述表格图片的第二表格布局;将所述第二表格布局与所述第一表格布局进行比对,当比对结果为所述第一表格布局与所述第二表格布局一致时,则验证所述第一表格布局有效。本申请还可以另外检测所述表格图片是否存在表格线,在所述表格图片存在表格线的情况下,直接提取所述表格线,然后将得到的第一表格布局与提取的表格线构成的第二表格布局进行比对以校验所述第一表格布局是否有效。本申请通过文字定位网络以及文字识别网络解析表格图片,可以兼容无表格线和有表格线或表格线残缺的情况,适用范围广。In another embodiment, the present application can also detect whether the table picture contains grid lines; if it does, extract the second table layout of the table picture; and compare the second table layout with the first table layout, verifying the first table layout as valid when the comparison shows the two to be consistent. The present application can additionally detect whether table lines exist in the table picture; when they do, the table lines are extracted directly, and the obtained first table layout is compared with the second table layout formed by the extracted table lines to verify whether the first table layout is valid. By parsing table pictures through the text positioning network and the text recognition network, the present application is compatible with pictures that have no table lines as well as those with complete or incomplete table lines, and thus has a wide range of application.
本申请实施例提供的计算机可读存储介质可以实现上述基于文字定位识别的表格解析方法的实施例,具体功能实现请参见方法实施例中的说明,在此不再赘述。The computer-readable storage medium provided in the embodiment of the present application can implement the above embodiment of the table parsing method based on text positioning and recognition; for the specific function realization, please refer to the description in the method embodiment, which will not be repeated here.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,该计算机可读指令可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,前述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)等非易失性存储介质,或随机存储记忆体(Random Access Memory,RAM)等。A person of ordinary skill in the art can understand that all or part of the processes in the above embodiment methods can be completed by instructing the relevant hardware through computer-readable instructions, which can be stored in a computer-readable storage medium; when executed, the program may include the processes of the above method embodiments. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or a random access memory (RAM).
以上所述实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments have been described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。The above-mentioned embodiments only express several implementation manners of this application, and their descriptions are more specific and detailed, but they should not be construed as limiting the scope of this application. It should be pointed out that for those of ordinary skill in the art, without departing from the concept of this application, several modifications and improvements can be made, and these all fall within the protection scope of this application. Therefore, the scope of protection of the patent of this application shall be subject to the appended claims.

Claims (20)

  1. 一种基于文字定位识别的表格图片解析方法,其特征在于,所述方法包括: A method for analyzing table pictures based on text positioning and recognition, characterized in that the method includes:
    输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息;Input the form picture to the pre-trained text positioning network to obtain the position information of the characters in the form picture;
    依据所述位置信息对所述表格图片进行图形分割,分割出所述位置信息对应的单元格图片,将所述单元格图片输入预先训练的文字识别网络进行字符识别,得到单元格字符内容;Graphically segment the table picture according to the position information, segment the cell picture corresponding to the position information, and input the cell picture into a pre-trained text recognition network for character recognition to obtain cell character content;
    依据所述位置信息,提取所述表格图片的第一表格布局;Extracting the first table layout of the table picture according to the position information;
    依据所述第一表格布局以及所述单元格字符内容,生成所述表格图片的表格文件;Generating a table file of the table picture according to the first table layout and the cell character content;
    其中,所述输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息,包括:Wherein, the input form picture to a pre-trained text positioning network to obtain position information of characters in the form picture includes:
    输入表格图片至预先训练的文字定位网络;Input form picture to pre-trained text positioning network;
    获取所述表格图片中连续的若干个字符串作为一个字符串组合;Acquiring several consecutive character strings in the table picture as a character string combination;
    获取包围所述字符串组合的最小的矩形框;Obtaining the smallest rectangular frame surrounding the character string combination;
    建立直角坐标系,获取所述矩形框的各个顶点的坐标作为所述位置信息。A rectangular coordinate system is established, and the coordinates of each vertex of the rectangular frame are obtained as the position information.
  2. 根据权利要求1所述的基于文字定位识别的表格图片解析方法,其特征在于,还包括:The method for analyzing table images based on text positioning and recognition according to claim 1, characterized in that it further comprises:
    输入表格图片的样本进行深度网络的训练,训练出所述文字定位网络以及所述文字识别网络。Input the sample of the table picture to train the deep network to train the text positioning network and the text recognition network.
  3. 根据权利要求1所述的基于文字定位识别的表格图片解析方法,其特征在于,所述依据所述位置信息,提取所述表格图片的第一表格布局,包括:The method for analyzing table pictures based on text positioning and recognition according to claim 1, wherein said extracting a first table layout of said table pictures according to said position information comprises:
    提取所述位置信息中所述矩形框的左上角以及右下角的点的坐标值;Extracting the coordinate values of the points at the upper left corner and the lower right corner of the rectangular frame in the position information;
    依据所述左上角以及右下角的点的坐标值将相同横坐标的点对应的矩形框分为同一列,将相同纵坐标的点对应的矩形框分为同一行;Divide the rectangular boxes corresponding to the points with the same abscissa into the same column according to the coordinate values of the points on the upper left corner and the lower right corner, and divide the rectangular boxes corresponding to the points with the same ordinate into the same row;
    计算总的行数以及总的列数作为所述第一表格布局。Calculate the total number of rows and the total number of columns as the first table layout.
  4. 根据权利要求1所述的基于文字定位识别的表格图片解析方法,其特征在于,所述依据所述第一表格布局以及所述单元格字符内容,生成所述表格图片的表格文件,包括:The method for analyzing table pictures based on text positioning and recognition according to claim 1, wherein the generating a table file of the table pictures according to the first table layout and the cell character content includes:
    依据所述第一表格布局绘制表格;Draw a table according to the first table layout;
    将所述单元格字符对应填入绘制的表格的单元格中,生成所述表格图片的表格文件。Filling the cell characters into the cells of the drawn table correspondingly to generate a table file of the table picture.
  5. 根据权利要求1所述的基于文字定位识别的表格图片解析方法,其特征在于,所述依据所述位置信息,提取所述表格图片的第一表格布局之后,包括:The method for analyzing table pictures based on text positioning and recognition according to claim 1, wherein after extracting the first table layout of the table pictures according to the position information, the method comprises:
    检测所述表格图片中是否包含网格线;Detecting whether the table picture contains grid lines;
    若所述表格图片包含网格线,则提取所述表格图片的第二表格布局;If the table picture contains grid lines, extract the second table layout of the table picture;
    将所述第二表格布局与所述第一表格布局进行比对,当比对结果为所述第一表格布局与所述第二表格布局一致时,则验证所述第一表格布局有效。The second table layout is compared with the first table layout, and when the comparison result is that the first table layout is consistent with the second table layout, it is verified that the first table layout is valid.
  6. 根据权利要求5所述的基于文字定位识别的表格图片解析方法,其特征在于,所述依据所述位置信息,生成所述表格图片的第一表格布局之后,包括:The method for analyzing table pictures based on text positioning and recognition according to claim 5, wherein after generating the first table layout of the table pictures according to the position information, the method comprises:
    计算所述第二表格布局与所述第一表格布局的比对结果,当对比结果为所述第一表格布局与所述第二表格布局的差异点的数量大于预置值时,则重新训练所述文字定位网络。Calculate the comparison result between the second table layout and the first table layout, and retrain the text positioning network when the comparison result shows that the number of points of difference between the first table layout and the second table layout is greater than a preset value.
  7. 一种基于文字定位识别的表格图片解析装置,其特征在于,所述基于文字定位识别的表格图片解析装置包括:A form picture analysis device based on text location recognition, characterized in that the form picture analysis device based on text location recognition includes:
    输入模块,用于输入表格图片至预先训练的文字定位网络,得到所述表格图片中字符的位置信息;The input module is used to input the form picture to the pre-trained text positioning network to obtain the position information of the characters in the form picture;
    识别模块,用于依据所述位置信息对所述表格图片进行图形分割,分割出所述位置信息对应的单元格图片,将所述单元格图片输入预先训练的文字识别网络进行字符识别,得到单元格字符内容;The recognition module is configured to graphically segment the table picture according to the position information, segment out the cell picture corresponding to the position information, and input the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content;
    提取模块,用于依据所述位置信息,提取所述表格图片的第一表格布局;An extraction module, configured to extract the first table layout of the table picture according to the position information;
    生成模块,用于依据所述第一表格布局以及所述单元格字符内容,生成所述表格图片的表格文件;A generating module, configured to generate a table file of the table picture according to the first table layout and the cell character content;
    其中,所述输入模块,还用于:Wherein, the input module is also used for:
    输入表格图片至预先训练的文字定位网络;Input form picture to pre-trained text positioning network;
    获取所述表格图片中连续的若干个字符串作为一个字符串组合;Acquiring several consecutive character strings in the table picture as a character string combination;
    获取包围所述字符串组合的最小的矩形框;Obtaining the smallest rectangular frame surrounding the character string combination;
    建立直角坐标系,获取所述矩形框的各个顶点的坐标作为所述位置信息。A rectangular coordinate system is established, and the coordinates of each vertex of the rectangular frame are obtained as the position information.
  8. 如权利要求7所述的基于文字定位识别的表格图片解析装置,其特征在于,所述基于文字定位识别的表格图片解析装置还包括:8. The form image analysis device based on text location recognition according to claim 7, wherein the form image analysis device based on text location recognition further comprises:
    训练模块,输入表格图片的样本进行深度网络的训练,训练出所述文字定位网络以及所述文字识别网络。The training module inputs the samples of the table pictures to train the deep network, and trains the text positioning network and the text recognition network.
9. The table picture parsing apparatus based on text positioning and recognition according to claim 7, wherein the extraction module is further configured to:
    extract, from the position information, the coordinate values of the upper-left and lower-right corner points of the rectangular boxes;
    according to the coordinate values of the upper-left and lower-right corner points, group rectangular boxes whose points have the same abscissa into the same column, and group rectangular boxes whose points have the same ordinate into the same row; and
    calculate the total number of rows and the total number of columns as the first table layout.
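The row/column grouping in claim 9 can be sketched as coordinate clustering. This is one plausible reading rather than the claimed algorithm itself; the tolerance parameter is an assumption added here, since real detections rarely align pixel-perfectly.

```python
# Illustrative sketch: infer the first table layout by grouping boxes whose
# top-left corners share an abscissa into columns and boxes sharing an
# ordinate into rows, then counting the groups.

def infer_layout(boxes, tol=5):
    """boxes: list of (x1, y1, x2, y2) upper-left/lower-right corner points.
    Returns (total_rows, total_cols)."""
    def cluster(values):
        groups = []
        for v in sorted(values):
            if groups and v - groups[-1][-1] <= tol:
                groups[-1].append(v)  # same row/column within tolerance
            else:
                groups.append([v])    # start a new row/column
        return len(groups)
    n_cols = cluster([b[0] for b in boxes])  # same abscissa -> same column
    n_rows = cluster([b[1] for b in boxes])  # same ordinate -> same row
    return n_rows, n_cols
```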
10. The table picture parsing apparatus based on text positioning and recognition according to claim 7, further configured to:
    detect whether the table picture contains grid lines;
    if the table picture contains grid lines, extract a second table layout of the table picture; and
    compare the second table layout with the first table layout, and when the comparison result is that the first table layout is consistent with the second table layout, verify that the first table layout is valid.
11. The table picture parsing apparatus based on text positioning and recognition according to claim 10, further configured to:
    calculate a comparison result between the second table layout and the first table layout, and when the comparison result is that the number of differences between the first table layout and the second table layout is greater than a preset value, retrain the text positioning network.
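The validation and retraining decision of claims 10 and 11 can be sketched as a layout comparison. The cell-level diff below is one plausible reading of "number of differences" (the claims do not fix the metric), and the layout representation is an assumption introduced for illustration.

```python
# Illustrative sketch: compare the grid-line-derived (second) layout with the
# text-derived (first) layout. Identical layouts validate the first layout;
# too many differences signal that the text positioning network needs retraining.

def compare_layouts(first, second, preset_value=2):
    """first, second: dicts with 'rows', 'cols', and a set of occupied 'cells'.
    Returns 'valid', 'retrain', or 'mismatch'."""
    if first == second:
        return "valid"
    # Count cells present in one layout but not the other, plus any
    # disagreement in the total row or column counts.
    differences = len(first["cells"] ^ second["cells"])
    differences += int(first["rows"] != second["rows"])
    differences += int(first["cols"] != second["cols"])
    return "retrain" if differences > preset_value else "mismatch"
```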
12. A computer-readable storage medium, wherein computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the following steps are implemented:
    inputting a table picture into a pre-trained text positioning network to obtain position information of characters in the table picture;
    performing image segmentation on the table picture according to the position information, segmenting out the cell pictures corresponding to the position information, and inputting the cell pictures into a pre-trained text recognition network for character recognition to obtain cell character content;
    extracting a first table layout of the table picture according to the position information; and
    generating a table file of the table picture according to the first table layout and the cell character content;
    wherein the inputting a table picture into a pre-trained text positioning network to obtain position information of characters in the table picture comprises:
    inputting the table picture into the pre-trained text positioning network;
    acquiring several consecutive character strings in the table picture as one character string combination;
    acquiring the smallest rectangular box enclosing the character string combination; and
    establishing a rectangular coordinate system, and acquiring the coordinates of each vertex of the rectangular box as the position information.
13. The computer-readable storage medium according to claim 12, wherein the extracting a first table layout of the table picture according to the position information comprises:
    extracting, from the position information, the coordinate values of the upper-left and lower-right corner points of the rectangular boxes;
    according to the coordinate values of the upper-left and lower-right corner points, grouping rectangular boxes whose points have the same abscissa into the same column, and grouping rectangular boxes whose points have the same ordinate into the same row; and
    calculating the total number of rows and the total number of columns as the first table layout.
14. The computer-readable storage medium according to claim 12, wherein the generating a table file of the table picture according to the first table layout and the cell character content comprises:
    drawing a table according to the first table layout; and
    filling the cell character content into the corresponding cells of the drawn table to generate the table file of the table picture.
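The table-file generation of claim 14 can be sketched as follows. The claims do not fix an output format; CSV is chosen here purely for illustration, and the `(row, col)`-keyed cell mapping is an assumption.

```python
# Illustrative sketch: "draw" a table from the first table layout (row and
# column counts) and fill the recognized cell character content into the
# corresponding cells, emitting a CSV table file as one possible format.
import csv
import io

def generate_table_file(n_rows, n_cols, cell_contents):
    """cell_contents: dict mapping (row, col) -> recognized text.
    Returns the table file contents as a CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for r in range(n_rows):
        # Cells with no recognized content stay empty.
        writer.writerow([cell_contents.get((r, c), "") for c in range(n_cols)])
    return buf.getvalue()
```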
15. The computer-readable storage medium according to claim 12, wherein after the extracting a first table layout of the table picture according to the position information, the steps comprise:
    detecting whether the table picture contains grid lines;
    if the table picture contains grid lines, extracting a second table layout of the table picture; and
    comparing the second table layout with the first table layout, and when the comparison result is that the first table layout is consistent with the second table layout, verifying that the first table layout is valid.
16. The computer-readable storage medium according to claim 15, wherein after the generating a first table layout of the table picture according to the position information, the steps comprise:
    calculating a comparison result between the second table layout and the first table layout, and when the comparison result is that the number of differences between the first table layout and the second table layout is greater than a preset value, retraining the text positioning network.
17. A computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, the following steps are implemented:
    inputting a table picture into a pre-trained text positioning network to obtain position information of characters in the table picture;
    performing image segmentation on the table picture according to the position information, segmenting out the cell pictures corresponding to the position information, and inputting the cell pictures into a pre-trained text recognition network for character recognition to obtain cell character content;
    extracting a first table layout of the table picture according to the position information; and
    generating a table file of the table picture according to the first table layout and the cell character content;
    wherein the inputting a table picture into a pre-trained text positioning network to obtain position information of characters in the table picture comprises:
    inputting the table picture into the pre-trained text positioning network;
    acquiring several consecutive character strings in the table picture as one character string combination;
    acquiring the smallest rectangular box enclosing the character string combination; and
    establishing a rectangular coordinate system, and acquiring the coordinates of each vertex of the rectangular box as the position information.
18. The computer device according to claim 17, wherein the steps further comprise:
    inputting samples of table pictures for deep network training, so as to train the text positioning network and the text recognition network.
19. The computer device according to claim 17, wherein the extracting a first table layout of the table picture according to the position information comprises:
    extracting, from the position information, the coordinate values of the upper-left and lower-right corner points of the rectangular boxes;
    according to the coordinate values of the upper-left and lower-right corner points, grouping rectangular boxes whose points have the same abscissa into the same column, and grouping rectangular boxes whose points have the same ordinate into the same row; and
    calculating the total number of rows and the total number of columns as the first table layout.
20. The computer device according to claim 17, wherein the generating a table file of the table picture according to the first table layout and the cell character content comprises:
    drawing a table according to the first table layout; and
    filling the cell character content into the corresponding cells of the drawn table to generate the table file of the table picture.
PCT/CN2019/118422 2019-02-13 2019-11-14 Form parsing method based on character location and recognition, and medium and computer device WO2020164281A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910115364.7 2019-02-13
CN201910115364.7A CN109961008B (en) 2019-02-13 Table analysis method, medium and computer equipment based on text positioning recognition

Publications (1)

Publication Number Publication Date
WO2020164281A1 true WO2020164281A1 (en) 2020-08-20

Family

ID=67023672

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118422 WO2020164281A1 (en) 2019-02-13 2019-11-14 Form parsing method based on character location and recognition, and medium and computer device

Country Status (1)

Country Link
WO (1) WO2020164281A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101908136A (en) * 2009-06-08 2010-12-08 比亚迪股份有限公司 Table identifying and processing method and system
US20150169972A1 (en) * 2013-12-12 2015-06-18 Aliphcom Character data generation based on transformed imaged data to identify nutrition-related data or other types of data
CN105512611A (en) * 2015-11-25 2016-04-20 成都数联铭品科技有限公司 Detection and identification method for form image
CN108805076A (en) * 2018-06-07 2018-11-13 浙江大学 The extracting method and system of environmental impact assessment report table word
CN109961008A (en) * 2019-02-13 2019-07-02 平安科技(深圳)有限公司 Form analysis method, medium and computer equipment based on text location identification


Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112036304A (en) * 2020-08-31 2020-12-04 平安医疗健康管理股份有限公司 Medical bill layout identification method and device and computer equipment
CN112132794A (en) * 2020-09-14 2020-12-25 杭州安恒信息技术股份有限公司 Text positioning method, device and equipment for audit video and readable storage medium
CN111985459B (en) * 2020-09-18 2023-07-28 北京百度网讯科技有限公司 Table image correction method, apparatus, electronic device and storage medium
CN111985459A (en) * 2020-09-18 2020-11-24 北京百度网讯科技有限公司 Table image correction method, device, electronic equipment and storage medium
CN112200117A (en) * 2020-10-22 2021-01-08 长城计算机软件与系统有限公司 Form identification method and device
CN112200117B (en) * 2020-10-22 2023-10-13 长城计算机软件与系统有限公司 Form identification method and device
CN112364726B (en) * 2020-10-27 2024-06-04 重庆大学 Part code-spraying character positioning method based on improved EAST
CN112364726A (en) * 2020-10-27 2021-02-12 重庆大学 Part code spraying character positioning method based on improved EAST
CN112686258A (en) * 2020-12-10 2021-04-20 广州广电运通金融电子股份有限公司 Physical examination report information structuring method and device, readable storage medium and terminal
CN112712014A (en) * 2020-12-29 2021-04-27 平安健康保险股份有限公司 Table picture structure analysis method, system, equipment and readable storage medium
CN112712014B (en) * 2020-12-29 2024-04-30 平安健康保险股份有限公司 Method, system, device and readable storage medium for parsing table picture structure
CN112800904A (en) * 2021-01-19 2021-05-14 深圳市玩瞳科技有限公司 Method and device for identifying character strings in picture according to finger pointing
CN113128490A (en) * 2021-04-28 2021-07-16 湖南荣冠智能科技有限公司 Prescription information scanning and automatic identification method
CN113128490B (en) * 2021-04-28 2023-12-05 湖南荣冠智能科技有限公司 Prescription information scanning and automatic identification method
CN113392811B (en) * 2021-07-08 2023-08-01 北京百度网讯科技有限公司 Table extraction method and device, electronic equipment and storage medium
CN113378789B (en) * 2021-07-08 2023-09-26 京东科技信息技术有限公司 Cell position detection method and device and electronic equipment
CN113392811A (en) * 2021-07-08 2021-09-14 北京百度网讯科技有限公司 Table extraction method and device, electronic equipment and storage medium
CN113378789A (en) * 2021-07-08 2021-09-10 京东数科海益信息科技有限公司 Cell position detection method and device and electronic equipment
CN113538291A (en) * 2021-08-02 2021-10-22 广州广电运通金融电子股份有限公司 Card image tilt correction method and device, computer equipment and storage medium
CN113538291B (en) * 2021-08-02 2024-05-14 广州广电运通金融电子股份有限公司 Card image inclination correction method, device, computer equipment and storage medium
CN114170616A (en) * 2021-11-15 2022-03-11 嵊州市光宇实业有限公司 Electric power engineering material information acquisition and analysis system and method based on graph paper set
CN114612921B (en) * 2022-05-12 2022-07-19 中信证券股份有限公司 Form recognition method and device, electronic equipment and computer readable medium
CN114612921A (en) * 2022-05-12 2022-06-10 中信证券股份有限公司 Form recognition method and device, electronic equipment and computer readable medium
CN115841679A (en) * 2023-02-23 2023-03-24 江西中至科技有限公司 Drawing sheet extraction method, system, computer and readable storage medium
CN115841679B (en) * 2023-02-23 2023-05-05 江西中至科技有限公司 Drawing form extraction method, drawing form extraction system, computer and readable storage medium

Also Published As

Publication number Publication date
CN109961008A (en) 2019-07-02

Similar Documents

Publication Publication Date Title
WO2020164281A1 (en) Form parsing method based on character location and recognition, and medium and computer device
WO2020164267A1 (en) Text classification model construction method and apparatus, and terminal and storage medium
WO2020107765A1 (en) Statement analysis processing method, apparatus and device, and computer-readable storage medium
WO2020253112A1 (en) Test strategy acquisition method, device, terminal, and readable storage medium
WO2014069741A1 (en) Apparatus and method for automatic scoring
WO2012161359A1 (en) Method and device for user interface
WO2019156332A1 (en) Device for producing artificial intelligence character for augmented reality and service system using same
WO2018090740A1 (en) Method and apparatus for implementing company based on mixed reality technology
WO2015065006A1 (en) Multimedia apparatus, online education system, and method for providing education content thereof
WO2020107761A1 (en) Advertising copy processing method, apparatus and device, and computer-readable storage medium
WO2011068284A1 (en) Language learning electronic device driving method, system, and simultaneous interpretation system applying same
WO2020159140A1 (en) Electronic device and control method therefor
WO2014069815A1 (en) Display apparatus for studying mask and method for displaying studying mask
WO2016182393A1 (en) Method and device for analyzing user's emotion
WO2023224433A1 (en) Information generation method and device
CN112016077A (en) Page information acquisition method and device based on sliding track simulation and electronic equipment
WO2022145723A1 (en) Method and apparatus for detecting layout
WO2015109772A1 (en) Data processing device and data processing method
WO2020022645A1 (en) Method and electronic device for configuring touch screen keyboard
WO2020045909A1 (en) Apparatus and method for user interface framework for multi-selection and operation of non-consecutive segmented information
CN111860083A (en) Character relation completion method and device
WO2021177719A1 (en) Translation platform operating method
WO2021003922A1 (en) Page information input optimization method, device and apparatus, and storage medium
WO2024048881A1 (en) Learning system, and method for operating learning application
WO2023068495A1 (en) Electronic device and control method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19915547

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 05.10.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19915547

Country of ref document: EP

Kind code of ref document: A1