CN112949443B

CN112949443B - Table structure identification method and device, electronic equipment and storage medium

Info

Publication number: CN112949443B
Application number: CN202110206569.3A
Authority: CN
Inventors: 王文浩
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-02-24
Filing date: 2021-02-24
Publication date: 2023-07-25
Anticipated expiration: 2041-02-24
Also published as: WO2022178994A1; CN112949443A

Abstract

The invention relates to the technical field of data analysis, and discloses a table structure identification method, which comprises the following steps: acquiring a training data set and constructing a label; training the original table structure recognition model by using the training data set and the label to obtain a standard table structure recognition model; acquiring a form page to be identified, and constructing document node characteristics and form line characteristics; performing table detection and recognition on the document node characteristics and the table line characteristics by using the standard table structure recognition model to obtain a predicted table structure relation; and performing restoration processing on the form page to be identified according to the predicted table structure relation to obtain a table structure. In addition, the invention also relates to blockchain technology, and the form page to be identified can be stored in a node of a blockchain. The invention also provides a table structure identification device, electronic equipment and a computer readable storage medium. The invention can solve the problems of heavy dependence on image quality and poor table recognition results.

Description

Table structure identification method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of data analysis technologies, and in particular, to a method and apparatus for identifying a table structure, an electronic device, and a computer readable storage medium.

Background

With the advent of the big data age, how to obtain key and valuable information from massive data is becoming more and more important. If information is extracted from patient bill of charge, laboratory sheet, physical examination report and other documents in each big hospital and physical examination institution, the subsequent diagnosis efficiency of doctors can be improved. The table structure in the document can clearly show the logical and quantitative relation of the original document data, and a lot of information is usually presented in the form of a table, so that the table structure is required to be restored before the information is extracted from the table.

The conventional table structure identification adopts an image processing-based method, and adopts a detection or segmentation method in an image to identify and restore the table structure. However, the method is highly dependent on image quality, and when the image quality is low, the background is complex, and the table color shading is obvious, the detection and identification effects of the table structure are poor, and meanwhile, the method does not have good generalization capability.

Disclosure of Invention

The invention provides a method and a device for identifying a table structure and a computer readable storage medium, and mainly aims to solve the problems of heavy dependence on image quality and poor table identification results.

In order to achieve the above object, the present invention provides a method for identifying a table structure, including:

acquiring a training data set and constructing a label of the training data set;

training the pre-constructed original table structure recognition model by utilizing the training data set and the label to obtain a standard table structure recognition model;

acquiring a form page to be identified, and constructing document node characteristics and form line characteristics of the form page to be identified;

performing table detection and recognition on the document node characteristics and the table line characteristics by using the standard table structure recognition model to obtain a predicted table structure relation;

and performing restoration processing on the form page to be identified according to the predicted table structure relationship to obtain a table structure.

Optionally, the acquiring a training data set includes:

crawling a plurality of PDF documents from a webpage, and analyzing and screening the PDF documents to obtain a plurality of form pages;

converting each form page into a page picture, and performing text detection and recognition on the page picture to obtain a recognition result;

and deleting the page pictures which do not accord with the preset rule in the page pictures according to the identification result to obtain a training data set.

Optionally, the constructing the label of the training data set includes:

performing text box detection and recognition on the training data set to obtain a plurality of text boxes;

taking each text box in the plurality of text boxes as a node, and judging the adjacent relation between any two nodes according to a preset relation condition to obtain a table structure relation;

and constructing an adjacent matrix according to the table structure relationship to obtain the label.

Optionally, training the pre-constructed original table structure recognition model by using the training data set and the label to obtain a standard table structure recognition model, including:

preprocessing the training data set to obtain training characteristics;

performing table recognition on the training features through the original table structure recognition model to obtain a relation prediction matrix;

calculating a loss value of the relation prediction matrix according to the label and a preset loss function;

and adjusting parameters of the original table structure recognition model according to the loss value, and returning to the step of performing table recognition on the training features through the original table structure recognition model to obtain a relation prediction matrix, until the loss value no longer decreases, thereby obtaining the standard table structure recognition model.
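The predict, compute-loss, adjust and repeat cycle described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `train_until_plateau`, `min_delta`, `patience` and the one-weight `ToyModel` are hypothetical names, and the toy model merely stands in for the table structure recognition model, which the text does not specify at this level of detail.

```python
from typing import Callable, List


def train_until_plateau(
    train_one_epoch: Callable[[], float],
    min_delta: float = 1e-6,   # hypothetical threshold for "loss still decreasing"
    patience: int = 3,         # hypothetical number of non-improving rounds tolerated
    max_epochs: int = 1000,
) -> List[float]:
    """Repeat predict -> loss -> parameter update until the loss stops decreasing."""
    history: List[float] = []
    best = float("inf")
    stale = 0
    for _ in range(max_epochs):
        loss = train_one_epoch()
        history.append(loss)
        if best - loss > min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    return history


class ToyModel:
    """Stand-in for the recognition model: a single weight fitted to the target 3.0."""

    def __init__(self) -> None:
        self.w = 0.0

    def epoch(self, lr: float = 0.1) -> float:
        loss = (self.w - 3.0) ** 2        # loss value for the current parameters
        grad = 2.0 * (self.w - 3.0)       # gradient of the loss
        self.w -= lr * grad               # adjust parameters according to the loss
        return loss


model = ToyModel()
losses = train_until_plateau(model.epoch)
```

The stopping rule here tolerates a few non-improving rounds before stopping; a stricter reading of "until the loss value no longer decreases" would use `patience=1`.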

Optionally, the constructing the document node feature and the form line feature of the form page to be identified includes:

performing text box detection and recognition on the form page to be recognized to obtain a text box, wherein the text box comprises a plurality of text strips and corresponding text box coordinates;

constructing the position characteristics of the text box according to the text box coordinates of the text box;

constructing text features of the text box according to the text bars of the text box;

constructing line type characteristics of the text box according to a preset line rule;

collecting the position characteristics of the text box, the text characteristics of the text box and the line type characteristics of the text box to obtain document node characteristics;

performing table line detection on the to-be-identified table page to obtain table grid lines;

constructing the position characteristics of the table grid lines according to the endpoint coordinates of the table grid lines;

constructing text features of the table grid lines according to preset text conditions;

constructing line type characteristics of the table lines according to the types of the table lines;

and collecting the position features of the table lines, the text features of the table lines and the type features of the table lines to obtain the table line features.

Optionally, the performing table detection and recognition on the document node feature and the table line feature by using the standard table structure recognition model to obtain a predicted table structure relationship includes:

integrating the document node characteristics and the form line characteristics to obtain input characteristics;

extracting the characteristics of the input characteristics by utilizing a translation layer of the standard table structure recognition model to obtain node characteristics;

performing bilinear transformation on the node characteristics by utilizing a transformation layer of the standard table structure identification model to obtain edge characteristics;

and carrying out relation prediction on the edge characteristics by utilizing the full connection layer of the standard table structure recognition model to obtain a predicted table structure relation, wherein the predicted table structure relation comprises a table relation, a row relation and a column relation.

Optionally, the restoring processing is performed on the form page to be identified according to the predicted form structure relationship to obtain a form structure, including:

performing text box detection and recognition on the form page to be recognized to obtain a plurality of text boxes;

taking each text box as a node according to the table relation in the predicted table structural relation, and constructing an undirected graph to obtain a table relation graph;

dividing the nodes into a plurality of table sets by solving connected components of the table relation graph;

respectively constructing a row relationship diagram and a column relationship diagram for each table set according to the row relationship and the column relationship of the predicted table structure relationship;

solving the row maximal cliques in the row relation diagram by using a maximal clique algorithm, and sorting the row maximal cliques from large to small according to their ordinates to obtain a row set;

solving the column maximal cliques in the column relation diagram by using a maximal clique algorithm, and sorting the column maximal cliques from small to large according to their abscissas to obtain a column set;

and integrating the row set and the column set to obtain a table structure.

In order to solve the above problems, the present invention also provides a table structure identifying apparatus, the apparatus comprising:

the data acquisition module is used for acquiring a training data set and constructing a label of the training data set;

the model training module is used for training the pre-constructed original table structure recognition model by utilizing the training data set and the label to obtain a standard table structure recognition model;

the characteristic construction module is used for acquiring a form page to be identified and constructing document node characteristics and form line characteristics of the form page to be identified;

the table detection module is used for carrying out table detection and identification on the document node characteristics and the table line characteristics by using the standard table structure identification model to obtain a predicted table structure relation;

and the table restoration module is used for performing restoration processing on the form page to be identified according to the predicted table structure relationship to obtain a table structure.

In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:

a memory storing at least one instruction; and

a processor that executes the instructions stored in the memory to implement the table structure identification method described above.

In order to solve the above-mentioned problems, the present invention also provides a computer-readable storage medium having stored therein at least one instruction that is executed by a processor in an electronic device to implement the above-mentioned table structure identification method.

The invention trains the pre-constructed original table structure recognition model by using the training data set and the label, wherein the training data set is a general-domain table data set containing a large number of table structures, so that the accuracy of the table recognition result can be improved. Meanwhile, the input of the table structure recognition model is the document node characteristics and the table line characteristics, and the output is the predicted table structure relation; the table structure is restored from the predicted table structure relation by a non-image-processing method, so that the dependence on image quality is effectively avoided, recognition errors caused by low image quality, complex backgrounds or obvious table color shading are reduced, and the recognition accuracy is improved. Therefore, the method, the device, the electronic equipment and the computer readable storage medium for identifying the table structure can solve the problems of heavy dependence on image quality and poor table identification results.

Drawings

FIG. 1 is a flowchart illustrating a method for identifying a table structure according to an embodiment of the present invention;

FIG. 2 is a functional block diagram of a table structure recognition device according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an electronic device for implementing the method for identifying a table structure according to an embodiment of the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

The embodiment of the application provides a table structure identification method. The execution subject of the table structure identification method includes, but is not limited to, at least one of a server, a terminal, and the like, which can be configured to execute the method provided in the embodiments of the present application. In other words, the table structure identification method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.

Referring to fig. 1, a flow chart of a table structure recognition method according to an embodiment of the invention is shown. In this embodiment, the table structure identifying method includes:

S1, acquiring a training data set and constructing a label of the training data set.

In the embodiment of the invention, the training data set is a picture set containing a table structure, such as a picture converted from a PDF document containing table contents.

In detail, the acquiring the training data set includes:

For example, a large number of general-domain PDF documents are crawled from the internet and parsed with the pdfplumber library to obtain PDF parsing results. Pages containing tables are screened from the PDF documents according to the parsing results to obtain a plurality of table pages. Each table page is converted into a picture with the poppler tool, and OCR text box detection and recognition are performed on the picture to obtain an OCR parsing result. The PDF parsing result and the OCR parsing result are then compared and verified against predefined rules (for example, the number of tables in the PDF parsing result of a page must be the same as the number of tables in the OCR parsing result of the same page), and the pictures that do not satisfy the rules are deleted from the pictures containing tables to obtain the training data set.
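The cross-check between the two parsing passes can be sketched as a simple filter. This is only an illustration of the example rule named above (equal table counts); the `PageRecord` structure and its field names are hypothetical, and the actual PDF and OCR parsing steps are assumed to have already produced the counts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageRecord:
    """Hypothetical per-page summary of the two parsing passes."""
    page_id: str
    pdf_table_count: int   # tables found when parsing the PDF page
    ocr_table_count: int   # tables found by OCR on the rendered picture


def passes_rule(page: PageRecord) -> bool:
    # Example rule from the text: both passes must report the same number of tables.
    # Requiring at least one table mirrors the earlier screening for table pages.
    return page.pdf_table_count > 0 and page.pdf_table_count == page.ocr_table_count


def build_training_set(pages: List[PageRecord]) -> List[PageRecord]:
    """Delete the pictures that do not satisfy the rule and keep the rest."""
    return [p for p in pages if passes_rule(p)]


pages = [
    PageRecord("doc1-p3", pdf_table_count=2, ocr_table_count=2),
    PageRecord("doc1-p7", pdf_table_count=1, ocr_table_count=3),
    PageRecord("doc2-p1", pdf_table_count=0, ocr_table_count=0),
]
training_set = build_training_set(pages)
```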

In detail, the constructing the label of the training data set includes:

Further, the embodiment of the invention utilizes the OCR technology to detect and identify the text box of the training data set, and the preset relation condition comprises: whether two nodes belong to the same table (table), whether two nodes belong to the same row (row), and whether two nodes belong to the same column (col); the adjacency relationship comprises a table relationship, a row relationship and a column relationship; the table structure relationship includes a plurality of adjacency relationships.

Constructing the adjacency matrix according to the table structure relationship means treating the text box of each cell in a table of the training data set as a node in a graph. If a given adjacency relation exists between two nodes, the element at the corresponding position in the adjacency matrix for that relation is 1; otherwise it is 0. Three adjacency matrices are constructed in this way, one each for the table relation, the row relation and the column relation, and serve as labels for subsequent model training.
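A minimal sketch of the label construction follows. It assumes each text box already carries a hypothetical annotation `(table_id, row_id, col_id)`; the text does not say how the diagonal is filled, so leaving it at 0 is an assumption, as is requiring two nodes to share a table before they can share a row or column.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation of one text box: (table_id, row_id, col_id).
Cell = Tuple[int, int, int]


def build_labels(cells: List[Cell]) -> Dict[str, List[List[int]]]:
    """Build the table, row and column adjacency matrices used as labels."""
    n = len(cells)
    labels = {name: [[0] * n for _ in range(n)] for name in ("table", "row", "col")}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: no self-relation on the diagonal
            table_i, row_i, col_i = cells[i]
            table_j, row_j, col_j = cells[j]
            if table_i != table_j:
                continue  # nodes in different tables share no relation
            labels["table"][i][j] = 1
            if row_i == row_j:
                labels["row"][i][j] = 1
            if col_i == col_j:
                labels["col"][i][j] = 1
    return labels


# A 2x2 table (table 0) plus a single cell belonging to a second table (table 1).
cells = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0)]
labels = build_labels(cells)
```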

S2, training the pre-constructed original table structure recognition model by utilizing the training data set and the label to obtain a standard table structure recognition model.

The original table structure recognition model in the embodiment of the invention is a BERT model based on the Transformer structure, and can predict the adjacency relation between the nodes in a page according to the input features.

In detail, training the pre-constructed original table structure recognition model by using the training data set and the label to obtain a standard table structure recognition model, including:

preprocessing the training data set to obtain training characteristics;

Wherein, the preset loss function can use a mean square error loss function or a cross entropy loss function.
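The two loss functions named above can be sketched for a relation prediction matrix compared against a 0/1 label matrix. This is a plain-Python illustration, not the model's actual training code; treating the cross entropy as an element-wise binary cross entropy and clamping with `eps` are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def mse_loss(pred: Matrix, label: Matrix) -> float:
    """Mean square error between the prediction matrix and the label matrix."""
    total = sum((p - y) ** 2 for pr, lr in zip(pred, label) for p, y in zip(pr, lr))
    count = sum(len(row) for row in label)
    return total / count


def bce_loss(pred: Matrix, label: Matrix, eps: float = 1e-7) -> float:
    """Element-wise binary cross entropy (assumed form of the cross entropy loss)."""
    total = 0.0
    count = 0
    for pr, lr in zip(pred, label):
        for p, y in zip(pr, lr):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
            count += 1
    return total / count


label = [[1.0, 0.0], [0.0, 1.0]]
good = [[0.9, 0.1], [0.2, 0.8]]
bad = [[0.1, 0.9], [0.8, 0.2]]
```

With either function, a prediction closer to the label (`good`) yields a smaller loss than one further away (`bad`).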

Further, the preprocessing the training data set includes:

text box detection and recognition are carried out on the training data set to obtain a text detection result, and document node characteristics are constructed according to the text detection result;

performing table line detection on the training data set to obtain a line detection result, and constructing table line characteristics according to the line detection result;

and merging the document node characteristics and the table line characteristics to obtain training characteristics.

S3, acquiring a form page to be identified, and constructing document node characteristics and form line characteristics of the form page to be identified.

The form page to be identified in the embodiment of the invention can be an actual bill picture in the medical field or a picture converted from a document containing a form in the medical field. The form page to be identified may be obtained from a pre-built database. To further ensure the privacy and security of the form page to be identified, the form page to be identified may also be obtained from a node of a blockchain.

In detail, the construction of the document node characteristics and the form line characteristics of the form page to be identified comprises the following steps:

and collecting the position characteristics of the table grid lines, the text characteristics of the table lines and the line type characteristics of the table grid lines to obtain the table line characteristics.

For example: each text box detected in the form page to be identified is taken as a node. Assuming there are N nodes, the document node characteristics are D_F(N) = (F(1), F(2), ..., F(N)), and the characteristics of each node are F(x) = [x1, y1, x2, y2, (x1+x2)/2, (y1+y2)/2, x2-x1, y2-y1, num, char, space, other, 0, 0]. Wherein, [x1, y1, x2, y2, (x1+x2)/2, (y1+y2)/2, x2-x1, y2-y1] is the position feature, (x1, y1) is the coordinates of the upper left corner of the text box, (x2, y2) is the coordinates of the lower right corner of the text box, ((x1+x2)/2, (y1+y2)/2) is the center point of the text box, x2-x1 is the width of the text box, and y2-y1 is the height of the text box; [num, char, space, other] is the text feature, where num, char, space and other respectively represent the frequency of digits, letters, spaces and other character types in the text box, calculated from the content of the text bar; the last two bits [0, 0] are the line type feature, set to 0 according to the preset line rule, indicating that the node is a text box and neither a horizontal line nor a vertical line.
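The 14-element node feature described above can be assembled as in the following sketch. The function names are hypothetical, and "frequency" is interpreted here as a raw character count, which is an assumption; a normalized ratio would fit the description equally well.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): upper-left and lower-right corners


def text_feature(text: str) -> List[float]:
    """[num, char, space, other] as raw counts (assumed meaning of 'frequency')."""
    num = sum(ch.isdigit() for ch in text)
    char = sum(ch.isalpha() for ch in text)
    space = sum(ch.isspace() for ch in text)
    other = len(text) - num - char - space
    return [float(num), float(char), float(space), float(other)]


def node_feature(box: Box, text: str) -> List[float]:
    """Position feature (8) + text feature (4) + line type feature [0, 0]."""
    x1, y1, x2, y2 = box
    position = [x1, y1, x2, y2, (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]
    line_type = [0.0, 0.0]  # a text box is neither a horizontal nor a vertical line
    return position + text_feature(text) + line_type


feature = node_feature((10, 20, 50, 40), "Fee 12.5")
```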

For example: assuming that a total of M horizontal and vertical lines are detected in the form page to be identified, the form line characteristics are L_F(M) = (F'(1), F'(2), ..., F'(M)), and the characteristics of each table grid line are F'(x) = [x1', y1', x2', y2', (x1'+x2')/2, (y1'+y2')/2, x2'-x1', y2'-y1', 0, 0, 0, 0, 0, 1] for a horizontal line, with the last two bits being [1, 0] for a vertical line. Wherein, [x1', y1', x2', y2', (x1'+x2')/2, (y1'+y2')/2, x2'-x1', y2'-y1'] is the position feature, (x1', y1') is the left end coordinates of a horizontal line or the upper end coordinates of a vertical line, (x2', y2') is the right end coordinates of a horizontal line or the lower end coordinates of a vertical line, ((x1'+x2')/2, (y1'+y2')/2) is the midpoint of the table grid line, and x2'-x1' and y2'-y1' are the lengths of a horizontal line and a vertical line, respectively; the middle four bits [0, 0, 0, 0] are the text feature, set to 0 according to the preset text conditions, indicating a non-text node; the last two bits are the line type feature, where [0, 1] indicates that the line is a horizontal line and [1, 0] indicates that the line is a vertical line.
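A companion sketch for the table grid line feature is given below. The function name is hypothetical, and the encoding of a horizontal line as `[0, 1]` is an inference from the contrast with `[1, 0]` for vertical lines; lines are assumed to be exactly axis-aligned.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def line_feature(start: Point, end: Point) -> List[float]:
    """Position feature (8) + zeroed text feature (4) + line type feature (2)."""
    x1, y1 = start   # left end of a horizontal line or upper end of a vertical line
    x2, y2 = end     # right end of a horizontal line or lower end of a vertical line
    position = [x1, y1, x2, y2, (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]
    text = [0.0, 0.0, 0.0, 0.0]          # a grid line is a non-text node
    if y1 == y2:
        line_type = [0.0, 1.0]           # assumed encoding of a horizontal line
    elif x1 == x2:
        line_type = [1.0, 0.0]           # vertical line
    else:
        raise ValueError("only horizontal or vertical lines are supported")
    return position + text + line_type


horizontal = line_feature((0, 5), (100, 5))
vertical = line_feature((7, 0), (7, 30))
```

Because node features and line features share the same 14-element layout, they can be concatenated into a single input sequence for the model.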

In the embodiment of the invention, the document node characteristics are information of text strips in the form page to be identified, namely, content information of the form, the form line characteristics are information of form lines in the form page to be identified, namely, frame information of the form, and the form page to be identified is represented by constructing the document node characteristics and the form line characteristics.

And S4, carrying out table detection and recognition on the document node characteristics and the table line characteristics by using the standard table structure recognition model to obtain a predicted table structure relation.

In detail, the S4 includes:

performing bilinear transformation on the node characteristic input by utilizing a transformation layer of the standard table structure recognition model to obtain edge characteristics;

Further, the table relationship, the row relationship and the column relationship in the predicted table structure relationship refer to the table relationship, the row relationship and the column relationship between any two document nodes in the table page to be identified.

Optionally, in the embodiment of the present invention, the standard table structure recognition model includes a translation layer, a transformation layer and a full connection layer. The translation layer obtains the node features corresponding to the form page to be identified by performing operations such as encoding and decoding on the input features; the transformation layer obtains the features between any two nodes, namely the edge features, through bilinear transformation; and the full connection layer calculates the relation (adjacency relation) between any two nodes through preset parameters to obtain the predicted table structure relation.
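The bilinear step and the fully connected step can be sketched in plain Python as follows. This is an illustrative form only: computing each edge feature channel as h_i^T W_k h_j, applying a sigmoid after the fully connected layer, and all function and parameter names are assumptions rather than details stated in the text.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bilinear(h_i: Vector, w: Matrix, h_j: Vector) -> float:
    """One bilinear form h_i^T W h_j between two node features."""
    return sum(h_i[a] * w[a][b] * h_j[b] for a in range(len(h_i)) for b in range(len(h_j)))


def edge_features(nodes: List[Vector], weights: List[Matrix]) -> List[List[Vector]]:
    """Edge feature for every ordered node pair, one channel per weight matrix."""
    return [[[bilinear(hi, w, hj) for w in weights] for hj in nodes] for hi in nodes]


def predict_relations(edge: Vector, fc_w: Matrix, fc_b: Vector) -> Vector:
    """Fully connected layer plus sigmoid giving (table, row, column) scores."""
    scores = []
    for row, bias in zip(fc_w, fc_b):
        z = sum(wk * ek for wk, ek in zip(row, edge)) + bias
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


nodes = [[1.0, 0.0], [0.0, 1.0]]              # two toy node features
weights = [[[1.0, 2.0], [3.0, 4.0]]]          # a single bilinear channel
edges = edge_features(nodes, weights)
probs = predict_relations(edges[0][1], fc_w=[[1.0], [0.0], [-1.0]], fc_b=[0.0, 0.0, 0.0])
```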

S5, performing restoration processing on the form page to be identified according to the predicted table structure relation to obtain a table structure.

In detail, the restoring processing is performed on the form page to be identified according to the predicted form structure relationship to obtain a form structure, including:

and integrating the row set and the column set to obtain a table structure.

Further, a maximal connected subgraph of an undirected graph G is called a connected component of G. A connected graph has exactly one connected component, namely itself, whereas an undirected graph that is not connected has several connected components. Each table can be regarded as a connected graph, so the multiple tables in the form page to be identified can be separated by solving the connected components.
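Splitting the nodes into tables by connected components can be sketched with a breadth-first search. The function name and the edge-list input format are assumptions; the edges stand for predicted table relations between text box nodes.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def connected_components(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Group nodes 0..n-1 into connected components of an undirected graph."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen: Set[int] = set()
    components: List[List[int]] = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


# Nodes 0-2 are linked by table relations, as are nodes 3-4: two tables on the page.
tables = connected_components(5, [(0, 1), (1, 2), (3, 4)])
```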

A clique is a complete subgraph of an undirected graph. A clique is called a maximal clique of the graph if it is not contained in any other clique, that is, if it is not a proper subset of any other clique.

The maximal clique algorithm is an algorithm for finding all maximal cliques of an undirected graph, and specifically comprises: generating all subgraphs of the undirected graph; judging whether each subgraph is a clique, and deleting the subgraphs that are not cliques to obtain the cliques; and judging whether each clique is a maximal clique, and deleting the cliques that are not maximal to obtain the maximal cliques.
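The three steps described above (enumerate node subsets, keep those that are complete subgraphs, discard those contained in a larger one) can be sketched directly. This brute-force form is exponential in the number of nodes and is only meant to illustrate the description on small graphs; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def is_clique(nodes: Sequence[int], adj: Dict[int, Set[int]]) -> bool:
    """A node subset is a clique if every pair of its nodes is adjacent."""
    return all(b in adj[a] for a, b in combinations(nodes, 2))


def maximal_cliques(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    # Steps 1 and 2: generate every non-empty node subset and keep the cliques.
    cliques = [
        set(subset)
        for size in range(1, n + 1)
        for subset in combinations(range(n), size)
        if is_clique(subset, adj)
    ]
    # Step 3: drop every clique that is a proper subset of another clique.
    maximal = [c for c in cliques if not any(c < other for other in cliques)]
    return sorted(sorted(c) for c in maximal)


# Triangle 0-1-2, an extra edge 2-3, and an isolated node 4.
result = maximal_cliques(5, [(0, 1), (0, 2), (1, 2), (2, 3)])
```

For larger graphs, an enumeration method such as the Bron-Kerbosch algorithm is commonly used instead of exhaustive subset generation.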

According to the embodiment of the invention, the row information and the column information of a table can be obtained by solving the maximal cliques of the row relation diagram and the column relation diagram, respectively.
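How the sorted row and column groups combine into a table structure can be sketched as follows. The text does not say how the ordinate or abscissa of a group is computed, so using the mean of the member centers is an assumption, as are the function names and the `(rows, columns)` placement format. A node may appear in several groups, for example a merged cell, so each node maps to lists of indices.

```python
from typing import Dict, List, Tuple

Center = Tuple[float, float]  # (x, y) center of a text box


def order_groups(groups: List[List[int]], centers: Dict[int, Center],
                 axis: int, descending: bool) -> List[List[int]]:
    """Sort groups by the mean center coordinate of their nodes along one axis."""
    def key(group: List[int]) -> float:
        return sum(centers[node][axis] for node in group) / len(group)
    return sorted(groups, key=key, reverse=descending)


def restore_table(row_groups: List[List[int]], col_groups: List[List[int]],
                  centers: Dict[int, Center]) -> Dict[int, Tuple[List[int], List[int]]]:
    """Map each node to the row indices and column indices it occupies."""
    rows = order_groups(row_groups, centers, axis=1, descending=True)   # large to small ordinate
    cols = order_groups(col_groups, centers, axis=0, descending=False)  # small to large abscissa
    placement: Dict[int, Tuple[List[int], List[int]]] = {}
    for r, group in enumerate(rows):
        for node in group:
            placement.setdefault(node, ([], []))[0].append(r)
    for c, group in enumerate(cols):
        for node in group:
            placement.setdefault(node, ([], []))[1].append(c)
    return placement


centers = {0: (10, 100), 1: (50, 100), 2: (10, 60), 3: (50, 60)}
grid = restore_table(row_groups=[[2, 3], [0, 1]], col_groups=[[1, 3], [0, 2]], centers=centers)
```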

The embodiment of the invention realizes the restoration of the table structure by a non-image processing method, and can greatly reduce the dependence on the image quality.

Fig. 2 is a functional block diagram of a table structure recognition device according to an embodiment of the present invention.

The table structure recognition apparatus 100 of the present invention may be installed in an electronic device. Depending on the implemented functions, the table structure identifying apparatus 100 may include a data acquisition module 101, a model training module 102, a feature construction module 103, a table detection module 104, and a table restoration module 105. The module of the invention, which may also be referred to as a unit, refers to a series of computer program segments, which are stored in the memory of the electronic device, capable of being executed by the processor of the electronic device and of performing a fixed function.

In the present embodiment, the functions concerning the respective modules/units are as follows:

the data acquisition module 101 is configured to acquire a training data set, and construct a label of the training data set.

In detail, when acquiring the training data set, the data acquisition module 101 specifically performs the following operations:

In detail, in constructing the label of the training data set, the data acquisition module 101 specifically performs the following operations:

The model training module 102 is configured to train the pre-constructed original table structure recognition model by using the training data set and the label, so as to obtain a standard table structure recognition model.

In detail, the model training module 102 is specifically configured to:

preprocessing the training data set to obtain training characteristics;

Further, the preprocessing the training data set includes:

The feature construction module 103 is configured to obtain a form page to be identified, and construct document node features and form line features of the form page to be identified.

In detail, when the document node feature and the form line feature of the form page to be identified are constructed, the feature construction module 103 specifically performs the following operations:

The table detection module 104 is configured to perform table detection and recognition on the document node feature and the table line feature by using the standard table structure recognition model, so as to obtain a predicted table structure relationship.

In detail, the table detection module 104 is specifically configured to:

The table restoration module 105 is configured to restore the to-be-identified table page according to the predicted table structure relationship, so as to obtain a table structure.

In detail, the table restoration module 105 is specifically configured to:

and integrating the row set and the column set to obtain a table structure.

Fig. 3 is a schematic structural diagram of an electronic device for implementing a table structure recognition method according to an embodiment of the present invention.

The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a table structure identification program 12, stored in the memory 11 and executable on the processor 10.

The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as codes of the table structure recognition program 12, but also for temporarily storing data that has been output or is to be output.

The processor 10 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects respective components of the entire electronic device using various interfaces and lines, and executes various functions of the electronic device 1 and processes data by running or executing programs or modules (e.g., a table structure recognition program, etc.) stored in the memory 11, and calling data stored in the memory 11.

The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.

Fig. 3 shows only an electronic device with components, it being understood by a person skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or may combine certain components, or may be arranged in different components.

For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component. Preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management and power consumption management are implemented through the power management device. The power supply may also include any one or more of a direct current or alternating current power supply, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like. The electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be described herein.

Further, the electronic device 1 may also comprise a network interface. Optionally, the network interface may comprise a wired interface and/or a wireless interface (e.g. a Wi-Fi interface, a Bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.

Optionally, the electronic device 1 may further comprise a user interface, which may be a display or an input unit such as a keyboard, and may also be a standard wired or wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying information processed in the electronic device 1 and for displaying a visual user interface.

It should be understood that the embodiments described are for illustrative purposes only, and that the scope of the patent application is not limited to this configuration.

The table structure identification program 12 stored in the memory 11 in the electronic device 1 is a combination of a plurality of instructions, which when run in the processor 10, can realize:

Specifically, for how the processor 10 implements the above instructions, refer to the descriptions of the related steps in the embodiments corresponding to Figs. 1 to 3, which are not repeated here.

Further, if the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as separate products, they may be stored in a computer readable storage medium. The computer readable storage medium may be volatile or nonvolatile. For example, the computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).

The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, can implement:

In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of the modules is merely a division by logical function, and other divisions are possible in an actual implementation.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, or each module may exist alone physically, or two or more modules may be integrated in one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.

The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, each data block containing a batch of network transaction information that is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
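The hash linking between data blocks described above can be illustrated with a minimal sketch. It is not the storage scheme of the invention: the block fields, the use of SHA-256, and the helper names `block_hash`, `append_block` and `verify_chain` are all assumptions made for illustration, and no consensus mechanism or network layer is modeled.

```python
import hashlib
import json


def block_hash(index, prev_hash, payload):
    """SHA-256 over the block's contents, serialized deterministically."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block whose hash also covers the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })
    return chain


def verify_chain(chain):
    """Return True only if every stored hash and every back-link is intact."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], prev_hash, block["payload"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash covers the previous block's hash, altering the payload of any stored block (for example, a stored table page) causes `verify_chain` to fail for that block.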

Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the system claims may also be implemented by a single unit or means through software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.

Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims

1. A method of identifying a table structure, the method comprising:

acquiring a training data set, carrying out text box detection and recognition on the training data set to obtain a plurality of text boxes, taking each text box in the plurality of text boxes as a node, judging the adjacency relation between any two nodes according to a preset relation condition to obtain a table structure relation, and constructing an adjacency matrix according to the table structure relation to obtain a label;

training the pre-constructed original table structure recognition model by utilizing the training data set and the label to obtain a standard table structure recognition model, wherein the standard table structure recognition model comprises a translation layer, a transformation layer and a full connection layer;

integrating the document node characteristics and the table line characteristics to obtain input characteristics, encoding and decoding the input characteristics by using the translation layer to obtain node characteristics, performing a bilinear transformation on the node characteristics of any two nodes by using the transformation layer to obtain edge characteristics, and determining the adjacency relationship between any two nodes by using the full connection layer to obtain a predicted table structure relationship, wherein the predicted table structure relationship comprises a table relationship, a row relationship and a column relationship;
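The pairwise prediction recited in this step can be sketched as follows. This is only an illustration under stated assumptions, not the claimed model: the node characteristics are taken as given (the translation layer is not modeled), the parameters are untrained random placeholders, the edge characteristic is assumed to be a vector of bilinear forms x_i^T W_k x_j, and the sigmoid output and 0.5 threshold are hypothetical choices.

```python
import math
import random

RELATIONS = ("table", "row", "column")


def bilinear(x_i, x_j, weight):
    """Compute x_i^T W x_j for one d x d weight matrix W."""
    return sum(
        x_i[a] * weight[a][b] * x_j[b]
        for a in range(len(x_i))
        for b in range(len(x_j))
    )


def edge_features(nodes, weights):
    """Edge feature of each ordered node pair: one bilinear form per channel."""
    n = len(nodes)
    return [
        [[bilinear(nodes[i], nodes[j], w) for w in weights] for j in range(n)]
        for i in range(n)
    ]


def fully_connected(edge, fc_weight, fc_bias):
    """Map one edge feature to a sigmoid score per relation type."""
    scores = []
    for row, bias in zip(fc_weight, fc_bias):
        logit = sum(w * e for w, e in zip(row, edge)) + bias
        logit = max(-60.0, min(60.0, logit))  # keep math.exp in range
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


def predict_relations(nodes, weights, fc_weight, fc_bias, threshold=0.5):
    """Return {relation: n x n 0/1 adjacency matrix} for the three relations."""
    edges = edge_features(nodes, weights)
    n = len(nodes)
    result = {name: [[0] * n for _ in range(n)] for name in RELATIONS}
    for i in range(n):
        for j in range(n):
            scores = fully_connected(edges[i][j], fc_weight, fc_bias)
            for name, score in zip(RELATIONS, scores):
                result[name][i][j] = int(score >= threshold)
    return result


def random_matrix(rows, cols, rng):
    """Untrained placeholder parameters; a real model would learn these."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
```

With `n` nodes, the sketch yields three `n x n` matrices, one each for the table, row and column relationships, which have the same shape as the adjacency-matrix label used during training.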

2. The method of claim 1, wherein the acquiring a training dataset comprises:

3. The method for identifying a table structure according to claim 1, wherein training the pre-constructed original table structure recognition model by using the training data set and the label to obtain a standard table structure recognition model comprises:

preprocessing the training data set to obtain training characteristics;

4. The method for identifying a table structure according to claim 1, wherein constructing the document node characteristics and the table line characteristics of the table page to be identified comprises:

5. The method for identifying a table structure according to claim 1, wherein the restoring the table page to be identified according to the predicted table structure relationship to obtain the table structure includes:

and integrating the row set and the column set to obtain a table structure.
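One way to picture the integration of a row set and a column set into a table structure is sketched below. The grouping by connected components, the ordering of rows and columns by box position, and the names `connected_groups` and `restore_table` are assumptions of this sketch; the claim does not specify them.

```python
def connected_groups(adjacency):
    """Group node indices into connected components of a 0/1 relation matrix."""
    n = len(adjacency)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(n):
            if adjacency[i][j]:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def restore_table(texts, boxes, row_adj, col_adj):
    """Integrate row groups and column groups into a 2-D grid of cell texts.

    Boxes are (x0, y0, x1, y1). Rows are ordered top to bottom by y0 and
    columns left to right by x0, which is an assumption of this sketch.
    """
    rows = sorted(connected_groups(row_adj),
                  key=lambda g: min(boxes[i][1] for i in g))
    cols = sorted(connected_groups(col_adj),
                  key=lambda g: min(boxes[i][0] for i in g))
    row_of = {i: r for r, group in enumerate(rows) for i in group}
    col_of = {i: c for c, group in enumerate(cols) for i in group}
    grid = [["" for _ in cols] for _ in rows]
    for i, text in enumerate(texts):
        grid[row_of[i]][col_of[i]] = text
    return grid
```

Each node lands in the cell given by its row group and its column group, so four text boxes with two row groups and two column groups produce a 2 x 2 grid.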

6. A table structure identification device, the device comprising:

the data acquisition module is used for acquiring a training data set, carrying out text box detection and recognition on the training data set to obtain a plurality of text boxes, taking each text box in the plurality of text boxes as a node, judging the adjacency relation between any two nodes according to a preset relation condition to obtain a table structure relation, and constructing an adjacency matrix according to the table structure relation to obtain a label;

the model training module is used for training a pre-constructed original table structure recognition model by utilizing the training data set and the label to obtain a standard table structure recognition model, and the standard table structure recognition model comprises a translation layer, a transformation layer and a full connection layer;

the table detection module is used for integrating the document node characteristics and the table line characteristics to obtain input characteristics, encoding and decoding the input characteristics by using the translation layer to obtain node characteristics, performing a bilinear transformation on the node characteristics of any two nodes by using the transformation layer to obtain edge characteristics, and determining the adjacency relationship between any two nodes by using the full connection layer to obtain a predicted table structure relationship, wherein the predicted table structure relationship comprises a table relationship, a row relationship and a column relationship;
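The label construction performed by the data acquisition module (text boxes as nodes, a relation condition applied to each pair, and the result stored as a matrix) can be sketched as follows. The claims do not state the preset relation condition; the overlap-based `same_row` and `same_column` tests and the 0.5 ratio below are hypothetical stand-ins chosen only for illustration.

```python
def overlap_ratio(a0, a1, b0, b1):
    """Overlap of intervals [a0, a1] and [b0, b1] relative to the shorter one."""
    overlap = min(a1, b1) - max(a0, b0)
    shorter = min(a1 - a0, b1 - b0)
    return overlap / shorter if shorter > 0 and overlap > 0 else 0.0


def same_row(box_a, box_b, min_ratio=0.5):
    """Hypothetical relation condition: vertical extents mostly overlap."""
    return overlap_ratio(box_a[1], box_a[3], box_b[1], box_b[3]) >= min_ratio


def same_column(box_a, box_b, min_ratio=0.5):
    """Hypothetical relation condition: horizontal extents mostly overlap."""
    return overlap_ratio(box_a[0], box_a[2], box_b[0], box_b[2]) >= min_ratio


def adjacency_matrix(boxes, related):
    """Build the 0/1 label matrix; entry [i][j] is 1 if nodes i and j are related."""
    n = len(boxes)
    return [
        [int(i != j and related(boxes[i], boxes[j])) for j in range(n)]
        for i in range(n)
    ]
```

Boxes are given as `(x0, y0, x1, y1)`. Calling `adjacency_matrix(boxes, same_row)` and `adjacency_matrix(boxes, same_column)` gives one label matrix per relation type.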

7. An electronic device, the electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the table structure identification method of any one of claims 1 to 5.

8. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the table structure identification method according to any one of claims 1 to 5.