CN113936287A - Table detection method and device based on artificial intelligence, electronic equipment and medium - Google Patents

Table detection method and device based on artificial intelligence, electronic equipment and medium

Info

Publication number
CN113936287A
Authority
CN
China
Prior art keywords: text, picture, line, artificial intelligence, probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111222463.9A
Other languages
Chinese (zh)
Inventor
雷田子
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202111222463.9A
Publication of CN113936287A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and provides a table detection method and device based on artificial intelligence, an electronic device, and a medium. The method comprises: in response to an instruction to detect a table in a table picture, calling a pre-trained table area detection model to detect the table area in the table picture; extracting the text lines in the table area and determining the position vectors of the text lines; generating a target network structure diagram from the text feature vectors and position vectors of the text lines; inputting the target network structure diagram into a pre-trained graph convolutional neural network and obtaining the text entity labels that the network outputs for any two text lines; and determining the cells in the table area according to the text entity labels. The invention can detect tables in table pictures efficiently and with high detection accuracy.

Description

Table detection method and device based on artificial intelligence, electronic equipment and medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a table detection method and device based on artificial intelligence, an electronic device, and a medium.
Background
Tables are a common page object in many kinds of documents, and table detection must be performed before the data in a table can be extracted correctly. The table detection task is to locate, within a picture, the area where a table appears; table structure recognition then identifies the content and logical structure of the detected table.
Methods based on layout analysis and rules and methods based on deep learning are currently popular. However, in the course of implementing the present invention, the inventor found that table sizes, types, and styles are complex and varied; for example, different tables differ in how rows and columns are combined, in background filling, in text type, and so on. Methods based on layout analysis and rules analyze the layout characteristics of the tables in a document and then formulate rules from those characteristics to extract the table contents, which makes the process very cumbersome and generalizes poorly. Methods based on deep learning require a large amount of labeled data to train the model, and tables in documents often have irregular or missing row and column information, so their recognition accuracy is low.
Disclosure of Invention
In view of the above, it is necessary to provide a table detection method, apparatus, electronic device, and medium based on artificial intelligence that can efficiently detect a table in a table picture and identify the content of the table.
A first aspect of the present invention provides a table detection method based on artificial intelligence, the method comprising:
in response to an instruction for detecting a table in a table picture, calling a pre-trained table area detection model to detect a table area in the table picture;
extracting text lines in the table area and determining position vectors of the text lines;
generating a target network structure diagram according to the text feature vector of the text line and the position vector of the text line;
inputting the target network structure diagram into a graph convolution neural network trained in advance, and acquiring text entity labels of any two text lines output by the graph convolution neural network;
and determining the cells in the table area according to the text entity labels.
According to an optional embodiment of the present invention, the training process of the table area detection model includes:
initializing a random number set, wherein each random number in the random number set is smaller than a preset threshold value;
distributing the random numbers in the random number set to convolution layers of a convolution neural network to obtain an initial table area detection model;
acquiring a sample table picture, and acquiring marking table information corresponding to the sample table picture;
and respectively taking the sample table picture and the corresponding labeled table information as the input and the expected output of the initial table area detection model, and training the initial table area detection model to obtain the table area detection model.
According to an optional embodiment of the present invention, the obtaining a sample table picture and obtaining annotation table information corresponding to the sample table picture includes:
determining a table area of the sample table and cell information and text information of the sample table;
generating a table picture according to the table area, the cell information and the text information;
and determining the table picture as the sample table picture, and determining the cell information as the labeling table information.
According to an optional embodiment of the present invention, the determining the position vector of the text line comprises:
acquiring vertex coordinates of a text box corresponding to the text line;
calculating to obtain a center coordinate according to the vertex coordinate;
calculating the width and height of the text line according to the vertex coordinates;
generating a position vector for the text line based on the vertex coordinates, the center coordinates, the width, and the height.
According to an optional embodiment of the present invention, the generating a target network structure diagram according to the text feature vector of the text line and the position vector of the text line includes:
regarding each text line as a vertex and, for any vertex, calculating the spatial distance between the vertex coordinates of that vertex and the vertex coordinates of the other vertices;
and constructing an undirected edge between the vertex and the other vertices closest to it in spatial distance, to obtain the target network structure diagram.
According to an optional embodiment of the present invention, the determining the cells in the table area according to the text entity labels comprises:
acquiring target row probabilities, namely the row probabilities in the text entity labels that are greater than a preset row probability threshold;
acquiring target column probabilities, namely the column probabilities in the text entity labels that are greater than a preset column probability threshold;
acquiring the target text lines corresponding to the target row probabilities and the target column probabilities;
and confirming the area of the target text lines as a cell.
According to an alternative embodiment of the invention, the method further comprises:
identifying content in the cell using a deformable convolutional neural network; and/or
Based on the row probability that any two text lines in the text entity label are in the same row and the column probability that they are in the same column, performing structured recombination on the text lines to reconstruct the table area into a structured table.
A second aspect of the present invention provides an artificial intelligence based form detection apparatus, the apparatus comprising:
the detection module is used for responding to an instruction for detecting a table in the table picture, and calling a pre-trained table area detection model to detect a table area in the table picture;
the extraction module is used for extracting the text lines in the table area and determining the position vectors of the text lines;
the generating module is used for generating a target network structure diagram according to the text feature vector of the text line and the position vector of the text line;
the acquisition module is used for inputting the target network structure diagram into a graph convolution neural network trained in advance and acquiring text entity labels of any two text lines output by the graph convolution neural network;
and the determining module is used for determining the cells in the table area according to the text entity labels.
A third aspect of the invention provides an electronic device comprising a processor for implementing the artificial intelligence based table detection method when executing a computer program stored in a memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the artificial intelligence based form detection method.
In summary, in the artificial-intelligence-based table detection method, apparatus, electronic device, and medium of the present invention, in response to an instruction to detect a table in a table picture, a pre-trained table area detection model is called to detect the table area in the table picture; the text lines in the table area are extracted and their position vectors are determined; a target network structure diagram is generated from the text feature vectors and position vectors of the text lines and input into a pre-trained graph convolutional neural network; the text entity labels that the network outputs for any two text lines are obtained; and the cells in the table area are determined according to the text entity labels. The method detects the structured layout of a table in a text picture based on a graph convolutional neural network (GCN). Unlike traditional methods that detect the table position directly, it identifies the table position by fusing text feature vectors with position vectors, which allows accurate edge prediction for the table and gives better results; it is a purely structure-aware method and does not depend on the language or on the quality of text recognition.
Drawings
Fig. 1 is a flowchart of a table detection method based on artificial intelligence according to an embodiment of the present invention.
Fig. 2 is a structural diagram of an artificial intelligence based table detection apparatus according to a second embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The form detection method based on artificial intelligence provided by the embodiment of the invention is executed by electronic equipment, and correspondingly, the form detection device based on artificial intelligence operates in the electronic equipment.
Example one
Fig. 1 is a flowchart of a table detection method based on artificial intelligence according to an embodiment of the present invention. The table detection method based on artificial intelligence comprises the following steps; according to different requirements, the order of the steps in the flowchart may be changed and some steps may be omitted.
S11, in response to an instruction to detect a table in a table picture, calling a pre-trained table area detection model to detect the table area in the table picture.
A table picture is a picture that contains both text and a table. Identifying the table in a table picture means detecting the table area in the picture, analyzing the table area, and extracting the cell contents from the table area.
The instruction to detect the table in the table picture may be triggered when the user uploads the table picture to the electronic device, or may be triggered at a preset time point or within a preset time period after the table picture has been uploaded to the electronic device; the present invention does not limit this.
The table picture corresponding to the table to be identified may be collected by scanning or photographing, and for electronic documents containing a table, such as html pages, PDF files, and doc files, the table picture may be collected by taking a document screenshot.
The electronic device may train a Convolutional Neural Network (CNN) on a data set including hundreds of thousands of table pictures, resulting in a table region detection model for detecting a table region in a table picture.
In an optional embodiment, the training process of the table area detection model includes:
initializing a random number set, wherein each random number in the random number set is smaller than a preset threshold value;
distributing the random numbers in the random number set to convolution layers of a convolution neural network to obtain an initial table area detection model;
acquiring a sample table picture, and acquiring marking table information corresponding to the sample table picture;
and respectively taking the sample table picture and the corresponding labeled table information as the input and the expected output of the initial table area detection model, and training the initial table area detection model to obtain a table area detection model.
The convolutional neural network (CNN) is a multi-layer neural network comprising an input layer, hidden layers, and an output layer, where the hidden layers include convolution layers, pooling layers, activation layers, and so on. The activation functions may be ReLU and its variants, the Sigmoid function, the Tanh (hyperbolic tangent) function, the Maxout function, etc.
The electronic device may generate a plurality of random numbers by a random-number generation method, the random numbers forming the random number set; the random numbers may cover the number of convolution kernels in a convolution layer, the size of the convolution kernels, the weights of the neurons in each convolution kernel, the bias term corresponding to each convolution kernel, the stride between two adjacent convolutions, and so on.
The electronic device may obtain a plurality of sample table pictures as a training sample set. A sample table picture is first input into the initial table area detection model to obtain the cell information corresponding to that picture; the difference between the obtained cell information and the corresponding labeled cell information is calculated; the model parameters of the initial table area detection model are adjusted based on the calculated difference; and when a preset training end condition is met, training of the initial table area detection model is ended to obtain the table area detection model. The model parameters may be adjusted based on the obtained differences using Stochastic Gradient Descent (SGD), Newton's method, quasi-Newton methods, the conjugate gradient method, heuristic optimization methods, and various other optimization algorithms now known or developed in the future. The preset training end condition may include at least one of: the training time exceeds a preset duration, the number of training iterations exceeds a preset number, and the calculated difference is smaller than a preset difference threshold.
In this optional embodiment, initializing with random numbers smaller than a preset threshold amounts to giving the parameters of the CNN small random values; the small values ensure that the model does not enter a saturated state because of excessively large weights, so the training process can proceed smoothly.
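By way of a non-limiting illustration only, the following minimal PyTorch sketch shows the kind of procedure described above: convolution layers are assigned small random numbers below a preset threshold, and one stochastic-gradient-descent step is performed on a sample table picture. The toy backbone, the SmoothL1 loss, the learning rate, the 0.01 threshold, and the dummy data are all assumptions made for illustration and are not part of the disclosure.

```python
import torch
import torch.nn as nn

def init_small_random(module, threshold=0.01):
    # Assign each convolution layer random numbers smaller than a preset
    # threshold, so the CNN does not start training in a saturated state.
    if isinstance(module, nn.Conv2d):
        nn.init.uniform_(module.weight, -threshold, threshold)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Toy stand-in for the initial table area detection model (predicts one box).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 4),
)
model.apply(init_small_random)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.SmoothL1Loss()

# One training step: sample table picture as input, labeled table box as the
# expected output; training would stop once a preset end condition is met.
picture = torch.rand(1, 3, 512, 512)
labeled_box = torch.tensor([[0.10, 0.20, 0.90, 0.80]])
loss = criterion(model(picture), labeled_box)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```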
In an optional embodiment, the obtaining a sample table picture and obtaining annotation table information corresponding to the sample table picture includes:
determining a table area of the sample table and cell information and text information of the sample table;
generating a table picture according to the table area, the cell information and the text information;
and determining the table picture as the sample table picture, and determining the cell information as the labeling table information.
The table area of the sample table and the cell information contained in the sample table may be set by default designation or by random designation.
In this optional embodiment, the table information does not need to be labeled manually, which greatly reduces the labor cost and training time of the table area detection model and improves its training efficiency.
S12, extracting the text lines in the table area and determining the position vectors of the text lines.
Optical character recognition (OCR) may be used to extract the text lines in the table area and obtain the position information of each text line; the position vectors of the text lines are then determined from this position information.
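As a sketch of how the text lines and their position information could be obtained (the use of pytesseract, the grouping keys, and the file name are assumptions made for illustration, not part of the disclosure):

```python
from collections import defaultdict
import pytesseract
from pytesseract import Output
from PIL import Image

# Run OCR on the cropped table area picture (the file name is hypothetical).
data = pytesseract.image_to_data(Image.open("table_area.png"),
                                 output_type=Output.DICT)

# Group recognized words into text lines and keep each line's bounding box.
lines = defaultdict(list)
for i, word in enumerate(data["text"]):
    if word.strip():
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        box = (data["left"][i], data["top"][i],
               data["left"][i] + data["width"][i],
               data["top"][i] + data["height"][i])
        lines[key].append((word, box))

for key, words in lines.items():
    text = " ".join(w for w, _ in words)
    x1 = min(b[0] for _, b in words)
    y1 = min(b[1] for _, b in words)
    x2 = max(b[2] for _, b in words)
    y2 = max(b[3] for _, b in words)
    print(text, (x1, y1, x2, y2))   # text line and its text-box vertex coordinates
```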
In an optional embodiment, the determining the position vector of the text line includes:
acquiring vertex coordinates of a text box corresponding to the text line;
calculating to obtain a center coordinate according to the vertex coordinate;
calculating the width and height of the text line according to the vertex coordinates;
generating a position vector for the text line based on the vertex coordinates, the center coordinates, the width, and the height.
After the text lines in the table region are extracted using OCR, the text lines may be framed in text boxes, one for each text line. The text box may be a rectangular box. The four vertex coordinates of the rectangular frame are the vertex coordinates corresponding to the text lines in the table picture.
Here, the vertex coordinates of the text line are expressed as absolute coordinates (x1, x2, y1, y2), where x2 > x1 and y2 > y1, representing the rectangle made up of the four vertex coordinates (x1, y1), (x1, y2), (x2, y1), and (x2, y2).
The center coordinates are ((x1 + x2)/2, (y1 + y2)/2), the width is (x2 - x1), and the height is (y2 - y1).
The generated position vector is (vertex coordinates, center coordinates, width, height).
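Expressed as code, the computation above is a straightforward sketch (the example coordinates are illustrative only):

```python
def position_vector(x1, y1, x2, y2):
    # Vertex coordinates of the text box, with x2 > x1 and y2 > y1.
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2   # center coordinates
    w, h = x2 - x1, y2 - y1                 # width and height
    # Position vector = (vertex coordinates, center coordinates, width, height).
    return [x1, y1, x2, y2, cx, cy, w, h]

# Example: a text line whose box spans (10, 20) to (110, 45).
print(position_vector(10, 20, 110, 45))
# [10, 20, 110, 45, 60.0, 32.5, 100, 25]
```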
S13, generating a target network structure diagram according to the text feature vector of the text line and the position vector of the text line.
The text lines may be regarded as the vertices of the target network structure diagram, and the relationships between the text lines as its edges. Word2vec may be used to extract a character vector for each character in a text line; the character vectors are then concatenated to obtain the text feature vector of the text line, which may be used as the attribute of the corresponding vertex. The relationships between text lines may be determined from the position vectors of the text lines.
In an optional embodiment, the generating a target network structure diagram according to the text feature vector of the text line and the position vector of the text line includes:
regarding each text line as a vertex and, for any vertex, calculating the spatial distance between the vertex coordinates of that vertex and the vertex coordinates of the other vertices;
and constructing an undirected edge between the vertex and the other vertices closest to it in spatial distance, to obtain the target network structure diagram.
In the prior art, when a target network structure diagram is generated, an undirected edge is constructed between every pair of vertices. Constructing an undirected edge between every pair of vertices, however, results in a large amount of computation.
In this optional embodiment, the spatial distance between any two vertices is calculated, the nearest-neighbor vertices are determined from those distances, and undirected edges are constructed only between nearest neighbors. This greatly reduces the amount of computation, which in turn shortens the training time of the graph convolutional neural network and improves its training efficiency.
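A minimal sketch of this graph construction is given below. The choice of k = 4 nearest neighbors and the use of the box centers as the distance reference are assumptions; in practice each vertex would additionally carry the text feature vector obtained by concatenating the Word2vec character vectors of its text line.

```python
import numpy as np

def build_graph(position_vectors, k=4):
    # position_vectors: one 8-dimensional vector per text line, as above;
    # entries 4 and 5 hold the center coordinates (cx, cy) of the text box.
    centers = np.array([[p[4], p[5]] for p in position_vectors], dtype=float)
    n = len(centers)
    edges = set()
    for i in range(n):
        # Spatial distance from vertex i to every other vertex.
        dist = np.linalg.norm(centers - centers[i], axis=1)
        dist[i] = np.inf
        # Undirected edges only to the k nearest neighbors rather than to
        # every other vertex, which keeps the amount of computation small.
        for j in np.argsort(dist)[:min(k, n - 1)]:
            edges.add(tuple(sorted((i, int(j)))))
    return sorted(edges)
```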
S14, inputting the target network structure diagram into a pre-trained graph convolutional neural network, and acquiring the text entity labels of any two text lines output by the graph convolutional neural network.
The vertices and edges of the target network structure diagram are each embedded and then classified by the graph convolutional neural network to obtain the text entity label of each pair of text lines; a text entity label gives the probability that a certain relation holds between two vertices. An adjacency matrix representing the relations between text lines is obtained from these pairwise probabilities, and the table structure is restored from the adjacency matrix.
The graph convolutional neural network consists of an embedding layer, two graph residual blocks (GCN1 and GCN2), and node and edge classifiers. First, in the embedding layer a linear mapping (i.e. a fully connected layer) projects the multi-dimensional node features into a higher-order space that encodes each node individually; this new embedding is later used to share information between neighboring nodes. Next, the whole graph is input: in GCN1 a convolution is performed once over the neighbors of each node and the node is updated with the convolution result; this is followed by a ReLU activation, a GCN2 layer, and another activation, and the process may be repeated until the desired depth is reached. The local output function of the GCN then transforms each node state into the label of the classification task. The node classifier allows connections between nodes that are unlikely to belong to the same area to be dropped, so that text lines with the same label can be extracted.
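The following PyTorch sketch shows one possible wiring of such a network; the hidden size, the placeholder adjacency matrix, and the exact form of the residual blocks and classifiers are assumptions made for illustration rather than the disclosed architecture.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    # One graph convolution: aggregate each node's neighbors through the
    # (assumed already normalized) adjacency matrix, then apply a linear map.
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x, adj):
        return self.linear(adj @ x)

class TableGCN(nn.Module):
    def __init__(self, node_dim, hidden=128):
        super().__init__()
        self.embed = nn.Linear(node_dim, hidden)   # embedding layer
        self.gcn1 = GraphConv(hidden)              # first graph residual block
        self.gcn2 = GraphConv(hidden)              # second graph residual block
        self.relu = nn.ReLU()
        # Edge classifier: maps a pair of node states to (row, column) probabilities.
        self.edge_classifier = nn.Linear(2 * hidden, 2)

    def forward(self, x, adj, pairs):
        h = self.relu(self.embed(x))
        h = self.relu(self.gcn1(h, adj) + h)       # residual connection
        h = self.relu(self.gcn2(h, adj) + h)
        pair_state = torch.cat([h[pairs[:, 0]], h[pairs[:, 1]]], dim=1)
        return torch.sigmoid(self.edge_classifier(pair_state))

# Example shapes: 6 text lines with 8-dimensional position features.
x = torch.rand(6, 8)
adj = torch.eye(6)                                  # placeholder adjacency matrix
pairs = torch.tensor([[0, 1], [2, 3]])
print(TableGCN(node_dim=8)(x, adj, pairs).shape)    # torch.Size([2, 2])
```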
S15, determining the cells in the table area according to the text entity labels.
The text entity label of any two text lines comprises a row probability and a column probability, namely the probability that the two text lines are located in the same row and the probability that the two text lines are located in the same column.
In an optional embodiment, the determining the cells in the table area according to the text entity labels includes:
acquiring target row probabilities, namely the row probabilities in the text entity labels that are greater than a preset row probability threshold;
acquiring target column probabilities, namely the column probabilities in the text entity labels that are greater than a preset column probability threshold;
acquiring the target text lines corresponding to the target row probabilities and the target column probabilities;
and confirming the area of the target text lines as a cell.
If the row probability is greater than the row probability threshold and the column probability is greater than the column probability threshold, the two corresponding text lines are in the same row and in the same column, i.e. the two text lines correspond to one cell. If the row probability is not greater than the row probability threshold and/or the column probability is not greater than the column probability threshold, the two corresponding text lines are in different rows and/or different columns, i.e. they correspond to different cells.
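Put as code, this selection rule might look as follows (the threshold values of 0.5 are placeholders for the preset thresholds described above):

```python
def cells_from_labels(pair_labels, row_threshold=0.5, col_threshold=0.5):
    # pair_labels maps a pair of text-line indices (i, j) to the
    # (row probability, column probability) output by the network.
    same_cell = []
    for (i, j), (p_row, p_col) in pair_labels.items():
        # Same row and same column -> the two text lines belong to one cell.
        if p_row > row_threshold and p_col > col_threshold:
            same_cell.append((i, j))
    return same_cell

print(cells_from_labels({(0, 1): (0.92, 0.88), (0, 2): (0.91, 0.10)}))
# [(0, 1)]  -> lines 0 and 1 share a cell; lines 0 and 2 do not
```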
In an optional embodiment, after the cells in the table area are determined according to the text entity labels, the method may further include: identifying the content in the cells using a deformable convolutional neural network.
After the table structure has been identified from the table picture, recognition of the table contents is converted into an object detection problem. Conventional convolution operations have a fixed receptive field, which is problematic for the top feature layers, because features in these layers may appear at any scale or transformation. The table contents can therefore be identified using a deformable convolutional neural network.
Since the deformable convolution operation is mainly used in the regions where complete objects need to be detected, all deformable convolution layers are placed at the top of the feature hierarchy. To take advantage of a pre-trained ordinary ResNet model, the offsets are initialized to zero and adjusted during training; with zero offsets the layer reduces to a conventional convolution, making it directly equivalent to its non-deformable variant. Because the fully connected layers at the end of the network expect a fixed-size input, an ROI pooling layer is used to convert the output of the deformable convolution layers into fixed-size features while keeping the result differentiable.
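A hedged sketch of this arrangement using torchvision's deformable convolution and ROI pooling operators is shown below; the channel counts, feature-map size, and example box are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d, roi_pool

in_ch, out_ch, k = 256, 256, 3

# The offsets are predicted by an ordinary convolution whose weights start at
# zero, so the deformable layer initially behaves like a conventional one.
offset_conv = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=1)
nn.init.zeros_(offset_conv.weight)
nn.init.zeros_(offset_conv.bias)

deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=1)

features = torch.rand(1, in_ch, 64, 64)     # top-level feature map (assumed size)
offsets = offset_conv(features)
deformed = deform_conv(features, offsets)

# ROI pooling converts a variable-size region into fixed-size features for the
# fully connected head; each box is (batch_index, x1, y1, x2, y2).
boxes = torch.tensor([[0.0, 4.0, 4.0, 20.0, 12.0]])
fixed = roi_pool(deformed, boxes, output_size=(7, 7))
print(fixed.shape)   # torch.Size([1, 256, 7, 7])
```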
The method detects the structured layout of a table in a text picture based on a graph convolutional neural network (GCN). Unlike traditional methods that detect the table position directly, it identifies the table position by fusing text feature vectors with position vectors, which allows accurate edge prediction for the table and gives better results; it is a purely structure-aware method and does not depend on the language or on the quality of text recognition. After the table is detected, the cell contents in the table are detected with the deformable convolutional network, achieving general-purpose table detection.
In an optional embodiment, the method further comprises:
based on the row probability that any two text lines in the text entity label are in the same row and the column probability that they are in the same column, performing structured recombination on the text lines to reconstruct the table area into a structured table.
Because table files in formats such as the Portable Document Format (PDF) or picture formats are unstructured, which is inconvenient for subsequent processing and application, the table is detected in the table picture, the data and structure information in the table are extracted, and the structure information is then reorganized into a new document; this is called table document reconstruction. The process of structurally recombining the text lines based on the row probability that any two text lines in the text entity labels are in the same row and the column probability that they are in the same column is prior art and is not elaborated in the present invention.
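One simplified sketch of such a reorganization is given below: text lines predicted to share a row or a column are merged with a union-find grouping and then placed into a grid. The thresholds and the grid layout are assumptions, and a real reconstruction would also handle merged cells and conflicting predictions.

```python
def merge_groups(pairs, n):
    # Union-find: merge text-line indices connected by a predicted relation.
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for i, j in pairs:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def reconstruct_table(texts, pair_labels, row_thr=0.5, col_thr=0.5):
    n = len(texts)
    same_row = [p for p, (pr, _) in pair_labels.items() if pr > row_thr]
    same_col = [p for p, (_, pc) in pair_labels.items() if pc > col_thr]
    rows = merge_groups(same_row, n)
    cols = merge_groups(same_col, n)
    row_of = {i: r for r, g in enumerate(rows) for i in g}
    col_of = {i: c for c, g in enumerate(cols) for i in g}
    grid = [["" for _ in cols] for _ in rows]
    for i, text in enumerate(texts):
        grid[row_of[i]][col_of[i]] = text      # place each text line in its cell
    return grid
```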
Example two
Fig. 2 is a structural diagram of an artificial intelligence based table detection apparatus according to a second embodiment of the present invention.
In some embodiments, the artificial-intelligence-based table detection apparatus 20 may include a plurality of functional modules composed of computer program segments. The computer program of each program segment in the artificial-intelligence-based table detection apparatus 20 may be stored in the memory of an electronic device and executed by at least one processor to perform the functions of artificial-intelligence-based table detection (described in detail with reference to fig. 1).
In this embodiment, the artificial-intelligence-based table detection apparatus 20 may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a detection module 201, an extraction module 202, a generation module 203, an acquisition module 204, a determination module 205, a recognition module 206, a training module 207, and a reconstruction module 208. A module referred to herein is a series of computer program segments that can be executed by at least one processor, can perform a fixed function, and is stored in the memory. The functions of the modules are described in detail in the following description.
The detection module 201 is configured to, in response to an instruction for detecting a table in a table picture, invoke a table area detection model trained in advance to detect a table area in the table picture.
A table picture is a picture that contains both text and a table. Identifying the table in a table picture means detecting the table area in the picture, analyzing the table area, and extracting the cell contents from the table area.
The instruction to detect the table in the table picture may be triggered when the user uploads the table picture to the electronic device, or may be triggered at a preset time point or within a preset time period after the table picture has been uploaded to the electronic device; the present invention does not limit this.
The table picture corresponding to the table to be identified may be collected by scanning or photographing, and for electronic documents containing a table, such as html pages, PDF files, and doc files, the table picture may be collected by taking a document screenshot.
The electronic device may train a Convolutional Neural Network (CNN) on a data set including hundreds of thousands of table pictures, resulting in a table region detection model for detecting a table region in a table picture.
The extracting module 202 is configured to extract a text line in the table area, and determine a position vector of the text line.
Optical character recognition (OCR) may be used to extract the text lines in the table area and obtain the position information of each text line; the position vectors of the text lines are then determined from this position information.
In an optional implementation, the extracting module 202 determines the position vector of the text line by:
acquiring vertex coordinates of a text box corresponding to the text line;
calculating to obtain a center coordinate according to the vertex coordinate;
calculating the width and height of the text line according to the vertex coordinates;
generating a position vector for the text line based on the vertex coordinates, the center coordinates, the width, and the height.
After the text lines in the table region are extracted using OCR, the text lines may be framed in text boxes, one for each text line. The text box may be a rectangular box. The four vertex coordinates of the rectangular frame are the vertex coordinates corresponding to the text lines in the table picture.
Here, the vertex coordinates of the text line are expressed as absolute coordinates (x1, x2, y1, y2), where x2 > x1 and y2 > y1, representing the rectangle made up of the four vertex coordinates (x1, y1), (x1, y2), (x2, y1), and (x2, y2).
The center coordinates are ((x1 + x2)/2, (y1 + y2)/2), the width is (x2 - x1), and the height is (y2 - y1).
The generated position vector is (vertex coordinates, center coordinates, width, height).
The generating module 203 is configured to generate a target network structure diagram according to the text feature vector of the text line and the position vector of the text line.
The text lines may be regarded as the vertices of the target network structure diagram, and the relationships between the text lines as its edges. Word2vec may be used to extract a character vector for each character in a text line; the character vectors are then concatenated to obtain the text feature vector of the text line, which may be used as the attribute of the corresponding vertex. The relationships between text lines may be determined from the position vectors of the text lines.
In an optional embodiment, the generating module 203 generates the target network structure diagram according to the text feature vector of the text line and the position vector of the text line, including:
regarding each text line as a vertex and, for any vertex, calculating the spatial distance between the vertex coordinates of that vertex and the vertex coordinates of the other vertices;
and constructing an undirected edge between the vertex and the other vertices closest to it in spatial distance, to obtain the target network structure diagram.
In the prior art, when a target network structure diagram is generated, an undirected edge is constructed between every pair of vertices. Constructing an undirected edge between every pair of vertices, however, results in a large amount of computation.
In this optional embodiment, the spatial distance between any two vertices is calculated, the nearest-neighbor vertices are determined from those distances, and undirected edges are constructed only between nearest neighbors. This greatly reduces the amount of computation, which in turn shortens the training time of the graph convolutional neural network and improves its training efficiency.
The obtaining module 204 is configured to input the target network structure diagram into a pre-trained graph convolutional neural network, and obtain the text entity labels of any two text lines output by the graph convolutional neural network.
The vertices and edges of the target network structure diagram are each embedded and then classified by the graph convolutional neural network to obtain the text entity label of each pair of text lines; a text entity label gives the probability that a certain relation holds between two vertices. An adjacency matrix representing the relations between text lines is obtained from these pairwise probabilities, and the table structure is restored from the adjacency matrix.
The graph convolutional neural network consists of an embedding layer, two graph residual blocks (GCN1 and GCN2), and node and edge classifiers. First, in the embedding layer a linear mapping (i.e. a fully connected layer) projects the multi-dimensional node features into a higher-order space that encodes each node individually; this new embedding is later used to share information between neighboring nodes. Next, the whole graph is input: in GCN1 a convolution is performed once over the neighbors of each node and the node is updated with the convolution result; this is followed by a ReLU activation, a GCN2 layer, and another activation, and the process may be repeated until the desired depth is reached. The local output function of the GCN then transforms each node state into the label of the classification task. The node classifier allows connections between nodes that are unlikely to belong to the same area to be dropped, so that text lines with the same label can be extracted.
The determining module 205 is configured to determine the cells in the table area according to the text entity tag.
The text entity label of any two text lines comprises a row probability and a column probability, namely the probability that the two text lines are located in the same row and the probability that the two text lines are located in the same column.
In an optional embodiment, the determining module 205 determines the cells in the table area according to the text entity tag includes:
acquiring target row probabilities, namely the row probabilities in the text entity labels that are greater than a preset row probability threshold;
acquiring target column probabilities, namely the column probabilities in the text entity labels that are greater than a preset column probability threshold;
acquiring the target text lines corresponding to the target row probabilities and the target column probabilities;
and confirming the area of the target text lines as a cell.
If the row probability is greater than the row probability threshold and the column probability is greater than the column probability threshold, the two corresponding text lines are in the same row and in the same column, i.e. the two text lines correspond to one cell. If the row probability is not greater than the row probability threshold and/or the column probability is not greater than the column probability threshold, the two corresponding text lines are in different rows and/or different columns, i.e. they correspond to different cells.
The identifying module 206 is configured to identify the content in the cells using a deformable convolutional neural network after the cells in the table area have been determined according to the text entity labels.
After the table structure has been identified from the table picture, recognition of the table contents is converted into an object detection problem. Conventional convolution operations have a fixed receptive field, which is problematic for the top feature layers, because features in these layers may appear at any scale or transformation. The table contents can therefore be identified using a deformable convolutional neural network.
Since the deformable convolution operation is mainly used in the regions where complete objects need to be detected, all deformable convolution layers are placed at the top of the feature hierarchy. To take advantage of a pre-trained ordinary ResNet model, the offsets are initialized to zero and adjusted during training; with zero offsets the layer reduces to a conventional convolution, making it directly equivalent to its non-deformable variant. Because the fully connected layers at the end of the network expect a fixed-size input, an ROI pooling layer is used to convert the output of the deformable convolution layers into fixed-size features while keeping the result differentiable.
The method detects the structured layout of a table in a text picture based on a graph convolutional neural network (GCN). Unlike traditional methods that detect the table position directly, it identifies the table position by fusing text feature vectors with position vectors, which allows accurate edge prediction for the table and gives better results; it is a purely structure-aware method and does not depend on the language or on the quality of text recognition. After the table is detected, the cell contents in the table are detected with the deformable convolutional network, achieving general-purpose table detection.
The training module 207 is configured to train a table region detection model.
In an optional embodiment, the training process of the table area detection model includes:
initializing a random number set, wherein each random number in the random number set is smaller than a preset threshold value;
distributing the random numbers in the random number set to convolution layers of a convolution neural network to obtain an initial table area detection model;
acquiring a sample table picture, and acquiring marking table information corresponding to the sample table picture;
and respectively taking the sample table picture and the corresponding labeled table information as the input and the expected output of the initial table area detection model, and training the initial table area detection model to obtain a table area detection model.
The convolutional neural network (CNN) is a multi-layer neural network comprising an input layer, hidden layers, and an output layer, where the hidden layers include convolution layers, pooling layers, activation layers, and so on. The activation functions may be ReLU and its variants, the Sigmoid function, the Tanh (hyperbolic tangent) function, the Maxout function, etc.
The electronic device may generate a plurality of random numbers by a random-number generation method, the random numbers forming the random number set; the random numbers may cover the number of convolution kernels in a convolution layer, the size of the convolution kernels, the weights of the neurons in each convolution kernel, the bias term corresponding to each convolution kernel, the stride between two adjacent convolutions, and so on.
The electronic device may obtain a plurality of sample table pictures as a training sample set. A sample table picture is first input into the initial table area detection model to obtain the cell information corresponding to that picture; the difference between the obtained cell information and the corresponding labeled cell information is calculated; the model parameters of the initial table area detection model are adjusted based on the calculated difference; and when a preset training end condition is met, training of the initial table area detection model is ended to obtain the table area detection model. The model parameters may be adjusted based on the obtained differences using Stochastic Gradient Descent (SGD), Newton's method, quasi-Newton methods, the conjugate gradient method, heuristic optimization methods, and various other optimization algorithms now known or developed in the future. The preset training end condition may include at least one of: the training time exceeds a preset duration, the number of training iterations exceeds a preset number, and the calculated difference is smaller than a preset difference threshold.
In this optional embodiment, initializing with random numbers smaller than a preset threshold amounts to giving the parameters of the CNN small random values; the small values ensure that the model does not enter a saturated state because of excessively large weights, so the training process can proceed smoothly.
In an optional embodiment, the obtaining a sample table picture and obtaining annotation table information corresponding to the sample table picture includes:
a table area of the sample table and cell information and text information of the sample table are determined.
And generating a table picture according to the table area, the cell information and the text information.
And determining the table picture as the sample table picture, and determining the cell information as the labeling table information.
The table area of the sample table and the cell information contained in the sample table may be set by default designation or by random designation.
In this optional embodiment, the table information does not need to be labeled manually, which greatly reduces the labor cost and training time of the table area detection model and improves its training efficiency.
The reconstructing module 208 is configured to perform structured recombination on the text lines based on the row probability that any two text lines in the text entity label are in the same row and the column probability that they are in the same column, so as to reconstruct the table area into a structured table.
Because table files in formats such as the Portable Document Format (PDF) or picture formats are unstructured, which is inconvenient for subsequent processing and application, the table is detected in the table picture, the data and structure information in the table are extracted, and the structure information is then reorganized into a new document; this is called table document reconstruction. The process of structurally recombining the text lines based on the row probability that any two text lines in the text entity labels are in the same row and the column probability that they are in the same column is prior art and is not elaborated in the present invention.
EXAMPLE III
The present embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps in the above artificial-intelligence-based table detection method embodiment, such as S11-S15 shown in fig. 1:
S11, responding to an instruction for detecting the table in the table picture, and calling a pre-trained table area detection model to detect the table area in the table picture;
S12, extracting the text lines in the table area and determining the position vectors of the text lines;
S13, generating a target network structure diagram according to the text feature vector of the text line and the position vector of the text line;
S14, inputting the target network structure diagram into a pre-trained graph convolutional neural network, and acquiring text entity labels of any two text lines output by the graph convolutional neural network;
S15, determining the cells in the table area according to the text entity labels.
Alternatively, when executed by the processor, the computer program implements the functions of the modules/units in the above device embodiment, for example modules 201 to 208 in fig. 2:
the detection module 201 is configured to, in response to an instruction for detecting a table in a table picture, invoke a table area detection model trained in advance to detect a table area in the table picture;
the extracting module 202 is configured to extract a text line in the table area, and determine a position vector of the text line;
the generating module 203 is configured to generate a target network structure diagram according to the text feature vector of the text line and the position vector of the text line;
the obtaining module 204 is configured to input the target network structure diagram into a pre-trained graph convolutional neural network, and obtain text entity labels of any two text lines output by the graph convolutional neural network;
the determining module 205 is configured to determine a cell in the table area according to the text entity tag;
the identifying module 206, configured to identify the content in the cells using a deformable convolutional neural network;
the training module 207 is used for training a table area detection model;
the reconstructing module 208 is configured to perform structured recombination on the text lines based on the row probability that any two text lines in the text entity label are in the same row and the column probability that they are in the same column, so as to reconstruct the table area into a structured table.
Example four
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. In the preferred embodiment of the present invention, the electronic device 3 comprises a memory 31, at least one processor 32, at least one communication bus 33 and a transceiver 34.
It will be appreciated by those skilled in the art that the structure of the electronic device shown in fig. 3 does not limit the embodiment of the present invention; it may be a bus-type or a star-type configuration, and the electronic device 3 may include more or fewer hardware or software components than those shown, or a different arrangement of components.
In some embodiments, the electronic device 3 is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and the hardware thereof includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The electronic device 3 may also include a client device, which includes, but is not limited to, any electronic product that can interact with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, and the like.
It should be noted that the electronic device 3 is only an example; other existing or future electronic products that can be adapted to the present invention should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In some embodiments, the memory 31 has stored therein a computer program that, when executed by the at least one processor 32, performs all or part of the steps of the artificial-intelligence-based table detection method as described. The memory 31 includes a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-Time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium capable of carrying or storing data.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In some embodiments, the at least one processor 32 is the control unit of the electronic device 3; it connects the various components of the electronic device 3 through various interfaces and lines, and executes the various functions of the electronic device 3 and processes its data by running or executing the programs or modules stored in the memory 31 and calling the data stored in the memory 31. For example, when executing the computer program stored in the memory, the at least one processor 32 implements all or part of the steps of the artificial-intelligence-based table detection method described in the embodiments of the present invention, or implements all or part of the functions of the artificial-intelligence-based table detection apparatus. The at least one processor 32 may be composed of integrated circuits, for example a single packaged integrated circuit, or of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips.
In some embodiments, the at least one communication bus 33 is arranged to enable connection communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the electronic device 3 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 32 through a power management device, so as to implement functions of managing charging, discharging, and power consumption through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 3 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, an electronic device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements, and that the singular does not exclude the plural. A plurality of units or means recited in the specification may also be implemented by a single unit or means through software or hardware. The terms first, second, and the like are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of those technical solutions.

Claims (10)

1. A table detection method based on artificial intelligence, the method comprising:
in response to an instruction for detecting a table in a table picture, calling a pre-trained table area detection model to detect a table area in the table picture;
extracting text lines in the table area and determining position vectors of the text lines;
generating a target network structure diagram according to the text feature vector of the text line and the position vector of the text line;
inputting the target network structure diagram into a graph convolution neural network trained in advance, and acquiring text entity labels of any two text lines output by the graph convolution neural network;
and determining the cells in the table area according to the text entity labels.
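For illustration only, the following Python sketch shows one way the flow of claim 1 could be wired together. Every component here (detect_table_region, extract_text_lines, build_graph, gcn_pair_labels) is a hypothetical stub standing in for the pre-trained table area detection model, the text-line extractor, and the graph convolutional network; it is not the patent's implementation and only mimics the data that moves between the steps.

```python
# Minimal end-to-end sketch of the claimed flow; every component is a stub.

def detect_table_region(picture):
    # Stub: a pre-trained table area detection model would return the
    # bounding box of the table inside the table picture.
    return (0, 0, picture["width"], picture["height"])

def extract_text_lines(picture, region):
    # Stub: text-line extraction would return the text lines found inside
    # the table region, each with the vertex coordinates of its text box.
    return picture["text_lines"]

def build_graph(lines):
    # Stub: vertices are text lines; edges connect spatially close lines
    # (a concrete nearest-neighbour version is sketched after claim 5).
    n = len(lines)
    return {"vertices": list(range(n)),
            "edges": [(i, j) for i in range(n) for j in range(i + 1, n)]}

def gcn_pair_labels(graph):
    # Stub: the pre-trained graph convolutional network would output, for
    # any two text lines, the probability of being in the same line (row)
    # and in the same column, i.e. the text entity labels.
    return {edge: {"line": 0.95, "column": 0.05} for edge in graph["edges"]}

def detect_cells(picture):
    region = detect_table_region(picture)          # step 1: table region
    lines = extract_text_lines(picture, region)    # step 2: text lines
    graph = build_graph(lines)                     # step 3: network structure diagram
    labels = gcn_pair_labels(graph)                # step 4: text entity labels
    # step 5: grouping into cells from the pairwise labels
    # (a concrete version is sketched after claim 6)
    return lines, labels
```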
2. The artificial intelligence based table detection method of claim 1, wherein the training process of the table area detection model comprises:
initializing a random number set, wherein each random number in the random number set is smaller than a preset threshold value;
distributing the random numbers in the random number set to convolution layers of a convolution neural network to obtain an initial table area detection model;
acquiring a sample table picture, and acquiring labeled table information corresponding to the sample table picture;
and respectively taking the sample table picture and the corresponding labeled table information as the input and the expected output of the initial table area detection model, and training the initial table area detection model to obtain the table area detection model.
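As a hedged illustration of the initialization described in claim 2, the NumPy sketch below draws a set of random numbers that are all smaller than a preset threshold and assigns them to the convolution kernels of each layer; the layer shapes and the 0.01 threshold are assumptions made for the example.

```python
import numpy as np

def init_conv_weights(layer_shapes, threshold=0.01, seed=0):
    """Draw a set of random numbers, each smaller than the preset
    threshold, and assign them to the convolution kernels of each layer."""
    rng = np.random.default_rng(seed)
    weights = []
    for shape in layer_shapes:
        # Uniform values in [0, threshold) satisfy the "smaller than a
        # preset threshold" condition of claim 2.
        weights.append(rng.uniform(0.0, threshold, size=shape))
    return weights

# Hypothetical kernel shapes for three convolution layers: (out_ch, in_ch, kh, kw).
initial_model = init_conv_weights([(16, 3, 3, 3), (32, 16, 3, 3), (64, 32, 3, 3)])
```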
3. The artificial intelligence based table detection method of claim 2, wherein the obtaining a sample table picture and obtaining labeled table information corresponding to the sample table picture comprises:
determining a table area of the sample table and cell information and text information of the sample table;
generating a table picture according to the table area, the cell information and the text information;
and determining the table picture as the sample table picture, and determining the cell information as the labeled table information.
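As one plausible way to generate such a sample, the Pillow-based sketch below renders a table picture from a table area, cell boxes, and cell text, and keeps the cell boxes as the labeled table information; the drawing layout and field names are illustrative assumptions, not the patent's procedure.

```python
from PIL import Image, ImageDraw

def render_sample_table(table_area, cells, size=(800, 300)):
    """Render a synthetic table picture; the cell boxes and texts double as
    the labeled table information used for training."""
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    draw.rectangle(table_area, outline="black", width=2)        # table area
    for cell in cells:
        draw.rectangle(cell["box"], outline="black", width=1)   # cell border
        x0, y0 = cell["box"][0], cell["box"][1]
        draw.text((x0 + 4, y0 + 4), cell["text"], fill="black")  # cell text
    return img, cells  # (sample table picture, labeled table information)

picture, labels = render_sample_table(
    table_area=(50, 50, 750, 250),
    cells=[{"box": (50, 50, 400, 150), "text": "Name"},
           {"box": (400, 50, 750, 150), "text": "Amount"}])
```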
4. The artificial intelligence based table detection method of claim 1, wherein said determining the position vector of the text line comprises:
acquiring vertex coordinates of a text box corresponding to the text line;
calculating a center coordinate according to the vertex coordinates;
calculating the width and height of the text line according to the vertex coordinates;
generating a position vector for the text line based on the vertex coordinates, the center coordinates, the width, and the height.
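A minimal sketch of the position vector of claim 4, assuming four box vertices per text line and a 12-dimensional layout (eight vertex values, the centre coordinate, the width, and the height); the exact ordering of the entries is an assumption.

```python
import numpy as np

def text_line_position_vector(vertices):
    """Build the position vector of a text line from the vertex coordinates
    of its text box: vertices, centre coordinate, width and height."""
    v = np.asarray(vertices, dtype=float)       # shape (4, 2): the box corners
    center = v.mean(axis=0)                     # centre coordinate
    width = v[:, 0].max() - v[:, 0].min()       # width of the text line
    height = v[:, 1].max() - v[:, 1].min()      # height of the text line
    return np.concatenate([v.ravel(), center, [width, height]])

vec = text_line_position_vector([(10, 20), (110, 20), (110, 40), (10, 40)])
# vec has 12 entries: 8 vertex values, 2 centre values, width and height.
```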
5. The artificial intelligence based table detection method of claim 4, wherein the generating a target network structure diagram according to the text feature vectors of the text lines and the position vectors of the text lines comprises:
regarding each text line as a vertex and, for any vertex, calculating the spatial distance between its vertex coordinates and the vertex coordinates of the other vertices;
and constructing an undirected edge between the vertex and the other vertices closest to it in spatial distance, to obtain the target network structure diagram.
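A hedged sketch of the graph construction in claim 5, using each text line's centre point as its coordinate and connecting each vertex to its k spatially closest neighbours with undirected edges; the value of k and the use of centre points rather than the full vertex coordinates are assumptions.

```python
import numpy as np

def build_line_graph(centers, k=1):
    """Treat each text line as a vertex and connect it to its k spatially
    closest vertices with undirected edges."""
    pts = np.asarray(centers, dtype=float)
    edges = set()
    for i in range(len(pts)):
        dists = np.linalg.norm(pts - pts[i], axis=1)   # spatial distances
        dists[i] = np.inf                              # ignore the vertex itself
        for j in np.argsort(dists)[:k]:
            edges.add(tuple(sorted((i, int(j)))))      # undirected edge
    return {"vertices": list(range(len(pts))), "edges": sorted(edges)}

graph = build_line_graph([(10, 10), (12, 10), (200, 10), (10, 120)], k=2)
```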
6. The artificial intelligence based table detection method according to any one of claims 1 to 5, wherein the determining the cells in the table area according to the text entity labels comprises:
acquiring a target line probability of which the line probability in the text entity label is greater than a preset line probability threshold;
acquiring a target column probability of which the column probability in the text entity label is greater than a preset column probability threshold;
acquiring a target text line corresponding to the target line probability and the target column probability;
and confirming the area of the target text line as a cell.
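The sketch below shows one possible reading of claim 6: keep the pairs of text lines whose line (same-row) probability and column probability both exceed preset thresholds, and confirm the areas of those text lines as cells. The thresholds, the pairwise reading of the labels, and the box format are assumptions made for the example.

```python
def cells_from_entity_labels(lines, pair_labels,
                             line_threshold=0.5, column_threshold=0.5):
    """Keep the text lines whose pairwise labels exceed both the line (row)
    probability threshold and the column probability threshold, and confirm
    their areas as cells."""
    cells = []
    for (i, j), probs in pair_labels.items():
        if probs["line"] > line_threshold and probs["column"] > column_threshold:
            cells.extend([lines[i]["box"], lines[j]["box"]])
    return list(dict.fromkeys(cells))   # de-duplicate, preserving order

lines = [{"box": (0, 0, 100, 20)}, {"box": (0, 25, 100, 45)}]
cells = cells_from_entity_labels(lines, {(0, 1): {"line": 0.7, "column": 0.8}})
```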
7. The artificial intelligence based table detection method of claim 6, wherein the method further comprises:
identifying content in the cell using a variable convolutional neural network; and/or
based on the line probability that any two text lines in the text entity label are in the same line and the column probability that they are in the same column, performing structural recombination on the text lines to reconstruct the table area into a structured table.
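As an illustration of the structural recombination branch of claim 7, the sketch below groups text lines into row groups and column groups from the pairwise same-line and same-column probabilities, then addresses each text line by its (row group, column group) pair to rebuild a structured table. The union-find grouping and the 0.5 threshold are implementation choices not stated in the patent.

```python
def reconstruct_structured_table(lines, pair_labels, threshold=0.5):
    """Group text lines into row groups and column groups from the pairwise
    same-line and same-column probabilities, then address each text line by
    its (row group, column group) to rebuild a structured table."""
    row_parent = list(range(len(lines)))
    col_parent = list(range(len(lines)))

    def find(parent, x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(parent, a, b):
        parent[find(parent, a)] = find(parent, b)

    for (i, j), probs in pair_labels.items():
        if probs["line"] > threshold:
            union(row_parent, i, j)         # same row
        if probs["column"] > threshold:
            union(col_parent, i, j)         # same column

    table = {}
    for idx, line in enumerate(lines):
        key = (find(row_parent, idx), find(col_parent, idx))
        table.setdefault(key, []).append(line["text"])
    return table

lines = [{"text": "Name"}, {"text": "Amount"}, {"text": "Alice"}, {"text": "42"}]
labels = {(0, 1): {"line": 0.9, "column": 0.1}, (2, 3): {"line": 0.9, "column": 0.1},
          (0, 2): {"line": 0.1, "column": 0.9}, (1, 3): {"line": 0.1, "column": 0.9}}
structured = reconstruct_structured_table(lines, labels)
```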
8. An artificial intelligence based table detection apparatus, the apparatus comprising:
the detection module is used for responding to an instruction for detecting a table in the table picture, and calling a pre-trained table area detection model to detect a table area in the table picture;
the extraction module is used for extracting the text lines in the table area and determining the position vectors of the text lines;
the generating module is used for generating a target network structure diagram according to the text feature vector of the text line and the position vector of the text line;
the acquisition module is used for inputting the target network structure diagram into a graph convolution neural network trained in advance and acquiring text entity labels of any two text lines output by the graph convolution neural network;
and the determining module is used for determining the cells in the table area according to the text entity labels.
9. An electronic device, comprising a processor and a memory, wherein the processor is configured to implement the artificial intelligence based table detection method according to any one of claims 1 to 7 when executing a computer program stored in the memory.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the artificial intelligence based table detection method according to any one of claims 1 to 7.
CN202111222463.9A 2021-10-20 2021-10-20 Table detection method and device based on artificial intelligence, electronic equipment and medium Pending CN113936287A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111222463.9A CN113936287A (en) 2021-10-20 2021-10-20 Table detection method and device based on artificial intelligence, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111222463.9A CN113936287A (en) 2021-10-20 2021-10-20 Table detection method and device based on artificial intelligence, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN113936287A true CN113936287A (en) 2022-01-14

Family

ID=79280975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111222463.9A Pending CN113936287A (en) 2021-10-20 2021-10-20 Table detection method and device based on artificial intelligence, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN113936287A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210295101A1 (en) * 2020-03-19 2021-09-23 Hong Kong Applied Science and Technology Research Institute Company Limited Apparatus and Method for Recognizing Image-Based Content Presented in a Structured Layout
CN111860257A (en) * 2020-07-10 2020-10-30 上海交通大学 Table identification method and system fusing multiple text features and geometric information
CN113435240A (en) * 2021-04-13 2021-09-24 北京易道博识科技有限公司 End-to-end table detection and structure identification method and system
CN113221523A (en) * 2021-05-14 2021-08-06 北京贝瑞和康生物技术有限公司 Method of processing table, computing device, and computer-readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Pau Riba et al.: "Table Detection in Invoice Documents by Graph Neural Networks", 2019 International Conference on Document Analysis and Recognition (ICDAR), 3 February 2020 (2020-02-03), pages 122-127 *
Li Yiren et al.: "Table Structure Extraction Based on Graph Convolutional Networks" (in Chinese), 《学术研究》 (Academic Research), 20 January 2021 (2021-01-20), pages 132-134 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114595669A (en) * 2022-03-11 2022-06-07 北京梦诚科技有限公司 Calculation table identification method and system, electronic equipment and storage medium
CN114842489A (en) * 2022-05-13 2022-08-02 北京百度网讯科技有限公司 Table analysis method and device
CN116152833A (en) * 2022-12-30 2023-05-23 北京百度网讯科技有限公司 Training method of form restoration model based on image and form restoration method
CN116152833B (en) * 2022-12-30 2023-11-24 北京百度网讯科技有限公司 Training method of form restoration model based on image and form restoration method

Similar Documents

Publication Publication Date Title
JP6843086B2 (en) Image processing systems, methods for performing multi-label semantic edge detection in images, and non-temporary computer-readable storage media
US10354406B2 (en) Method of detecting objects within a 3D environment
CN113936287A (en) Table detection method and device based on artificial intelligence, electronic equipment and medium
CN110674804A (en) Text image detection method and device, computer equipment and storage medium
WO2018021942A2 (en) Facial recognition using an artificial neural network
JP7172472B2 (en) RULE GENERATION DEVICE, RULE GENERATION METHOD AND RULE GENERATION PROGRAM
US20220108478A1 (en) Processing images using self-attention based neural networks
CN112699775A (en) Certificate identification method, device and equipment based on deep learning and storage medium
US11544495B2 (en) Attributionally robust training for weakly supervised localization and segmentation
WO2022247005A1 (en) Method and apparatus for identifying target object in image, electronic device and storage medium
US20210166058A1 (en) Image generation method and computing device
CN114638960A (en) Model training method, image description generation method and device, equipment and medium
CN114241499A (en) Table picture identification method, device and equipment and readable storage medium
CN114387608B (en) Table structure identification method combining convolution and graph neural network
CN113723513A (en) Multi-label image classification method and device and related equipment
CN114398557A (en) Information recommendation method and device based on double portraits, electronic equipment and storage medium
CN114708461A (en) Multi-modal learning model-based classification method, device, equipment and storage medium
CN113822283A (en) Text content processing method and device, computer equipment and storage medium
US11972625B2 (en) Character-based representation learning for table data extraction using artificial intelligence techniques
US20220027676A1 (en) Image multiprocessing method for vision systems
CN113570286B (en) Resource allocation method and device based on artificial intelligence, electronic equipment and medium
CN113888086A (en) Article signing method, device and equipment based on image recognition and storage medium
CN112541436A (en) Concentration degree analysis method and device, electronic equipment and computer storage medium
CN113128496A (en) Method, device and equipment for extracting structured data from image
CN116151202B (en) Form filling method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination