CN114743204A - Automatic question answering method, system, equipment and storage medium for table


Info

Publication number
CN114743204A
Authority
CN
China
Prior art date
Legal status
Pending
Application number
CN202210374208.4A
Other languages
Chinese (zh)
Inventor
陶德威
王健宗
于凤英
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202210374208.4A
Publication of CN114743204A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G06F 16/3331: Query processing
    • G06F 16/334: Query execution
    • G06F 16/3344: Query execution using natural language analysis
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks


Abstract

The invention belongs to the technical field of artificial intelligence and provides an automatic question answering method, system, equipment and storage medium for tables. The method comprises the following steps: acquiring the text information, position information and picture information corresponding to each text box in a target table area from the preprocessed document to be recognized; inputting the text information, position information and picture information into a structure prediction model to obtain the structural position relationship between each text box and the other text boxes in the target table area; generating a target question-answer sentence by combining this relationship with the text information corresponding to each text box; and inputting the target question-answer sentence into a table question-answer model to obtain an answer. By fusing the text, position and picture information in the table and performing data sampling and feature extraction in multiple respects, the embodiment of the invention effectively improves the accuracy of table recognition and, on that basis, realizes automatic question answering over tables.

Description

Automatic question answering method, system, equipment and storage medium for table
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an automatic question answering method, system, equipment and storage medium for tables.
Background
With the development of technology, many models for text generation tasks have emerged: text generation based on existing sentences, such as translation; text generation based on keywords, where keywords are expanded into a sentence; and summary generation based on paragraphs. Models that generate text from table-structured data, however, remain rare.
At present, the natural language processing model GPT-2 is widely applied in the generation field. Because GPT-2 generates the next text based only on the already-decoded information, it lacks global information during generation. A bidirectional encoder such as BERT can obtain global information, but during actual generation it is clearly more reasonable to produce subsequent text conditioned on what has already been generated. The BART model combines the advantages of GPT-2 and BERT: its encoder captures global information, while its decoder generates subsequent text based on the already-decoded information.
However, the BART model is also an end-to-end model suited to plain text information; it offers no good way to process table-type data.
Disclosure of Invention
The invention provides an automatic question answering method, system, equipment and storage medium for tables, with the main aim of enabling automatic question answering over tables and effectively improving the efficiency of automatic question answering on documents.
In a first aspect, an embodiment of the present invention provides an automatic question answering method for a table, including:
according to the preprocessed document to be recognized, acquiring text information, position information and picture information corresponding to each text box in a target table area from the document to be recognized;
inputting the text information, position information and picture information corresponding to each text box into a structure prediction model to obtain the structural position relationship between the text boxes in the target table area, wherein the structure prediction model is trained by taking the text information, position information and picture information corresponding to each text box in a sample table area as samples and the structural position relationships between the text boxes in the sample table area as labels;
generating a target question-answering sentence according to the structural position relation between the text boxes and the text information;
and inputting the target question-answer sentence into a table question-answer model to obtain an answer, wherein the table question-answer model is trained by taking preset question-answer sentences as samples and their answers as labels.
Preferably, the structure prediction model includes a text feature extraction module, a coordinate feature extraction module, a picture feature extraction module, a multilayer perceptron and a graph convolution network, and the step of inputting the text information, position information and picture information corresponding to each text box into the structure prediction model to obtain the structural position relationship between the text boxes in the target table area includes:
inputting the text information corresponding to each text box into the text feature extraction module to obtain text content features;
inputting the position information corresponding to each text box into the coordinate feature extraction module to obtain text position features;
inputting picture information corresponding to each text box and position information corresponding to each text box into the picture feature extraction module to obtain local image features, and inputting the local image features into the multilayer perceptron to obtain multilayer image features;
and inputting the multilayer image features, the text content features and the text position features into the graph convolutional network to obtain the structural position relationship between each text box and the other text boxes.
Preferably, the text feature extraction module includes a projection unit and an LSTM unit, and the inputting text information corresponding to each text box into the text feature extraction module to obtain text content features includes:
inputting text information corresponding to each text box into the projection unit, projecting the text information into a preset vector space, and acquiring a text projection vector;
and inputting the text projection vector into the LSTM unit, converting the character string into a text semantic vector, and acquiring the text content characteristics.
Preferably, the picture feature extraction module is a convolutional neural network.
Preferably, the obtaining, according to the preprocessed document to be recognized, text information, position information, and picture information corresponding to each text box in the target table region from the document to be recognized includes:
based on an OCR technology, detecting the preprocessed document to be recognized, and acquiring text information corresponding to each text box and position information corresponding to each text box;
and acquiring the picture information corresponding to each text box according to the shared row dividing lines and column dividing lines of each text box.
Preferably, the generating a target question-answering sentence according to the structural position relationship between the text boxes and the text information includes:
determining a reference text box according to the structural position relation between each text box and other text boxes;
acquiring reference text information according to the reference text box and the text information corresponding to each text box;
and generating the target question-answering sentence according to the reference text information.
Preferably, the preset question-answer sentence is obtained by expanding on the basis of a basic question-answer sentence.
In a second aspect, an embodiment of the present invention provides an automatic question answering system for a form, including:
the information extraction module is used for acquiring text information, position information and picture information corresponding to each text box in a target table area from the document to be identified according to the preprocessed document to be identified;
the structure prediction module is used for inputting the text information, position information and picture information corresponding to each text box into a structure prediction model to obtain the structural position relationship between the text boxes in the target table area, wherein the structure prediction model is trained by taking the text information, position information and picture information corresponding to each text box in the sample table area as samples and the structural position relationships between the text boxes in the sample table area as labels;
the sentence generating module is used for generating a target question-answer sentence according to the structural position relationship between the text boxes and the text information;
and the prediction module is used for inputting the target question-answer sentence into a table question-answer model to obtain an answer, wherein the table question-answer model is trained by taking preset question-answer sentences as samples and their answers as labels.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above automatic question answering method for a table when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above automatic question answering method for a table.
The invention provides an automatic question answering method, system, equipment and storage medium for tables. The text information, position information and picture information corresponding to each text box in the target table area are first fused; by sampling the data in diverse ways and extracting features from the fused text, position and picture information, the information contained in the target table area can be identified accurately and effectively, and on that basis a target question-answer sentence is generated, realizing an automatic question answering system for tables. By fusing the text, position and picture information in the table and performing data sampling and feature extraction in multiple respects, the embodiment of the invention effectively improves the accuracy of table recognition and, on that basis, realizes automatic question answering.
Drawings
Fig. 1 is a schematic view of an application scenario of an automatic question answering method for a form according to an embodiment of the present invention;
FIG. 2 is a flowchart of an automatic question answering method for a form according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram obtained in relation to Table 1 in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a structural prediction model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an automatic question answering system for a form according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a computer device provided in the embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Fig. 1 is a schematic view of an application scenario of an automatic question answering method for a form according to an embodiment of the present invention. As shown in Fig. 1, a user inputs a preprocessed document to be recognized at a client; the client, upon receiving it, sends the preprocessed document to a server; and the server receives the preprocessed document and executes the automatic question answering method for tables to obtain a target question-answer sentence and an answer.
It should be noted that the server may be implemented by an independent server or a server cluster composed of a plurality of servers. The client may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like. The client and the server may be connected through bluetooth, USB (Universal Serial Bus), or other communication connection manners, which is not limited in this embodiment of the present invention.
Fig. 2 is a flowchart of an automatic question answering method for a form according to an embodiment of the present invention, as shown in fig. 2, the method includes:
s210, according to the preprocessed to-be-identified document, acquiring text information, position information and picture information corresponding to each text box in a target table area from the to-be-identified document;
the document to be identified in the embodiment of the present invention may be a PDF document including documents of various formats and pictures from which text contents can be extracted, and if the document to be identified is a document of other formats, the document to be identified needs to be preprocessed first, and the document of other formats is converted into a PDF document of pictures, so as to obtain a preprocessed document to be identified, and the preprocessed document to be identified is a PDF document; if the document to be identified is the picture PDF document, the picture PDF document is directly used as the preprocessed PDF document without preprocessing the document to be identified.
In addition, the preprocessed document to be recognized is obtained, and the text information, position information and picture information corresponding to each text box in the target table area are extracted from it. The text information corresponding to a text box, i.e. the text inside that cell of the table, can be understood as the text content of a character string. The position information corresponding to a text box can be given by the coordinates of its four corner points, or by two corner coordinates along the horizontal and vertical axes; the exact representation can be determined according to the actual situation and is not elaborated further here. The picture information corresponding to a text box is the image region covering that text box.
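The two position representations mentioned above can be sketched as follows; this is a minimal illustration under assumed coordinate conventions (pixel coordinates, origin at the top left), not the patent's prescribed format.

```python
def corners_to_box(corners):
    """Collapse four (x, y) corner points into (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))

def to_relative(box, table_box):
    """Express an absolute box relative to the table area's origin and size."""
    tx0, ty0, tx1, ty1 = table_box
    w, h = tx1 - tx0, ty1 - ty0
    x0, y0, x1, y1 = box
    return ((x0 - tx0) / w, (y0 - ty0) / h, (x1 - tx0) / w, (y1 - ty0) / h)

# A text box given by its four corner points, reduced to the two-coordinate
# form and normalised against a 100x100 table area.
corners = [(10, 20), (50, 20), (50, 35), (10, 35)]
box = corners_to_box(corners)                 # (10, 20, 50, 35)
rel = to_relative(box, (0, 0, 100, 100))      # (0.1, 0.2, 0.5, 0.35)
```

Relative coordinates are what the coordinate feature extraction module described later consumes, so this conversion is typically done once per box after OCR.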
By fusing the text, position and picture information, the embodiment of the invention samples the data in diverse ways and extracts features from the text information, position information and picture information corresponding to each text box in the target table area, so that the information contained in the target table area can be identified accurately and effectively.
The text information and position information corresponding to each text box in the target table area are obtained from the preprocessed document to be recognized. In one embodiment, the preprocessed document is input into OCR (Optical Character Recognition) software, and OCR technology is used to recognize each character in the document together with its horizontal and vertical position information. The font, font size (representing the size of the character) and colour of each character can also be obtained, and in practical applications the OCR tool can correct and preprocess the image against problems such as skew, uneven illumination, noise and distortion. The picture information corresponding to each text box can be obtained by image segmentation of the target table area.
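A minimal sketch of the OCR step's output handling: assuming the OCR tool returns one record per recognised string together with its bounding box (as Tesseract's `image_to_data`, for example, can be massaged into), this filters the records down to those lying inside the target table area. The record layout is an assumption for illustration.

```python
def boxes_in_table(ocr_records, table_box):
    """Keep OCR records whose bounding box falls inside the table area.

    ocr_records: list of dicts with keys 'text' and 'box' = (x0, y0, x1, y1).
    table_box: (x0, y0, x1, y1) of the target table area.
    """
    tx0, ty0, tx1, ty1 = table_box

    def inside(b):
        x0, y0, x1, y1 = b
        return x0 >= tx0 and y0 >= ty0 and x1 <= tx1 and y1 <= ty1

    # Drop empty strings (OCR noise) and anything outside the table.
    return [r for r in ocr_records if r["text"].strip() and inside(r["box"])]

records = [
    {"text": "Name", "box": (12, 10, 40, 22)},
    {"text": "footer", "box": (0, 300, 60, 312)},   # outside the table area
]
kept = boxes_in_table(records, (0, 0, 200, 100))    # keeps only "Name"
```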
As another embodiment, the preprocessed document to be recognized may be input into a neural network, and finally, the text information corresponding to each text box, the position information corresponding to each text box, and the picture information corresponding to each text box are obtained, where the neural network is obtained by training using the document to be recognized as a sample, and using the text information corresponding to each text box, the position information corresponding to each text box, and the picture information corresponding to each text box as tags.
S220, inputting the text information, position information and picture information corresponding to each text box into a structure prediction model to obtain the structural position relationship between the text boxes in the target table area, wherein the structure prediction model is trained by taking the text information, position information and picture information corresponding to each text box in a sample table area as samples and the structural position relationships between the text boxes in the sample table area as labels;
and then inputting the text information corresponding to each text box, the position information corresponding to each text box and the picture information corresponding to each text box obtained in the above step into a structure prediction model to obtain the structure position relation between each text box and other text boxes in the target table area. In the embodiment of the present invention, a text box in a target table area is taken as an example for description, and text information corresponding to the text box, position information corresponding to the text box, and picture information corresponding to the text box are input into a structure prediction model, so as to obtain a structure position relationship between the text box and other text boxes. Here, the structural positional relationship between the text box and the other text boxes may be the structural relationship between the text box and all other text boxes in the target table area; however, if the structural relationship between the text box and all other text boxes needs to be calculated, a large training amount is caused when the structure prediction model is trained, each text box has a relationship whether to have "same column" or "same row" with other text boxes, and the correlation between the text boxes which are far away from each other is considered to be small, so in order to save the calculation amount, as a preferred embodiment, a K nearest neighbor algorithm is adopted, and for each text box, only the relationship between the text box and the nearest K nodes is considered, that is, the structural relationship between the text box and other text boxes is specifically the structural relationship between the text box and the nearest K text boxes.
In addition, each character string is generally regarded as a node, and the table structure can be abstracted into the row-column relationships among these nodes: character strings in the same column of the table form node pairs with a "same column" relationship, character strings in the same row form node pairs with a "same row" relationship, and character strings inside the same text box have both the "same row" and "same column" relationships. Table structure extraction can thus be abstracted as predicting the row-column relationships among the character-string nodes, from which the structure of the table can be accurately recovered. In the embodiment of the present invention, the structural relationship therefore means a same-row or same-column relationship.
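The node/edge abstraction can be sketched directly: treat each text box as a node and emit "same row"/"same column" edges. Here a simple coordinate-overlap heuristic stands in for the learned structure prediction model, purely to illustrate the output format the model is trained to produce.

```python
def structure_edges(boxes):
    """Return sets of (i, j) pairs (i < j) sharing a row or a column."""
    def overlap(a0, a1, b0, b1):
        return min(a1, b1) > max(a0, b0)   # open-interval overlap

    same_row, same_col = set(), set()
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            x0, y0, x1, y1 = boxes[i]
            u0, v0, u1, v1 = boxes[j]
            if overlap(y0, y1, v0, v1):    # vertical extents overlap -> same row
                same_row.add((i, j))
            if overlap(x0, x1, u0, u1):    # horizontal extents overlap -> same column
                same_col.add((i, j))
    return same_row, same_col

# Boxes 0 and 1 sit side by side (same row); boxes 0 and 2 are stacked
# (same column), mirroring the "Name"/"Age"/"Peter" layout of Table 1.
boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (0, 20, 10, 30)]
rows, cols = structure_edges(boxes)
```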
Existing methods for extracting tables from PDF files basically extract the table frame from the file, cut out the region inside the frame according to the table frame, and finally run OCR on the image of that region to extract the table content. The embodiment of the invention instead fuses the text, position and picture information in the table and, by performing data sampling and feature extraction in multiple respects, effectively improves the accuracy of table recognition.
Table 1 is a team statistics table in the embodiment of the present invention, and Fig. 3 is a schematic structural diagram obtained for Table 1. As can be seen from Table 1 and Fig. 3, only the 8 marked character strings (text boxes) in the table are considered; the structural relationships between these 8 text boxes are shown in Fig. 3, where a solid line represents a same-row relationship and a dotted line represents a same-column relationship.
TABLE 1

Name①     Age②     Team③
Peter④    25⑦      A⑧
Mike⑤     29       A
John⑥     30       B
S230, generating a target question-answering sentence according to the structural position relation between the text boxes and the text information;
then, according to the structural position relationship between each text box and other text boxes, finding a first horizontal row of text boxes, and combining the text information of each text box, finding the text information "name", "age" and "team" of the text boxes in the first horizontal row, and generating a target question-and-answer sentence according to the found "name", "age" and "team", for example, the target question-and-answer sentence may be "whose age is the greatest? "and" whose name strokes are the most? "and the like.
The text boxes in the first and second horizontal rows may also be found according to the structural position relationship between each text box and the other text boxes; combined with the text information of each text box, the texts "Name", "Peter", "Age", "25", "Team", "A" and so on are obtained, and a target question-answer sentence is generated from them. For example, the target question-answer sentence may be "Which team is the person named Peter on?", "How old is Peter?", "Which team is Peter on?" and the like.
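The question-generation step above can be sketched with simple templates: given the header-row texts and a data-row's texts recovered from the structure, questions are filled in per column. The template wording is an illustrative assumption; the patent only requires that question-answer sentences be generated from the recovered header and cell texts.

```python
def generate_questions(headers, row):
    """headers/row: lists of cell strings from the first two horizontal rows.

    The first column is treated as the entity name; one question is
    generated for each remaining column.
    """
    name = row[0]
    return [f"What is the {h.lower()} of {name}?" for h in headers[1:]]

headers = ["Name", "Age", "Team"]
row = ["Peter", "25", "A"]
qs = generate_questions(headers, row)
# ["What is the age of Peter?", "What is the team of Peter?"]
```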
S240, inputting the target question-answer sentence into a table question-answer model to obtain an answer, wherein the table question-answer model is trained by taking preset question-answer sentences as samples and their answers as labels.
After the target question-answer sentence is input into the table question-answer model, an answer can be obtained by searching within the table. In the embodiment of the invention, the table question-answer model is a neural network model; once it has been trained on the samples and labels, the trained table question-answer model is used to obtain the answer corresponding to the target question-answer sentence.
On the basis of the foregoing embodiment, preferably, the structure prediction model includes a text feature extraction module, a coordinate feature extraction module, a picture feature extraction module, a multilayer perceptron and a graph convolution network, where inputting the text information, position information and picture information corresponding to each text box into the structure prediction model to obtain the structural position relationship between the text boxes in the target table area includes:
inputting the text information corresponding to each text box into the text feature extraction module to obtain text content features;
inputting the position information corresponding to each text box into the coordinate feature extraction module to obtain text position features;
inputting picture information corresponding to each text box and position information corresponding to each text box into the picture feature extraction module to obtain local image features, and inputting the local image features into the multilayer perceptron to obtain multilayer image features;
and inputting the multilayer image features, the text content features and the text position features into the graph convolutional network to obtain the structural position relationship between each text box and the other text boxes.
Fig. 4 is a schematic structural diagram of the structure prediction model according to an embodiment of the present invention. As shown in Fig. 4, the structure prediction model includes a text feature extraction module 410, a coordinate feature extraction module 420, a picture feature extraction module 430, a multilayer perceptron 440 and a graph convolution network 450. The text information corresponding to each text box is input into the text feature extraction module, which first projects the characters into a preset vector space and then converts the character string into a text semantic vector through an LSTM; this semantic vector serves as the text content feature. The position information corresponding to each text box is input into the coordinate feature extraction module; generally, the absolute coordinates of the text box are obtained through OCR and then converted into relative coordinates to produce the text position feature. Finally, the picture information and position information corresponding to each text box are input into the picture feature extraction module, which obtains the local image feature of the text box through a convolutional neural network, using the four corner coordinates and the image of the table area.
The multilayer image features, text content features and text position features are then input into the graph convolution network to obtain the structural position relationship between each text box and the other text boxes.
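The fusion step can be sketched as follows: the per-box text, position and image features are concatenated and passed through one graph-convolution layer, H' = ReLU(A_hat H W), over the K-nearest-neighbour graph. The feature dimensions, the single-layer depth and the simple row-normalised adjacency are assumptions for illustration, not the patent's exact architecture.

```python
import numpy as np

def graph_conv(features, adjacency, weight):
    """One graph-convolution layer: ReLU(row_normalise(A + I) @ H @ W)."""
    a = adjacency + np.eye(adjacency.shape[0])         # add self-loops
    a_hat = a / a.sum(axis=1)[:, None]                 # row-normalise
    return np.maximum(a_hat @ features @ weight, 0.0)  # ReLU

n_boxes, d_text, d_pos, d_img, d_out = 3, 4, 2, 4, 5
rng = np.random.default_rng(0)
text_feat = rng.normal(size=(n_boxes, d_text))   # from the text module
pos_feat = rng.normal(size=(n_boxes, d_pos))     # from the coordinate module
img_feat = rng.normal(size=(n_boxes, d_img))     # from the CNN + perceptron

h = np.concatenate([text_feat, pos_feat, img_feat], axis=1)
adj = np.array([[0., 1., 0.],                    # K-nearest-neighbour graph
                [1., 0., 1.],
                [0., 1., 0.]])
w = rng.normal(size=(d_text + d_pos + d_img, d_out))
out = graph_conv(h, adj, w)                      # one embedding per text box
```

In the full model the per-box embeddings would feed a pairwise classifier over "same row"/"same column"; here only the feature-fusion stage is shown.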
On the basis of the foregoing embodiment, preferably, the text feature extraction module includes a projection unit and an LSTM unit, and the inputting the text information corresponding to each text box into the text feature extraction module to obtain the text content features includes:
inputting text information corresponding to each text box into the projection unit, projecting the text information into a preset vector space, and acquiring a text projection vector;
and inputting the text projection vector into the LSTM unit, converting the character string into a text semantic vector, and acquiring the text content characteristics.
Specifically, the text feature extraction module is composed of a projection unit and an LSTM unit, text information corresponding to each text box is input into the projection unit and projected into a preset vector space to obtain a text projection vector, then the text projection vector is input into the LSTM network, and a character string is converted into a text semantic vector to obtain text content features.
An LSTM (Long Short-Term Memory) network is a type of recurrent neural network specially designed to solve the long-term dependency problem of ordinary RNNs (recurrent neural networks); all RNNs take the form of a chain of repeating neural network modules.
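The projection-plus-recurrence idea can be roughly illustrated with a plain recurrent cell standing in for the full LSTM (the gating is elided for brevity): each character is projected into a vector space, the sequence is folded left to right, and the final hidden state is taken as the text content feature. The embedding table and weight shapes are toy assumptions.

```python
import numpy as np

def text_feature(text, embed, w_h, w_x):
    """Return the last hidden state of a simple RNN over the characters.

    embed: dict mapping each character to its projection vector.
    w_h, w_x: recurrent and input weight matrices.
    """
    h = np.zeros(w_h.shape[0])
    for ch in text:
        x = embed[ch]                      # projection unit
        h = np.tanh(w_h @ h + w_x @ x)     # recurrent unit (LSTM stand-in)
    return h

rng = np.random.default_rng(1)
vocab = {c: rng.normal(size=3) for c in "age"}        # toy char embeddings
w_h, w_x = rng.normal(size=(4, 4)), rng.normal(size=(4, 3))
feat = text_feature("age", vocab, w_h, w_x)           # 4-dim semantic vector
```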
On the basis of the above embodiment, preferably, the image feature extraction module is a convolutional neural network.
Specifically, the image feature extraction module in the embodiment of the invention is a convolutional neural network,
convolutional Neural Networks (CNN) are a class of feed forward Neural Networks (feed forward Neural Networks) that include convolution calculations and have a deep structure, and are one of the representative algorithms for deep learning (deep learning). The convolutional neural network has a representation learning (representation learning) capability, and can perform translation invariant classification on input information according to a hierarchical structure of the convolutional neural network.
On the basis of the foregoing embodiment, preferably, the acquiring, according to the preprocessed to-be-identified document, text information, position information, and picture information corresponding to each text box in the target table region from the to-be-identified document includes:
based on an OCR technology, detecting the preprocessed document to be recognized, and acquiring text information corresponding to each text box and position information corresponding to each text box;
and acquiring the picture information corresponding to each text box according to the row dividing lines and column dividing lines shared by the text boxes.
Specifically, the preprocessed document to be recognized is input into OCR software, and OCR technology is used to recognize each character in the document together with the character's horizontal and vertical position information. The target table region is then segmented into images according to the row dividing lines and column dividing lines shared by the text boxes.
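Cropping each text box's local image patch from the page by its OCR coordinates can be sketched as follows (the page array, the box coordinates, and the `crop_cells` helper are hypothetical):

```python
import numpy as np

def crop_cells(page, boxes):
    """Cut out each text box's local image patch from the page image using
    its (x1, y1, x2, y2) coordinates, as returned by OCR detection."""
    return [page[y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]

# Hypothetical 20x30 page and two boxes sharing a row dividing line at y = 10.
page = np.arange(20 * 30).reshape(20, 30)
boxes = [(0, 0, 15, 10), (0, 10, 15, 20)]
patches = crop_cells(page, boxes)
print([p.shape for p in patches])  # [(10, 15), (10, 15)]
```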
On the basis of the foregoing embodiment, preferably, the generating a target question-answering sentence according to the structural position relationship between the text boxes and the text information includes:
determining a reference text box according to the structural position relation between each text box and other text boxes;
acquiring reference text information according to the reference text box and the text information corresponding to each text box;
and generating the target question-answering sentence according to the reference text information.
Specifically, the text boxes in the first, second and third rows are found according to the structural position relationship between each text box and the other text boxes; then, combining the text information of each text box, the text information of those boxes is read out, for example "name", "Mike", "age", "29", "team" and "A", and target question-answer sentences are generated from the retrieved text information, for example "How old is Mike?" and "Who is in team A?".
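The slot-filling step above can be sketched as follows, assuming the header-to-value pairing has already been recovered from the structural position relation (the `templates` table and the field names are illustrative):

```python
# Hypothetical record recovered from the table: first-row boxes are headers,
# the boxes below them are the corresponding values.
row = {"name": "Mike", "age": "29", "team": "A"}

# One fixed sentence pattern per queryable field.
templates = {
    "age": "How old is {name}?",
    "team": "Who is in team {team}?",
}

def generate_questions(record):
    """Fill each template's slots from the record to produce target questions."""
    return [tpl.format(**record) for key, tpl in templates.items() if key in record]

questions = generate_questions(row)
print(questions)  # ['How old is Mike?', 'Who is in team A?']
```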
On the basis of the above embodiments, preferably, the preset question-answer sentence is obtained by expanding on the basis of a basic question-answer sentence.
When training the table question-answer model, the training data must be addressed in addition to the training method and techniques. For a given table, if only a single question is used, resources are wasted on the one hand, and on the other hand the model may overfit and generalize poorly. Additional questions and answers can therefore be generated from the table information.
The generation mainly proceeds in two directions:
Generation by template: the data in the table is queried with a fixed sentence pattern, for example "Who is in [team]?", and the question is completed by filling the slot. Expansion by model: the T5 model is used to produce synonymous rewrites of a question, so that similar questions are obtained and the dataset is expanded.
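A minimal sketch of the two directions together: template filling plus synonymous expansion. The patent uses a T5 model for the expansion; here a fixed paraphrase table stands in for the model so the example stays self-contained:

```python
# Hypothetical paraphrase table simulating T5-style synonymous expansion.
paraphrases = {
    "How old is {name}?": [
        "What is {name}'s age?",
        "Could you tell me the age of {name}?",
    ],
}

def expand(base_template, record):
    """Fill the base template's slot, then add its synonymous variants,
    enlarging the question set for one table record."""
    base = base_template.format(**record)
    variants = [p.format(**record) for p in paraphrases.get(base_template, [])]
    return [base] + variants

dataset = expand("How old is {name}?", {"name": "Mike"})
print(len(dataset))  # 3
```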
In the prior art, text generation targets only short sentences, long texts and similar text types; it does not target question-answer generation for table-structured data. The embodiment of the invention provides a new scheme for generating questions and answers for table-structured data, and explains in detail how to prepare the dataset, how to expand it and how to train. A model trained according to this method can therefore generate question-answer text for table-structured data.
Fig. 5 is a schematic structural diagram of an automatic question answering system for a table according to an embodiment of the present invention, as shown in fig. 5, the system includes an information extraction module 510, a structural prediction module 520, a statement generation module 530, and a prediction module 540, where:
the information extraction module 510 is configured to obtain, according to the preprocessed to-be-identified document, text information, position information, and picture information corresponding to each text box in the target table region from the to-be-identified document;
the structure prediction module 520 is configured to input text information, position information, and picture information corresponding to each text box into a structure prediction model, and obtain a structure position relationship between each text box in the target table region, where the structure prediction model is obtained by training using the text information, the position information, and the picture information corresponding to each text box in the sample table region as a sample, and using the structure position relationship between each text box in the sample table region as a label;
the sentence generation module 530 is configured to generate a target question-answering sentence according to the structural position relationship between the text boxes and the text information;
the prediction module 540 is configured to input the target question-answer sentence into a table question-answer model, and obtain an answer, where the table question-answer model is obtained by training a preset question-answer sentence as a sample and using an answer label.
The present embodiment is a system embodiment corresponding to the above method embodiment, and the specific process of the embodiment is the same as the above method embodiment, and please refer to the above method embodiment for details, which is not described herein again.
On the basis of the foregoing embodiment, preferably, the structure prediction model includes a text feature extraction module, a coordinate feature extraction module, a picture feature extraction module, a multilayer perceptron, and a graph convolution network, where:
the text feature extraction module is used for acquiring text content features according to the text information corresponding to each text box;
the coordinate feature extraction module is used for acquiring text position features according to the position information corresponding to each text box;
the picture feature extraction module is used for acquiring local image features according to the picture information corresponding to each text box and the position information corresponding to each text box;
the multilayer perceptron is used for acquiring multilayer image features according to the local image features;
and the graph convolution network is used for acquiring the structural position relationship between each text box and other text boxes according to the multilayer image features, the text content features and the text position features.
On the basis of the foregoing embodiment, preferably, the text feature extraction module includes a projection unit and an LSTM unit, where:
the projection unit is used for projecting the text information corresponding to each text box into a preset vector space to obtain a text projection vector;
and the LSTM unit is used for converting the character strings into text semantic vectors according to the text projection vectors and acquiring the text content characteristics.
On the basis of the above embodiment, preferably, the image feature extraction module is a convolutional neural network.
On the basis of the above embodiment, preferably, the information extraction module includes a detection unit and a segmentation unit, wherein:
the detection unit is used for detecting the preprocessed document to be recognized based on an OCR technology, and acquiring text information corresponding to each text box and position information corresponding to each text box;
the segmentation unit is used for acquiring the picture information corresponding to each text box according to the row dividing lines and column dividing lines shared by the text boxes.
On the basis of the foregoing embodiment, preferably, the sentence generation module includes a reference structure unit, a reference text unit, and a generation unit, wherein:
the reference structure unit is used for determining a reference text box according to the structure position relation between each text box and other text boxes;
the reference text unit is used for acquiring reference text information according to the reference text box and the text information corresponding to each text box;
the generating unit is used for generating the target question-answering sentence according to the reference text information.
On the basis of the above embodiments, preferably, the preset question-answer sentence is obtained by expanding on the basis of a basic question-answer sentence.
The various modules in the above automatic question-answering system for tables can be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in hardware within, or independent of, the processor of the computer device, or stored in software form in the memory of the computer device, so that the processor can call them and execute the operations corresponding to the modules.
Fig. 6 is a schematic structural diagram of a computer device provided in an embodiment of the present invention, where the computer device may be a server, and an internal structural diagram of the computer device may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a computer storage medium and an internal memory. The computer storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the computer storage media. The database of the computer device is used for storing data generated or acquired during execution of the automatic question answering method for tables, such as documents to be recognized, text information, position information, and picture information. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an automatic question-answering method for a form.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the steps of the automatic question-answering method for tables in the above embodiments are implemented. Alternatively, the processor, when executing the computer program, implements the functions of the modules/units in this embodiment of the automatic question-answering system for tables.
In an embodiment, a computer storage medium is provided, and the computer storage medium stores a computer program, and the computer program is executed by a processor to implement the steps of the automatic question answering method for tables in the above embodiments. Alternatively, the computer program realizes the functions of the modules/units in the above-described embodiment of the automatic question-answering system for tables when executed by a processor.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. An automatic question answering method for tables, which is characterized by comprising the following steps:
according to the preprocessed document to be recognized, acquiring text information, position information and picture information corresponding to each text box in a target table area from the document to be recognized;
inputting text information, position information and picture information corresponding to each text box into a structure prediction model, and obtaining a structure position relation between each text box in the target table area, wherein the structure prediction model is obtained by training by taking the text information, the position information and the picture information corresponding to each text box in a sample table area as a sample and taking the structure position relation between each text box in the sample table area as a label;
generating a target question-answering sentence according to the structural position relation between the text boxes and the text information;
and inputting the target question-answer sentence into a table question-answer model to obtain an answer, wherein the table question-answer model is obtained by taking a preset question-answer sentence as a sample and training with an answer label.
2. The method according to claim 1, wherein the structure prediction model includes a text feature extraction module, a coordinate feature extraction module, a picture feature extraction module, a multi-layer perceptron and a graph convolution network, and the inputting text information, position information and picture information corresponding to each text box into the structure prediction model to obtain the structure position relationship between each text box in the target form area includes:
inputting the text information corresponding to each text box into the text feature extraction module to obtain text content features;
inputting the position information corresponding to each text box into the coordinate feature extraction module to obtain text position features;
inputting picture information corresponding to each text box and position information corresponding to each text box into the picture feature extraction module to obtain local image features, and inputting the local image features into the multilayer perceptron to obtain multilayer image features;
and inputting the multilayer image features, the text content features and the text position features into the graph convolution network to obtain the structural position relationship between each text box and other text boxes.
3. The method according to claim 2, wherein the text feature extraction module comprises a projection unit and an LSTM unit, and the inputting the text information corresponding to each text box into the text feature extraction module to obtain the text content features comprises:
inputting text information corresponding to each text box into the projection unit, projecting the text information into a preset vector space, and acquiring a text projection vector;
and inputting the text projection vector into the LSTM unit, converting the character string into a text semantic vector, and acquiring the text content characteristics.
4. The method of claim 2, wherein the image feature extraction module is a convolutional neural network.
5. The method according to claim 1, wherein the obtaining text information, position information, and picture information corresponding to each text box in a target table region from the document to be recognized according to the preprocessed document to be recognized comprises:
based on an OCR technology, detecting the preprocessed document to be recognized, and acquiring text information corresponding to each text box and position information corresponding to each text box;
and acquiring the picture information corresponding to each text box according to the row dividing lines and column dividing lines shared by the text boxes.
6. The method according to any one of claims 1 to 5, wherein the generating a target question-answering sentence according to the structural position relationship between the text boxes and the text information includes:
determining a reference text box according to the structural position relation between each text box and other text boxes;
acquiring reference text information according to the reference text box and the text information corresponding to each text box;
and generating the target question-answering sentence according to the reference text information.
7. The automatic question-answering method for forms according to any one of claims 1 to 5, wherein the preset question-answering sentences are expanded on the basis of basic question-answering sentences.
8. An automatic question-answering system for a form, comprising:
the information extraction module is used for acquiring text information, position information and picture information corresponding to each text box in a target table area from the document to be identified according to the preprocessed document to be identified;
the structure prediction module is used for inputting text information, position information and picture information corresponding to each text box into a structure prediction model and acquiring a structure position relation between each text box in the target table area, wherein the structure prediction model is obtained by training by taking the text information, the position information and the picture information corresponding to each text box in a sample table area as a sample and taking the structure position relation between each text box in the sample table area as a label;
the sentence generating module is used for generating a target question and answer sentence according to the structural position relation between the text boxes and the text information;
and the prediction module is used for inputting the target question-answer sentence into a table question-answer model to obtain an answer, wherein the table question-answer model is obtained by training with a preset question-answer sentence as a sample and an answer label.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method for automatic question answering for a form according to any one of claims 1 to 7 when executing the computer program.
10. A computer storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for automatic question answering for tables according to any one of claims 1 to 7.
CN202210374208.4A 2022-04-11 2022-04-11 Automatic question answering method, system, equipment and storage medium for table Pending CN114743204A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210374208.4A CN114743204A (en) 2022-04-11 2022-04-11 Automatic question answering method, system, equipment and storage medium for table

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210374208.4A CN114743204A (en) 2022-04-11 2022-04-11 Automatic question answering method, system, equipment and storage medium for table

Publications (1)

Publication Number Publication Date
CN114743204A true CN114743204A (en) 2022-07-12

Family

ID=82281468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210374208.4A Pending CN114743204A (en) 2022-04-11 2022-04-11 Automatic question answering method, system, equipment and storage medium for table

Country Status (1)

Country Link
CN (1) CN114743204A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782839A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Image question-answering method, image question-answering device, computer equipment and medium
CN112800177A (en) * 2020-12-31 2021-05-14 北京智源人工智能研究院 FAQ knowledge base automatic generation method and device based on complex data types
US20210248420A1 (en) * 2020-02-07 2021-08-12 International Business Machines Corporation Automated generation of structured training data from unstructured documents
CN113918686A (en) * 2021-08-30 2022-01-11 杭州摸象大数据科技有限公司 Intelligent question-answering model construction method and device, computer equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116976294A (en) * 2023-09-22 2023-10-31 青岛诺亚信息技术有限公司 Method and system for realizing automatic filling of complex electronic forms
CN116976294B (en) * 2023-09-22 2024-02-09 青岛诺亚信息技术有限公司 Method and system for realizing automatic filling of complex electronic forms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination