CN113705430A - Table detection method, device and equipment based on detection model and storage medium - Google Patents


Info

Publication number
CN113705430A
CN113705430A (Application CN202110989638.2A)
Authority
CN
China
Prior art keywords
feature map
document image
document
determining
network
Prior art date
Legal status
Pending
Application number
CN202110989638.2A
Other languages
Chinese (zh)
Inventor
雷晨雨
韩茂琨
刘玉宇
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202110989638.2A
Publication of CN113705430A
Status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to artificial intelligence, and in particular to target detection, and provides a table detection method, apparatus, computer device and storage medium based on a detection model. The method comprises the following steps: acquiring a document image; extracting a document feature map from the document image based on a feature map extraction sub-network of a table detection model; determining prediction information of the document feature map based on a prediction sub-network of the table detection model, wherein the prediction information at least comprises a first position of a table key point on the document feature map and a position offset of the projection position of the first position on the document image; determining a second position of the table key point on the document image according to the prediction information based on a preset position determination rule; and determining a table area in the document image according to the second positions of a plurality of table key points. The application also relates to blockchain technology, and the obtained table area can be stored in a blockchain. It further relates to the medical field, where the document image can be an image of a document such as a medical record or an examination sheet.

Description

Table detection method, device and equipment based on detection model and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a table detection method and apparatus based on a detection model, a computer device, and a storage medium.
Background
Current table detection methods are mainly realized in two ways: based on computer vision and based on semantic segmentation. Table detection based on computer vision mainly adopts anchor-based target detection algorithms such as YOLO and Faster R-CNN. Whether table detection is realized by an anchor-based target detection algorithm or by semantic segmentation, no ideal detection effect is obtained for special tables such as long and narrow tables. For example, when table detection is implemented by an anchor-based target detection algorithm, a long and narrow table is difficult to match with any anchor and therefore cannot be detected, and no accurate detection result can be obtained for a table that is inclined or distorted to some extent; when table detection is realized based on semantic segmentation, the foreground of a long and narrow table is often difficult to separate, and detection of a table inclined or distorted to some extent takes a long time.
Disclosure of Invention
The application provides a table detection method and apparatus based on a detection model, a computer device and a storage medium, which realize table detection based on key point detection of the table.
In a first aspect, the present application provides a table detection method based on a detection model, where the method includes:
acquiring a document image, wherein the size of the document image is a preset first size;
extracting a document feature map from the document image based on a feature map extraction sub-network of a table detection model, wherein the size of the document feature map is a preset second size, and the first size is larger than the second size;
determining prediction information of the document feature map based on a prediction subnetwork of the table detection model, wherein the prediction information at least comprises a first position of a table key point on the document feature map and a position offset of a projection position of the first position on the document image;
determining a second position of the table key point on the document image according to the prediction information based on a preset position determination rule;
and determining a table area in the document image according to the second positions of the plurality of table key points.
In a second aspect, the present application provides a table detection apparatus based on a detection model, comprising:
the image acquisition module is used for acquiring a document image, wherein the size of the document image is a preset first size;
the feature map extraction module is used for extracting a document feature map from the document image based on a feature map extraction sub-network of a table detection model, wherein the size of the document feature map is a preset second size, and the first size is larger than the second size;
the prediction information acquisition module is used for determining prediction information of the document feature map based on a prediction sub-network of the table detection model, wherein the prediction information at least comprises a first position of a table key point on the document feature map and a position offset of a projection position of the first position on the document image;
the key point determining module is used for determining a second position of the table key point on the document image according to the prediction information based on a preset position determination rule;
and the table determining module is used for determining a table area in the document image according to the second positions of the plurality of table key points.
In a third aspect, the present application provides a computer device comprising a memory and a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program and implement the above table detection method based on the detection model when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the method for detecting a table based on a detection model is implemented.
The application discloses a table detection method and apparatus, a computer device and a storage medium based on a detection model. A document image is obtained, the size of the document image being a preset first size; a document feature map is extracted from the document image based on a feature map extraction sub-network of a table detection model, the size of the document feature map being a preset second size, the first size being larger than the second size; prediction information of the document feature map is determined based on a prediction sub-network of the table detection model, the prediction information at least comprising a first position of a table key point on the document feature map and a position offset of the projection position of the first position on the document image; a second position of the table key point on the document image is determined according to the prediction information based on a preset position determination rule; and a table area in the document image is determined according to the second positions of a plurality of table key points. Table detection is thus realized based on table key point detection, and special tables such as long and narrow tables and tables with a certain inclination or distortion can be detected effectively. When the key point detection is realized, the position of each table key point on the document image is determined by combining its first position on the document feature map with the position offset of the projection position of that first position on the document image, which improves the accuracy of key point detection and therefore the accuracy of table detection.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description show some embodiments of the present application, and that those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic flowchart of a table detection method based on a detection model according to an embodiment of the present application;
FIG. 2 is a block diagram illustrating a table detection model according to an embodiment of the present disclosure;
FIG. 3 is a block diagram schematically illustrating a structure of a table detection apparatus based on a detection model according to an embodiment of the present application;
fig. 4 is a block diagram illustrating a structure of a computer device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation. In addition, although the division of the functional blocks is made in the device diagram, in some cases, it may be divided in blocks different from those in the device diagram.
The embodiments of the application provide a table detection method and apparatus based on a detection model, a computer device and a computer-readable storage medium. The method realizes table detection based on key point detection of the table and improves the accuracy of table detection. In table detection, special tables are often encountered: for example, a long and narrow table such as one with 2 rows and 10 columns, or a table with a certain inclination or distortion in the document image, cannot be detected accurately. The table detection method based on the detection model of the embodiments of the application realizes table detection based on table key point detection, so special tables such as long and narrow tables and tables with a certain inclination or distortion can be detected effectively. When the table key point detection is realized, the second position of a table key point on the document image is determined by combining the first position of the key point on the document feature map with the position offset of the projection position of that first position on the document image, which improves the accuracy of table key point detection and therefore the accuracy of table detection.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic flow chart of a table detection method based on a detection model according to an embodiment of the present application.
As shown in fig. 1, the detection model-based table detection method may include the following steps S110 to S150.
Step S110, a document image is obtained, and the size of the document image is a preset first size.
Illustratively, the document image of the document to be detected is obtained through scanning or format conversion. For example, the document to be detected is converted into a color picture of the first size by format conversion to obtain the document image. In one embodiment, the first size is 512 × 512, each pixel of the document image uses the RGB color system, and the dimension of each pixel of the document image is therefore 3.
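As a concrete illustration of step S110, the following is a minimal sketch that loads a document page and resizes it to the preset first size. OpenCV and the function written here are assumptions for illustration; the embodiment only specifies a 512 × 512 RGB image.

    import cv2
    import numpy as np

    FIRST_SIZE = 512  # preset first size (height = width = 512)

    def acquire_document_image(path: str) -> np.ndarray:
        # Load a scanned or format-converted document page as RGB.
        bgr = cv2.imread(path)                      # OpenCV reads BGR
        rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)  # dimension of each pixel is 3
        return cv2.resize(rgb, (FIRST_SIZE, FIRST_SIZE))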
And step S120, extracting a document feature map from the document image based on a feature map extraction sub-network of the table detection model, wherein the size of the document feature map is a preset second size, and the first size is larger than the second size.
A feature map comprises a plurality of features corresponding to different receptive fields; for the document feature map, for example, a receptive field can be understood as all the pixels of a small area on the document image.
In some embodiments, the structure of the table detection model is as shown in fig. 2, the table detection model includes a feature map extraction sub-network for obtaining a feature map and a prediction sub-network for obtaining prediction information, the feature map extraction sub-network includes a backbone sub-network and an upsampling sub-network, and the prediction sub-network includes a plurality of branch sub-networks.
Illustratively, step S120 specifically includes steps S121-S122.
S121, inputting the document image into a backbone sub-network of the feature map extraction sub-network to obtain a primary feature map;
Illustratively, the backbone sub-network is a Convolutional Neural Network (CNN), whose convolution kernels extract features of the document image to generate a feature map. For example, in one embodiment, the backbone sub-network uses the convolutional neural network ShuffleNet V2, and the document image is input into the backbone sub-network to obtain a primary feature map feature_map1 of size 16 × 16 (height × width), where the dimension of each pixel in feature_map1 is 64.
ShuffleNet V2 is a lightweight convolutional neural network that strikes a good balance between speed and accuracy; realizing the detection model-based table detection method on ShuffleNet V2 therefore makes it convenient to deploy the method on mobile terminal devices.
S122, inputting the primary feature map into an up-sampling sub-network of the feature map extraction sub-network to obtain the document feature map.
Upsampling is a technique for giving an image a higher resolution; by upsampling the primary feature map, the document feature map obtains a higher resolution and a larger size than the primary feature map. The upsampling sub-network may implement upsampling by bilinear interpolation, transposed convolution, and the like.
Illustratively, the height and width of the first size are N times the height and width of the second size respectively, where N is a positive integer greater than or equal to 2. For example, in an embodiment, feature_map1 is input into the upsampling sub-network of the feature map extraction sub-network for upsampling, and a document feature map feature_map2 with a second size of 128 × 128 is obtained; the height and width of the first size are then 4 times the height and width of the second size respectively.
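The following is a minimal sketch of the feature map extraction sub-network under the sizes given above: a ShuffleNet V2 backbone reduces the 512 × 512 document image to a 16 × 16 primary feature map with 64-dimensional pixels, and an up-sampling sub-network enlarges it to the 128 × 128 document feature map (N = 4). PyTorch, the torchvision backbone variant and the 1 × 1 projection layer are assumptions; the embodiment fixes neither a framework nor an exact layer layout.

    import torch
    import torch.nn as nn
    from torchvision.models import shufflenet_v2_x1_0

    class FeatureMapExtractor(nn.Module):
        def __init__(self, out_channels: int = 64):
            super().__init__()
            net = shufflenet_v2_x1_0()
            # Backbone sub-network: convolutional stages only (no pool/fc head);
            # a 512x512 input comes out as a 16x16 map with 1024 channels.
            self.backbone = nn.Sequential(
                net.conv1, net.maxpool, net.stage2, net.stage3, net.stage4, net.conv5,
            )
            self.project = nn.Conv2d(1024, out_channels, kernel_size=1)
            # Up-sampling sub-network: bilinear interpolation, 16x16 -> 128x128.
            self.upsample = nn.Upsample(scale_factor=8, mode="bilinear",
                                        align_corners=False)

        def forward(self, document_image: torch.Tensor) -> torch.Tensor:
            feature_map1 = self.project(self.backbone(document_image))  # 64 x 16 x 16
            return self.upsample(feature_map1)                          # feature_map2, 64 x 128 x 128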
For example, when dense tables in the document image need to be detected, the network structure of the up-sampling sub-network may be adjusted so that the second size of the document feature map becomes larger, which yields a better detection effect for dense tables.
Step S130, determining the prediction information of the document feature map based on the prediction sub-network of the table detection model, wherein the prediction information at least comprises the first position of the table key point on the document feature map and the position offset of the projection position of the first position on the document image.
Illustratively, the table center point and the table vertices are all table key points.
Illustratively, the prediction sub-network includes a plurality of branch sub-networks for obtaining different parts of the prediction information, and step S130 specifically includes steps S131 to S134:
S131, inputting the document feature map into a first branch sub-network of the prediction sub-network to obtain a first position of the table center point on the document feature map;
For example, feature_map2 is input into the first branch sub-network of the prediction sub-network to obtain, in its output, the first position c1 of the table center point on the document feature map. The first position, second position and projection position of every key point can be expressed in the form of coordinates; for example, c1 is the coordinate pair (x1, y1).
In some embodiments, the prediction information further includes a table confidence; for example, the first branch sub-network is further configured to output the table confidence, and the detection model-based table detection method further includes: if the table confidence is smaller than a preset threshold, outputting prompt information that the document does not contain a table.
S132, inputting the document feature map into a second branch sub-network of the prediction sub-network to obtain the first positions of the table vertices on the document feature map;
For example, feature_map2 is input into the second branch sub-network of the prediction sub-network to obtain the first position q1 of the top left table vertex, the first position q2 of the top right table vertex, the first position q3 of the bottom left table vertex, and the first position q4 of the bottom right table vertex; the top left table vertex is the table vertex at the top left of the table, and similarly for the other three.
S133, inputting the document feature map into a third branch sub-network of the prediction sub-network to obtain the position offset of the projection position corresponding to the table center point on the document image;
For example, feature_map2 is input into the third branch sub-network of the prediction sub-network to obtain, in its output, the offset delta_c1 of the projection position corresponding to the table center point on the document image.
S134, inputting the document feature map into a fourth branch sub-network of the prediction sub-network to obtain the position offsets of the projection positions corresponding to the table vertices on the document image.
For example, feature_map2 is input into the fourth branch sub-network of the prediction sub-network to obtain the offset delta_q1 of the projection position corresponding to the top left table vertex, the offset delta_q2 corresponding to the top right table vertex, the offset delta_q3 corresponding to the bottom left table vertex, and the offset delta_q4 corresponding to the bottom right table vertex.
In some embodiments, the prediction information further includes a displacement between a table vertex and a table center point on the document image, and step S130 further includes step S135.
S135, inputting the document feature map into a fifth branch sub-network of the prediction sub-network to obtain the displacement between the table vertex and the table center point on the document image.
For example, feature_map2 is input into the fifth branch sub-network of the prediction sub-network to obtain, in its output, the displacement delta_p1 from the table center point to the top left table vertex, the displacement delta_p2 from the table center point to the top right table vertex, the displacement delta_p3 from the table center point to the bottom left table vertex, and the displacement delta_p4 from the table center point to the bottom right table vertex on the document image.
In some embodiments, the prediction information further includes a width of the table and a height of the table on the document image, and step S130 further includes step S136.
S136, inputting the document feature map into a sixth branch sub-network of the prediction sub-network to obtain the width of the table and the height of the table on the document image.
For example, feature_map2 is input into the sixth branch sub-network of the prediction sub-network to obtain, in its output, the height h and width w of the table on the document image.
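A sketch of the prediction sub-network with the six branch sub-networks of steps S131 to S136 follows. The branch structure (a 3 × 3 convolution followed by a 1 × 1 convolution) and the channel counts are assumptions: heatmap-style channels for the center point and the four vertices, and paired (x, y) channels for each offset, displacement and the width/height.

    import torch
    import torch.nn as nn

    def branch(in_ch: int, out_ch: int) -> nn.Sequential:
        return nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
        )

    class PredictionSubNetwork(nn.Module):
        def __init__(self, in_ch: int = 64):
            super().__init__()
            self.center = branch(in_ch, 1)          # S131: table center point (and confidence)
            self.vertices = branch(in_ch, 4)        # S132: four table vertices
            self.center_offset = branch(in_ch, 2)   # S133: delta_c1
            self.vertex_offsets = branch(in_ch, 8)  # S134: delta_q1 .. delta_q4
            self.displacements = branch(in_ch, 8)   # S135: delta_p1 .. delta_p4
            self.size = branch(in_ch, 2)            # S136: width w and height h

        def forward(self, feature_map2: torch.Tensor) -> dict:
            # Every branch reads the shared 128x128 document feature map.
            return {name: head(feature_map2) for name, head in [
                ("center", self.center), ("vertices", self.vertices),
                ("center_offset", self.center_offset),
                ("vertex_offsets", self.vertex_offsets),
                ("displacements", self.displacements), ("size", self.size)]}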
Step S140, determining a second position of the table key point on the document image according to the prediction information based on a preset position determination rule.
Illustratively, step S140 specifically includes steps S141 to S142:
S141, converting the first position of the table key point according to the ratio of the first size to the second size to obtain the projection position of the first position on the document image;
Illustratively, step S141 specifically includes: converting the first positions of all the table key points according to the ratio of the first size to the second size to obtain the projection positions of all the first positions on the document image.
For example, in an embodiment, since the height and width of the first size are 4 times the height and width of the second size respectively, c1 is multiplied by 4 in both the height and width directions to obtain the projection position c1_t corresponding to the table center point; likewise, q1, q2, q3 and q4 are each multiplied by 4 in both directions to obtain the projection positions q1_t, q2_t, q3_t and q4_t corresponding to the top left, top right, bottom left and bottom right table vertices respectively.
S142, determining a second position of the table key point on the document image according to the projection position corresponding to the table key point and the position offset of the projection position.
In some embodiments, step S142 specifically includes: determining the second position of each table key point on the document image according to the projection position corresponding to that key point and the position offset of the projection position. For example, the second position C1 of the table center point is c1_t + delta_c1, the second position of the top left table vertex is q1_t + delta_q1, the second position of the top right table vertex is q2_t + delta_q2, the second position of the bottom left table vertex is q3_t + delta_q3, and the second position of the bottom right table vertex is q4_t + delta_q4.
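In code, the rule of steps S141 and S142 reduces to a scale-then-correct computation; the sketch below assumes N = 4 and (x, y) coordinate pairs.

    N = 4  # ratio of the first size to the second size

    def second_position(first_pos, offset):
        # S141: project the feature-map position onto the document image.
        proj_x, proj_y = first_pos[0] * N, first_pos[1] * N
        # S142: correct the projection with the predicted position offset.
        return proj_x + offset[0], proj_y + offset[1]

    # e.g. C1 = second_position(c1, delta_c1)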
In some other embodiments, the prediction information further includes a displacement between a table vertex and a table center point on the document image, and step S142 specifically includes: determining a second position of the table center point on the document image according to the projection position corresponding to the table center point on the document feature map and the position offset of the projection position; step S140 also includes steps S143-S145.
S143, determining a first candidate position of a table vertex on the document image according to a projection position corresponding to the table vertex on the document feature map and a position offset of the projection position;
Illustratively, the projection position corresponding to the table vertex on the document feature map and the position offset of that projection position are added to obtain the first candidate position of the table vertex on the document image.
For example, the first candidate position Q1 of the top left table vertex is q1_t + delta_q1, the first candidate position Q2 of the top right table vertex is q2_t + delta_q2, the first candidate position Q3 of the bottom left table vertex is q3_t + delta_q3, and the first candidate position Q4 of the bottom right table vertex is q4_t + delta_q4.
S144, obtaining a second candidate position of the table vertex on the document image according to the projection position of the table center point and the displacement between the table vertex and the table center point on the document image;
Illustratively, the displacement between the table vertex and the table center point on the document image is the displacement from the table center point to the table vertex; adding this displacement to the projection position of the table center point gives the second candidate position of the table vertex on the document image.
For example, the second candidate position P1 of the top left table vertex is c1_t + delta_p1, the second candidate position P2 of the top right table vertex is c1_t + delta_p2, the second candidate position P3 of the bottom left table vertex is c1_t + delta_p3, and the second candidate position P4 of the bottom right table vertex is c1_t + delta_p4.
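The two candidate computations of steps S143 and S144 can be sketched as follows; the argument names are illustrative.

    def vertex_candidates(q_proj, delta_q, c_proj, delta_p):
        # S143: Qi = qi_t + delta_qi (vertex projection plus its position offset).
        first = [(px + dx, py + dy) for (px, py), (dx, dy) in zip(q_proj, delta_q)]
        # S144: Pi = c1_t + delta_pi (center projection plus displacement).
        second = [(c_proj[0] + dx, c_proj[1] + dy) for dx, dy in delta_p]
        return first, second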
S145, determining a second position of the table vertex according to the first candidate position and the second candidate position.
In some embodiments, the second position of the table vertex is determined according to an average of the first candidate position and the second candidate position.
In some other embodiments, the prediction information further includes the width of the table and the height of the table on the document image, and step S145 specifically includes steps S145a-S145c.
S145a, determining a reference frame of the table on the document image according to the second position of the table center point, the width of the table on the document image and the height of the table;
for example, the second position C1 of the table center point, the width w of the table and the height h of the table are sequentially used as the center point of a rectangle on a document image, the width of the rectangle and the height of the rectangle, so as to determine a rectangular border on the document, and determine the rectangular border as the reference border box of the table.
S145b, if the first candidate positions of all the table vertices are within the reference frame, determining the first candidate position of the table vertex as the second position of the table vertex;
for example, if Q1, Q2, Q3, and Q4 are all within the box, then the second position of the top left grid vertex, the second position of the top right grid vertex, the second position of the bottom left grid vertex, and the second position of the bottom right grid vertex are Q1, Q2, Q3, and Q4, respectively.
S145c, if there is a first candidate position of the table vertex outside the reference frame, determining that a second candidate position of the table vertex is the second position of the table vertex.
For example, if any of Q1, Q2, Q3 and Q4 is outside the reference frame box, the second position of the top left table vertex, the second position of the top right table vertex, the second position of the bottom left table vertex, and the second position of the bottom right table vertex are P1, P2, P3 and P4 respectively.
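Steps S145a to S145c amount to the following selection rule, sketched here with the reference frame box expressed by its corner coordinates; this is one straightforward reading of the rule, not the only possible implementation.

    def select_vertex_positions(center, w, h, Q, P):
        # S145a: reference frame box centered at C1 with width w and height h.
        cx, cy = center
        x0, y0, x1, y1 = cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

        def inside(pt):
            return x0 <= pt[0] <= x1 and y0 <= pt[1] <= y1

        # S145b: keep Q1..Q4 only if every first candidate lies in the frame.
        if all(inside(q) for q in Q):
            return Q
        # S145c: otherwise fall back to the displacement-based candidates.
        return P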
S150, determining a table area in the document image according to the second positions of the plurality of table key points.
Illustratively, the second positions of four table vertices of a table are taken as the positions of four vertices of a quadrangle on the document image to determine a quadrangle on the document image, and the area of the quadrangle is determined as the table area.
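For step S150, the four vertex second positions directly define the quadrangle; the rasterisation below with OpenCV is an illustrative assumption, and any polygon-fill routine would do.

    import cv2
    import numpy as np

    def table_region_mask(vertices, image_size=512):
        # vertices: second positions in order [top left, top right, bottom right, bottom left].
        mask = np.zeros((image_size, image_size), dtype=np.uint8)
        cv2.fillPoly(mask, [np.asarray(vertices, dtype=np.int32)], 255)
        return mask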
In some embodiments, the table regions in the resulting document image may be stored in blockchain nodes. A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain, essentially a decentralized database, is a series of data blocks associated by cryptographic methods, each data block containing information on a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In some embodiments, the detection model-based table detection method further comprises training the table detection model through steps S100-S108:
s100, obtaining a plurality of training samples, wherein at least one part of the training samples are document images containing tables, and the document images containing the tables are provided with marking information of key points of the tables;
Illustratively, documents for training are collected and, referring to step S110, document images of these documents are obtained by scanning or format conversion to serve as the training samples.
S101, obtaining a standard second position of the table key point according to the marking information of the table key point;
Illustratively, the marking information includes the marking information of the table center point and the marking information of all the table vertices, and the standard second position of the table center point and the standard second positions of the table vertices are obtained through step S101.
S102, converting the standard second position of the table key point according to the ratio of the second size to the first size to obtain the standard first position of the table key point;
Illustratively, the standard second position of each table key point is multiplied by the ratio of the second size to the first size and rounded, to obtain the standard first position of that key point.
S103, converting the standard first position of the table key point according to the ratio of the first size to the second size to correspondingly obtain the standard projection position;
Illustratively, the standard first position of each table key point is multiplied by the ratio of the first size to the second size, to obtain the standard projection position corresponding to that key point.
S104, determining the position offset of the standard projection position according to the standard second position of the table key point and the standard projection position corresponding to the table key point;
For example, the standard projection position corresponding to the table key point is subtracted from the standard second position of the key point to obtain the position offset of the standard projection position.
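Steps S101 to S104 can be summarised by the following label-construction sketch, assuming N = 4 and the rounding of step S102 as described above; the residual left by rounding is exactly the position offset the model must learn.

    N = 4  # ratio of the first size to the second size

    def keypoint_label(standard_second_pos):
        sx, sy = standard_second_pos
        # S102: scale by the second/first size ratio (1/N) and round.
        standard_first = (round(sx / N), round(sy / N))
        # S103: scale back by N to get the standard projection position.
        standard_proj = (standard_first[0] * N, standard_first[1] * N)
        # S104: the residual is the position offset of the standard projection.
        offset = (sx - standard_proj[0], sy - standard_proj[1])
        return standard_first, offset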
S105, determining labels of all the training samples, wherein the label of a document image containing a table at least comprises the standard first position and the position offset of the standard projection position;
Illustratively, the data in the label corresponds to the data in the prediction information.
S106, extracting a document feature map of a training sample based on the feature map extraction sub-network of the table detection model;
Illustratively, the training sample is input into the feature map extraction sub-network of the table detection model to obtain the document feature map of the training sample in the output of the feature map extraction sub-network. For details of step S106, refer to step S120.
S107, determining the prediction information of the training sample according to the document feature map of the training sample based on the prediction sub-network of the table detection model;
Illustratively, the document feature map of the training sample is input into the prediction sub-network of the table detection model to obtain the prediction information of the training sample in the output of the prediction sub-network. For details of step S107, refer to step S130.
S108, adjusting the network parameters of the table detection model according to the error between the label of the training sample and the prediction information of the training sample.
Illustratively, the error is back-propagated through the table detection model, and the network parameters of the table detection model are adjusted to reduce the error.
Illustratively, steps S106-S108 are iteratively performed to train the table detection model, and if the table detection model converges, the iteration is stopped to obtain the table detection model for performing steps S110-S150.
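A high-level sketch of the training iteration of steps S106 to S108 follows. The Adam optimiser and the L1 loss are assumptions; the embodiment only requires back-propagating the error between the labels and the prediction information.

    import torch
    import torch.nn.functional as F

    def train_step(extractor, predictor, optimizer, images, labels):
        feature_maps = extractor(images)          # S106: document feature maps
        predictions = predictor(feature_maps)     # S107: prediction information
        # S108: error between labels and predictions, back-propagated.
        loss = sum(F.l1_loss(predictions[k], labels[k]) for k in predictions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()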
According to the detection model-based table detection method, a document image is obtained, the size of the document image being a preset first size; a document feature map is extracted from the document image based on a feature map extraction sub-network of a table detection model, the size of the document feature map being a preset second size, the first size being larger than the second size; prediction information of the document feature map is determined based on a prediction sub-network of the table detection model, the prediction information at least comprising a first position of a table key point on the document feature map and a position offset of the projection position of the first position on the document image; a second position of the table key point on the document image is determined according to the prediction information based on a preset position determination rule; and a table area in the document image is determined according to the second positions of a plurality of table key points. Table detection based on table key point detection is thus realized, and special tables such as long and narrow tables and tables with a certain inclination or distortion can be detected effectively. When the key point detection is realized, the position of each table key point on the document image is determined by combining its first position on the document feature map with the position offset of the projection position of that first position on the document image, which improves the accuracy of key point detection and therefore the accuracy of table detection.
The embodiments of the application can acquire and process related data based on artificial intelligence technology; for example, the prediction information of a document image is acquired through the table detection model. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge, and use the knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Target detection, a subdivision of computer vision, finds targets in an image while determining their category and location. Target detection is widely applied in fields such as robot navigation, intelligent video monitoring, industrial inspection and aerospace; it reduces labor costs through computer vision and has important practical significance.
In the field of target detection, an anchor can be understood as a sliding window with an anchor point as its center; an anchor point is a reference point preset on the image, and anchor-based target detection determines the accurate position and size of a target by finding the anchor that best matches it. At present, computer-vision-based table detection mainly adopts anchor-based target detection algorithms, and when special tables such as long and narrow tables or tables with a certain inclination or distortion appear in a document image, no anchor matches the table well, so the table cannot be detected accurately. The present application realizes table detection based on key point detection of the table; it is an anchor-free target detection method, and obtains a better detection effect on special tables such as long and narrow tables and tables with a certain inclination or distortion in document images.
The application can also be applied in the medical field; for example, the document image can be an image of a medical record, an examination sheet, and the like.
As shown in fig. 3, the table detection apparatus based on the detection model includes: an image acquisition module 110, a feature map extraction module 120, a prediction information acquisition module 130, a keypoint determination module 140, and a table determination module 150.
The image obtaining module 110 is configured to obtain a document image, where the size of the document image is a preset first size;
a feature map extraction module 120, configured to extract a document feature map for the document image based on a feature map extraction sub-network of a table detection model, where the size of the document feature map is a preset second size, and the first size is larger than the second size;
a prediction information obtaining module 130, configured to determine, based on a prediction subnetwork of the table detection model, prediction information of the document feature map, where the prediction information at least includes a first position of a table key point on the document feature map and a position offset of a projection position of the first position on the document image;
a key point determining module 140, configured to determine, based on a preset position determination rule, a second position of the table key point on the document image according to the prediction information;
a table determining module 150, configured to determine a table region in the document image according to the second position of the plurality of table key points.
Illustratively, the feature map extraction module 120 includes a primary feature map extraction module and an upsampling module.
The primary feature map extraction module is used for inputting the document image into a backbone sub-network of the feature map extraction sub-network to obtain a primary feature map;
and the up-sampling module is used for inputting the primary feature map into an up-sampling sub-network of the feature map extraction sub-network so as to obtain the document feature map.
Illustratively, the prediction information obtaining module 130 is specifically configured to: input the document feature map into a first branch sub-network of the prediction sub-network to obtain a first position of the table center point on the document feature map; input the document feature map into a second branch sub-network of the prediction sub-network to obtain the first positions of the table vertices on the document feature map; input the document feature map into a third branch sub-network of the prediction sub-network to obtain the position offset of the projection position corresponding to the table center point on the document image; and input the document feature map into a fourth branch sub-network of the prediction sub-network to obtain the position offsets of the projection positions corresponding to the table vertices on the document image.
In some embodiments, the keypoint determination module 140 is specifically configured to: convert the first position of the table key point according to the ratio of the first size to the second size to obtain the projection position of the first position on the document image; and determine a second position of the table key point on the document image according to the projection position corresponding to the table key point and the position offset of the projection position.
In some other embodiments, the prediction information further includes a displacement between a table vertex and a table center point on the document image, and the keypoint determination module 140 specifically includes a projection position determination unit, a table center point position determination unit, a table vertex first candidate position determination unit, a table vertex second candidate position determination unit, and a table vertex position determination unit.
A projection position determining unit, configured to convert the first positions of all the table key points according to a ratio of the first size to the second size to obtain projection positions of all the first positions on the document image;
the table center point position determining unit is used for determining a second position of the table center point on the document image according to the projection position corresponding to the table center point on the document feature map and the position offset of the projection position;
the table vertex first candidate position determining unit is used for determining a first candidate position of a table vertex on the document image according to the projection position corresponding to the table vertex on the document feature map and the position offset of the projection position;
the table vertex second candidate position determining unit is used for obtaining a second candidate position of the table vertex on the document image according to the projection position of the table center point and the displacement between the table vertex and the table center point on the document image;
a table vertex position determination unit configured to determine a second position of the table vertex according to the first candidate position and the second candidate position.
Illustratively, the prediction information further includes a width of the table and a height of the table on the document image, and the table vertex position determining unit is specifically configured to: determining a reference frame of the table on the document image according to the second position of the table center point, the width of the table on the document image and the height of the table; if the first candidate positions of all the table vertexes are in the reference frame, determining the first candidate positions of the table vertexes as second positions of the table vertexes; and if the first candidate position of the table vertex is outside the reference frame, determining that the second candidate position of the table vertex is the second position of the table vertex.
Illustratively, the table detection apparatus further comprises a table detection model training module.
The table detection model training module is used for acquiring a plurality of training samples, wherein at least a part of the training samples are document images containing tables, and the document images containing tables carry marking information of table key points; obtaining a standard second position of the table key point according to the marking information of the table key point; converting the standard second position of the table key point according to the ratio of the second size to the first size to obtain the standard first position of the table key point; converting the standard first position of the table key point according to the ratio of the first size to the second size to correspondingly obtain the standard projection position; determining the position offset of the standard projection position according to the standard second position of the table key point and the standard projection position corresponding to the table key point; determining labels of all the training samples, wherein the label of a document image containing a table at least comprises the standard first position and the position offset of the standard projection position; extracting a document feature map of a training sample based on the feature map extraction sub-network of the table detection model; determining prediction information of the training sample according to its document feature map based on the prediction sub-network of the table detection model; and adjusting the network parameters of the table detection model according to the error between the label of the training sample and the prediction information of the training sample.
Referring to fig. 4, fig. 4 is a schematic diagram of a computer device according to an embodiment of the present disclosure. The computer device may be a server or a terminal.
As shown in fig. 4, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a nonvolatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program includes program instructions that, when executed, cause a processor to perform any one of the detection model-based table detection methods.
The processor is used for providing calculation and control capability and supporting the operation of the whole computer equipment.
The internal memory provides an environment for the execution of a computer program on a non-volatile storage medium, which when executed by the processor, causes the processor to perform any one of the methods for table detection based on a detection model.
The network interface is used for network communication, such as sending assigned tasks and the like. Those skilled in the art will appreciate that the configuration of the computer apparatus is merely a block diagram of a portion of the configuration associated with aspects of the present application and is not intended to limit the computer apparatus to which aspects of the present application may be applied, and that a particular computer apparatus may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
It should be understood that the Processor may be a Central Processing Unit (CPU), and the Processor may be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, etc. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
In some embodiments, the processor is configured to execute the computer program stored in the memory to implement the following steps: acquiring a document image, wherein the size of the document image is a preset first size; extracting a document feature map from the document image based on a feature map extraction sub-network of a table detection model, wherein the size of the document feature map is a preset second size, and the first size is larger than the second size; determining prediction information of the document feature map based on a prediction sub-network of the table detection model, wherein the prediction information at least comprises a first position of a table key point on the document feature map and a position offset of the projection position of the first position on the document image; determining a second position of the table key point on the document image according to the prediction information based on a preset position determination rule; and determining a table area in the document image according to the second positions of a plurality of table key points.
Illustratively, when extracting a document feature map from the document image based on the feature map extraction sub-network of the table detection model, the processor is configured to implement: inputting the document image into a backbone sub-network of the feature map extraction sub-network to obtain a primary feature map; and inputting the primary feature map into an up-sampling sub-network of the feature map extraction sub-network to obtain the document feature map.
Illustratively, when determining the prediction information of the document feature map based on the prediction sub-network of the table detection model, the processor is configured to implement: inputting the document feature map into a first branch sub-network of the prediction sub-network to obtain a first position of the table center point on the document feature map; inputting the document feature map into a second branch sub-network of the prediction sub-network to obtain the first positions of the table vertices on the document feature map; inputting the document feature map into a third branch sub-network of the prediction sub-network to obtain the position offset of the projection position corresponding to the table center point on the document image; and inputting the document feature map into a fourth branch sub-network of the prediction sub-network to obtain the position offsets of the projection positions corresponding to the table vertices on the document image.
In some embodiments, when determining the second position of the table key point on the document image according to the prediction information based on the preset position determination rule, the processor is configured to implement: converting the first position of the table key point according to the ratio of the first size to the second size to obtain the projection position of the first position on the document image; and determining a second position of the table key point on the document image according to the projection position corresponding to the table key point and the position offset of the projection position.
In some other embodiments, the prediction information further includes the displacement between a table vertex and the table center point on the document image, and when determining a second position of the table key point on the document image according to the prediction information based on the preset position determination rule, the processor is configured to implement: converting the first positions of all the table key points according to the ratio of the first size to the second size to obtain the projection positions of all the first positions on the document image; determining a second position of the table center point on the document image according to the projection position corresponding to the table center point on the document feature map and the position offset of the projection position; determining a first candidate position of a table vertex on the document image according to the projection position corresponding to the table vertex on the document feature map and the position offset of the projection position; obtaining a second candidate position of the table vertex on the document image according to the projection position of the table center point and the displacement between the table vertex and the table center point on the document image; and determining a second position of the table vertex according to the first candidate position and the second candidate position.
In some embodiments, the prediction information further includes the width of the table and the height of the table on the document image, and when determining the second position of the table vertex according to the first candidate position and the second candidate position, the processor is configured to implement: determining a reference frame of the table on the document image according to the second position of the table center point, the width of the table on the document image and the height of the table; if the first candidate positions of all the table vertices are within the reference frame, determining the first candidate position of each table vertex as its second position; and if the first candidate position of any table vertex is outside the reference frame, determining the second candidate position of each table vertex as its second position.
Illustratively, the processor is further configured to implement: obtaining a plurality of training samples, wherein at least a part of the training samples are document images containing tables, and the document images containing tables carry marking information of table key points; obtaining a standard second position of the table key point according to the marking information of the table key point; converting the standard second position of the table key point according to the ratio of the second size to the first size to obtain the standard first position of the table key point; converting the standard first position of the table key point according to the ratio of the first size to the second size to correspondingly obtain the standard projection position; determining the position offset of the standard projection position according to the standard second position of the table key point and the standard projection position corresponding to the table key point; determining labels of all the training samples, wherein the label of a document image containing a table at least comprises the standard first position and the position offset of the standard projection position; extracting a document feature map of a training sample based on the feature map extraction sub-network of the table detection model; determining prediction information of the training sample according to its document feature map based on the prediction sub-network of the table detection model; and adjusting the network parameters of the table detection model according to the error between the label of the training sample and the prediction information of the training sample.
From the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solution of the present application, or the part thereof that contributes to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium such as a ROM/RAM, a magnetic disk or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments, or in parts of the embodiments, of the present application, such as:
a computer-readable storage medium storing a computer program, where the computer program includes program instructions which, when executed by a processor, implement any of the detection model-based table detection methods provided in the embodiments of the present application.
The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, for example a hard disk or a memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash memory card (Flash Card) equipped on the computer device.
While the present application has been described with reference to specific embodiments, its scope of protection is not limited thereto, and those skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed herein. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A table detection method based on a detection model, the method comprising:
acquiring a document image, wherein the size of the document image is a preset first size;
extracting a document feature map from the document image based on a feature map extraction sub-network of a table detection model, wherein the size of the document feature map is a preset second size, and the first size is larger than the second size;
determining prediction information of the document feature map based on a prediction subnetwork of the table detection model, wherein the prediction information at least comprises a first position of a table key point on the document feature map and a position offset of a projection position of the first position on the document image;
determining a second position of the table key point on the document image according to the prediction information based on a preset position determination rule;
and determining a table area in the document image according to the second positions of the plurality of table key points.
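Read as a whole, claim 1 is a short pipeline. The sketch below wires it together; extract_features, predict and decode are hypothetical callables standing in for the sub-networks and rule detailed in the later claims, not names from the patent.

```python
def detect_table(document_image, extract_features, predict, decode):
    feature_map = extract_features(document_image)  # feature map sub-network
    prediction = predict(feature_map)               # prediction sub-network
    second_positions = decode(prediction)           # preset position rule
    # The table area is the region spanned by the key points' second positions.
    return second_positions
```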
2. The table detection method based on a detection model of claim 1, wherein the extracting a document feature map from the document image based on the feature map extraction sub-network of the table detection model comprises:
inputting the document image into a backbone sub-network of the feature map extraction sub-network to obtain a primary feature map;
and inputting the primary feature map into an up-sampling sub-network of the feature map extraction sub-network to obtain the document feature map.
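A minimal PyTorch sketch of this two-stage extractor, with an illustrative three-layer backbone and a transposed-convolution up-sampling sub-network; the channel counts and strides are assumptions, chosen only so that the document feature map (second size) comes out smaller than the input image (first size).

```python
import torch
import torch.nn as nn

class FeatureMapExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        # Backbone sub-network: downsamples the document image 8x.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Up-sampling sub-network: recovers a 4x-downsampled feature map.
        self.upsample = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)

    def forward(self, document_image):
        primary = self.backbone(document_image)   # primary feature map
        return self.upsample(primary)             # document feature map

fmap = FeatureMapExtractor()(torch.randn(1, 3, 512, 512))  # -> (1, 64, 128, 128)
```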
3. The table detection method based on a detection model of claim 1, wherein the determining prediction information of the document feature map based on the prediction sub-network of the table detection model comprises:
inputting the document feature map into a first branch sub-network of the prediction sub-network to obtain a first position of a table center point on the document feature map;
inputting the document feature map into a second branch subnetwork of the prediction subnetwork to obtain a first position of a table vertex on the document feature map;
inputting the document feature map into a third branch sub-network of the prediction sub-network to obtain the position offset of the projection position corresponding to the table center point on the document image;
and inputting the document feature map into a fourth branch sub-network of the prediction sub-network to obtain the position offset of the projection position corresponding to the table vertex on the document image.
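The four branches can be sketched as parallel 1x1 convolution heads on the shared document feature map. The channel counts and the sigmoid activation on the heatmap branches are common key-point-detector choices assumed here, not taken from the patent.

```python
import torch
import torch.nn as nn

class PredictionSubNetwork(nn.Module):
    def __init__(self, in_ch=64):
        super().__init__()
        self.center_heatmap = nn.Conv2d(in_ch, 1, 1)  # branch 1: center first positions
        self.vertex_heatmap = nn.Conv2d(in_ch, 1, 1)  # branch 2: vertex first positions
        self.center_offset = nn.Conv2d(in_ch, 2, 1)   # branch 3: center offset (x, y)
        self.vertex_offset = nn.Conv2d(in_ch, 2, 1)   # branch 4: vertex offset (x, y)

    def forward(self, fmap):
        return (self.center_heatmap(fmap).sigmoid(),
                self.vertex_heatmap(fmap).sigmoid(),
                self.center_offset(fmap),
                self.vertex_offset(fmap))

outs = PredictionSubNetwork()(torch.randn(1, 64, 128, 128))
```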
4. The table detection method based on a detection model according to any one of claims 1 to 3, wherein the determining, based on a preset position determination rule, a second position of the table key point on the document image according to the prediction information comprises:
converting the first position of the table key point according to the ratio of the first size to the second size to obtain the projection position of the first position on the document image;
and determining a second position of the table key point on the document image according to the projection position corresponding to the table key point and the position offset of the projection position.
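A worked numeric instance of this rule, with illustrative numbers only: a first size of 512 and a second size of 128 give a ratio of 4, so a first position of (10, 25) projects to (40, 100), and a predicted offset of (1.7, -0.3) yields the second position (41.7, 99.7).

```python
ratio = 512 / 128                 # first size / second size
first = (10, 25)                  # first position on the feature map
offset = (1.7, -0.3)              # predicted position offset
projection = (first[0] * ratio, first[1] * ratio)                # (40.0, 100.0)
second = (projection[0] + offset[0], projection[1] + offset[1])  # (41.7, 99.7)
print(second)
```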
5. The table detection method based on a detection model according to any one of claims 1 to 3, wherein:
the prediction information further comprises displacement between table vertices and table center points on the document image;
the determining, based on a preset position determination rule, a second position of the table key point on the document image according to the prediction information comprises:
converting the first positions of all the table key points according to the ratio of the first size to the second size to obtain the projection positions of all the first positions on the document image;
determining a second position of the table center point on the document image according to the projection position corresponding to the table center point on the document feature map and the position offset of the projection position;
determining a first candidate position of a table vertex on the document image according to a projection position corresponding to the table vertex on the document feature map and a position offset of the projection position;
obtaining a second candidate position of the table vertex on the document image according to the projection position of the table center point and the displacement between the table vertex and the table center point on the document image;
determining a second position of the table vertex from the first candidate position and the second candidate position.
6. The table detection method based on a detection model of claim 5, wherein:
the prediction information further includes a width of the table and a height of the table on the document image;
the determining a second position of the table vertex from the first candidate position and the second candidate position includes:
determining a reference frame of the table on the document image according to the second position of the table center point and the width and height of the table on the document image;
if the first candidate positions of all the table vertices are within the reference frame, determining the first candidate position of each table vertex as the second position of that table vertex;
and if the first candidate position of a table vertex is outside the reference frame, determining the second candidate position of that table vertex as its second position.
7. The table detection method based on a detection model according to any one of claims 1 to 3, further comprising:
obtaining a plurality of training samples, wherein at least a part of the training samples are document images containing tables, and the document images containing tables carry annotation information of table key points;
obtaining a standard second position of the table key point according to the annotation information of the table key point;
converting the standard second position of the table key point according to the ratio of the second size to the first size to obtain a standard first position of the table key point;
converting the standard first position of the table key point according to the ratio of the first size to the second size to obtain a corresponding standard projection position;
determining the position offset of the standard projection position according to the standard second position of the table key point and the standard projection position corresponding to the table key point;
determining labels of all the training samples, wherein the label of a document image containing a table comprises at least the standard first position and the position offset of the standard projection position;
extracting a document feature map of a training sample based on the feature map extraction sub-network of the table detection model;
determining prediction information of the training samples according to the document feature map of the training samples based on a prediction sub-network of the table detection model;
and adjusting the network parameters of the table detection model according to the error between the label of the training sample and the prediction information of the training sample.
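A minimal training-step sketch for this procedure; the split of the prediction into heatmap and offset tensors, the binary-cross-entropy plus L1 loss combination, and the model.extract_features / model.predict methods are all assumptions for illustration, not the patent's training recipe.

```python
import torch.nn.functional as F

def train_step(model, optimizer, image, heatmap_label, offset_label):
    fmap = model.extract_features(image)            # feature map sub-network
    heatmap_pred, offset_pred = model.predict(fmap)  # prediction sub-network
    # Error between the training sample's label and its prediction.
    loss = (F.binary_cross_entropy(heatmap_pred, heatmap_label)
            + F.l1_loss(offset_pred, offset_label))
    optimizer.zero_grad()
    loss.backward()        # adjust the network parameters from the error
    optimizer.step()
    return loss.item()
```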
8. A table detection apparatus based on a detection model, the apparatus comprising:
the image acquisition module is used for acquiring a document image, and the size of the document image is a preset first size;
the feature map extraction module is used for extracting a document feature map from the document image based on a feature map extraction sub-network of a table detection model, wherein the size of the document feature map is a preset second size, and the first size is larger than the second size;
the prediction information acquisition module is used for determining prediction information of the document feature map based on a prediction sub-network of the table detection model, wherein the prediction information at least comprises a first position of a table key point on the document feature map and a position offset of a projection position of the first position on the document image;
the key point determining module is used for determining a second position of the table key point on the document image according to the prediction information based on a preset position determination rule;
and the table determining module is used for determining a table area in the document image according to the second positions of the plurality of table key points.
9. A computer device, wherein the computer device comprises a memory and a processor;
the memory for storing a computer program;
the processor for executing the computer program and implementing the detection model based table detection method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the detection model-based table detection method according to any one of claims 1 to 7.
CN202110989638.2A 2021-08-26 2021-08-26 Table detection method, device and equipment based on detection model and storage medium Pending CN113705430A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110989638.2A CN113705430A (en) 2021-08-26 2021-08-26 Table detection method, device and equipment based on detection model and storage medium


Publications (1)

Publication Number Publication Date
CN113705430A true CN113705430A (en) 2021-11-26

Family

ID=78655362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110989638.2A Pending CN113705430A (en) 2021-08-26 2021-08-26 Table detection method, device and equipment based on detection model and storage medium

Country Status (1)

Country Link
CN (1) CN113705430A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160148074A1 (en) * 2014-11-26 2016-05-26 Captricity, Inc. Analyzing content of digital images
CN110287854A (en) * 2019-06-20 2019-09-27 北京百度网讯科技有限公司 Extracting method, device, computer equipment and the storage medium of table


Similar Documents

Publication Publication Date Title
US10643130B2 (en) Systems and methods for polygon object annotation and a method of training and object annotation system
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN110348294B (en) Method and device for positioning chart in PDF document and computer equipment
WO2018108129A1 (en) Method and apparatus for use in identifying object type, and electronic device
US20180336683A1 (en) Multi-Label Semantic Boundary Detection System
CN113673425B (en) Multi-view target detection method and system based on Transformer
US9576348B2 (en) Facilitating text identification and editing in images
JP2008033424A (en) Image processor, image processing method, program, and storage medium
CN109635714B (en) Correction method and device for document scanning image
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113158895A (en) Bill identification method and device, electronic equipment and storage medium
Den Hartog et al. Knowledge-based interpretation of utility maps
CN113592886A (en) Method and device for examining architectural drawings, electronic equipment and medium
CN113361521A (en) Scene image detection method and device
CN109190662A (en) A kind of three-dimensional vehicle detection method, system, terminal and storage medium returned based on key point
CN113537187A (en) Text recognition method and device, electronic equipment and readable storage medium
CN113705430A (en) Table detection method, device and equipment based on detection model and storage medium
CN115909378A (en) Document text detection model training method and document text detection method
CN115861922A (en) Sparse smoke and fire detection method and device, computer equipment and storage medium
CN115115535A (en) Depth map denoising method, device, medium and equipment
CN112419208A (en) Construction drawing review-based vector drawing compiling method and system
CN114155540A (en) Character recognition method, device and equipment based on deep learning and storage medium
Rani et al. Object Detection in Natural Scene Images Using Thresholding Techniques
CN117423116B (en) Training method of text detection model, text detection method and device
CN112183650B (en) Digital detection and identification method under camera defocus condition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination