CN114529924A

CN114529924A - Table positioning method and device and related equipment

Info

Publication number: CN114529924A
Application number: CN202210176706.8A
Authority: CN
Inventors: 张大千; 毛瑞彬; 朱菁; 郭润婷; 蒋阳; 尚东东; 潘斌强; 杨雯雯; 张俊; 杨建明
Original assignee: SHENZHEN SECURITIES INFORMATION CO Ltd
Current assignee: SHENZHEN SECURITIES INFORMATION CO Ltd
Priority date: 2022-02-24
Filing date: 2022-02-24
Publication date: 2022-05-24

Abstract

The application discloses a table positioning method, which comprises: obtaining sample data containing table information; performing positive and negative sample equalization processing on the sample data to obtain processed sample data; training with the processed sample data to obtain a table positioning model; performing table positioning on a document to be processed by using the table positioning model to obtain table prediction boxes; and screening all the table prediction boxes by using a Confluence algorithm based on intersection over union (IoU) to obtain an optimal table prediction box. By applying the technical solution provided by the application, the positioning accuracy of document tables can be effectively improved. The application also discloses a table positioning apparatus, a device and a computer-readable storage medium, which have the same beneficial effects.

Description

Table positioning method and device and related equipment

Technical Field

The present application relates to the field of document processing technologies, and in particular, to a table positioning method, and further, to a table positioning apparatus, a device, and a computer-readable storage medium.

Background

With the massive growth of PDF documents in the financial industry, traditional manual processing requires a great deal of time and effort and can no longer meet daily work requirements. To improve the efficiency of auditing financial documents, financial companies have begun to adopt intelligent processing, and the digitization and intelligent processing of publicly disclosed financial documents in particular has become a trend. Table parsing is an important branch of the intelligent processing of disclosed financial documents. Its flow mainly comprises table positioning, table restoration and content extraction, of which table positioning is the first step. However, because tables vary widely in design and type and are rich in features, existing single-mode recognition approaches cannot effectively guarantee table positioning accuracy.

Therefore, how to effectively improve the positioning accuracy of the document table is an urgent problem to be solved by those skilled in the art.

Disclosure of Invention

An object of the present application is to provide a table positioning method that can effectively improve the positioning accuracy of document tables; another object of the present application is to provide a table positioning apparatus, a device and a computer-readable storage medium, all of which have the above advantages.

In a first aspect, the present application provides a table locating method, including:

acquiring sample data containing table information;

carrying out positive and negative sample equalization processing on the sample data to obtain processed sample data;

training by using the processed sample data to obtain a table positioning model;

performing table positioning on the document to be processed by using the table positioning model to obtain table prediction boxes;

and screening all the table prediction boxes by using a Confluence algorithm based on intersection over union (IoU) to obtain an optimal table prediction box.

Preferably, the performing positive and negative sample equalization processing on the sample data to obtain processed sample data includes:

generating preset anchor boxes according to the sample data, and calculating the intersection over union (IoU) of each preset anchor box and the ground-truth box;

calculating a first IoU threshold according to a positive sample selection formula, wherein the positive sample selection formula is as follows:

IoU₁ = m + ln(1 - v);

wherein IoU₁ is the first IoU threshold, m represents the mean of the IoU values between the ground-truth box and the preset anchor boxes, and v represents the IoU difference between the ground-truth box and the preset anchor boxes;

taking a preset anchor box whose IoU exceeds the first IoU threshold as a positive sample;

and taking a preset anchor box whose IoU does not exceed the first IoU threshold as a negative sample.

Preferably, the table locating method further includes:

determining each prediction box according to the sample data;

and taking a preset number of prediction boxes closest to the center point of the ground-truth box as the positive samples.

Preferably, after the preset anchor box whose intersection over union (IoU) does not exceed the first IoU threshold is taken as a negative sample, the method further includes:

and deleting, from all the negative samples, the negative samples whose IoU exceeds a second IoU threshold.

Preferably, the screening all the table prediction boxes by using a Confluence algorithm based on intersection over union (IoU) to obtain an optimal table prediction box comprises:

generating a first prediction box set based on all the table prediction boxes;

selecting any table prediction box from the first prediction box set as a first table prediction box, calculating the IoU of the first table prediction box and each of the other table prediction boxes, and deleting the other table prediction boxes whose IoU with the first table prediction box is zero to obtain a second prediction box set;

calculating a confidence-weighted IoU of the first table prediction box and each table prediction box in the second prediction box set, and taking the table prediction box whose confidence-weighted IoU exceeds a first threshold as the optimal table prediction box;

calculating the IoU of the optimal table prediction box and each of the other table prediction boxes in the second prediction box set, and deleting the other table prediction boxes whose IoU exceeds a second threshold to obtain a third prediction box set;

deleting the first table prediction box from the first prediction box set, and returning to the step of selecting any table prediction box from the first prediction box set as the first table prediction box until the first prediction box set is empty, so as to obtain all the optimal table prediction boxes.

Preferably, the training to obtain the table positioning model by using the processed sample data includes:

combining an Involution algorithm and a single-stage feature detector to construct an initial network model;

performing model training on the initial network model by using the processed sample data to obtain a table positioning model whose preset evaluation index is lower than a preset threshold; the preset evaluation index is specifically the alignment accuracy of the ground-truth box and the prediction box.

Preferably, before performing the positive and negative sample equalization processing on the sample data and obtaining the processed sample data, the method further includes:

and performing copy operations and delete operations on the tables in the sample data according to the row and column layout of the tables to obtain data-enhanced sample data.

In a second aspect, the present application further discloses a table positioning apparatus, comprising:

a sample acquisition module, used for acquiring sample data containing table information;

a sample balancing module, used for performing positive and negative sample equalization processing on the sample data to obtain processed sample data;

a model training module, used for training with the processed sample data to obtain a table positioning model;

a table positioning module, used for performing table positioning on the document to be processed by using the table positioning model to obtain table prediction boxes;

and a prediction box screening module, used for screening all the table prediction boxes by using a Confluence algorithm based on intersection over union (IoU) to obtain an optimal table prediction box.

In a third aspect, the present application also discloses a table positioning device, including:

a memory for storing a computer program;

a processor for implementing the steps of any of the table locating methods described above when executing the computer program.

In a fourth aspect, the present application also discloses a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the table locating methods described above.

The table positioning method provided by the application comprises: obtaining sample data containing table information; performing positive and negative sample equalization processing on the sample data to obtain processed sample data; training with the processed sample data to obtain a table positioning model; performing table positioning on the document to be processed by using the table positioning model to obtain table prediction boxes; and screening all the table prediction boxes by using a Confluence algorithm based on intersection over union (IoU) to obtain an optimal table prediction box.

By applying the technical solution provided by the application, after a large amount of sample data is obtained, positive and negative sample equalization processing is first performed on the sample data to obtain high-quality sample data in which positive and negative samples are balanced. Model training is then performed with the high-quality sample data to obtain a high-accuracy table positioning model. In the table positioning process, the table positioning model can therefore be used directly to locate the table prediction boxes in the document to be processed. Finally, the Confluence algorithm based on intersection over union (IoU) is used to screen all the table prediction boxes and obtain the optimal table prediction box, so that the optimal result is selected from the model prediction results. Table positioning based on a neural network model is thereby realized, and the positioning accuracy of document tables is effectively improved.

The table positioning device, the equipment and the computer readable storage medium provided by the application all have the beneficial effects, and are not described again.

Drawings

In order to more clearly illustrate the technical solutions in the prior art and the embodiments of the present application, the drawings that are needed to be used in the description of the prior art and the embodiments of the present application will be briefly described below. Of course, the following description of the drawings related to the embodiments of the present application is only a part of the embodiments of the present application, and it will be obvious to those skilled in the art that other drawings can be obtained from the provided drawings without any creative effort, and the obtained other drawings also belong to the protection scope of the present application.

FIG. 1 is a schematic flowchart of a table positioning method provided in the present application;

FIG. 2 is a framework structure diagram of a table positioning model provided in the present application;

FIG. 3 is a schematic structural diagram of the DardNet network in the table positioning model provided in the present application;

FIG. 4 is a schematic structural diagram of the FPN+ network in the table positioning model provided in the present application;

FIG. 5 is a schematic diagram of a table data enhancement method provided in the present application;

FIG. 6 is a schematic structural diagram of a table positioning apparatus provided in the present application;

FIG. 7 is a schematic structural diagram of a table positioning device provided in the present application.

Detailed Description

The core of the application is to provide a table positioning method, which can effectively improve the positioning accuracy of document tables; another core of the present application is to provide a table positioning apparatus, a device and a computer-readable storage medium, which also have the above advantages.

In order to more clearly and completely describe the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The embodiment of the application provides a table positioning method.

Referring to fig. 1, fig. 1 is a schematic flow chart of a table locating method provided in the present application, where the table locating method includes:

S101: acquiring sample data containing table information.

This step aims to obtain the sample data. Since the technical solution provided by the application aims to realize table positioning, the sample data is sample data containing table information. It can be understood that the sample data is used for model training, so as to obtain a neural network model for table positioning; therefore, the sample data should include both positive sample data and negative sample data.

Of course, the way of acquiring the sample data is not unique; for example, it can be realized by collecting a large number of documents, which is not limited in the present application. It is conceivable that the larger the amount of sample data, the more accurate the model training, so that a neural network model with higher accuracy is obtained and the table positioning accuracy is further improved.

S102: performing positive and negative sample equalization processing on the sample data to obtain processed sample data.

This step aims to equalize the positive and negative samples, so as to ensure that the positive samples and negative samples in the sample data are balanced and thereby effectively improve model accuracy. It can be understood that the positive and negative sample equalization processing specifically brings the positive sample data and the negative sample data to a balanced quantity. For example, when there is more positive sample data and less negative sample data, the amount of negative sample data can be increased; when there is less positive sample data and more negative sample data, the amount of positive sample data can be increased.

The specific implementation of the positive and negative sample equalization processing is not unique, provided that the positive and negative samples can be balanced, which is not limited in the present application. For example, it can be implemented by using the ATSS (Adaptive Training Sample Selection) algorithm.

S103: training with the processed sample data to obtain a table positioning model.

This step aims to realize model training to obtain a table positioning model, which is used for table prediction and thus for obtaining table prediction boxes. Specifically, once the sample data processed by positive and negative sample equalization has been obtained, this high-quality sample data can be used for model training to obtain a table positioning model that meets the requirements. It can be understood that the specific type of the table positioning model does not affect the implementation of the technical solution, provided that table prediction can be realized; for example, the table positioning model may be a neural network model based on the Involution algorithm, which is not limited in the present application. In addition, for the training process of the table positioning model, reference may be made to the prior art, and details are not repeated herein.

S104: performing table positioning on the document to be processed by using the table positioning model to obtain table prediction boxes.

This step aims to realize table positioning by using the table positioning model, so as to predict the table prediction boxes in the document to be processed. The document to be processed is a document on which table positioning needs to be performed, and its specific type does not affect the implementation of the technical solution. Of course, the source of the document to be processed is not unique: it can be input directly by a user or downloaded from a network according to a user instruction, which is not limited in the present application. Specifically, after the document to be processed is obtained, it can be input into the table positioning model, and the model outputs the table prediction boxes in the document to be processed.

S105: screening all the table prediction boxes by using a Confluence algorithm based on intersection over union (IoU) to obtain an optimal table prediction box.

This step aims to screen the table prediction boxes, so that the optimal table prediction box is selected from all the table prediction boxes output by the table positioning model; the optimal result is thereby selected from the model prediction results, and the accuracy of the table positioning result is effectively ensured. It can be understood that the number of tables contained in a document to be processed is determined by the document content, that is, the number of tables actually contained in the document is not fixed; accordingly, the number of optimal table prediction boxes obtained by the screening is not fixed either.

The screening of the optimal table prediction box can be realized by using a Confluence algorithm based on intersection over union (IoU). Specifically, the original Confluence algorithm measures the degree of overlap between two bounding boxes by using the Manhattan distance, and the formula is defined as:

$$p_{i,j} = MH(i_u, j_v) + MH(i_m, j_n) = |x_1 - p_1| + |y_1 - q_1| + |x_2 - p_2| + |y_2 - q_2|$$

wherein $p_{i,j}$ indicates the degree of overlap of two bounding boxes (bounding box i and bounding box j), MH indicates the Manhattan distance, $i_u$ represents the upper left corner coordinates $(x_1, y_1)$ of bounding box i, $j_v$ represents the upper left corner coordinates $(p_1, q_1)$ of bounding box j, $i_m$ represents the lower right corner coordinates $(x_2, y_2)$ of bounding box i, and $j_n$ represents the lower right corner coordinates $(p_2, q_2)$ of bounding box j.

However, after the Confluence algorithm filters the normalized bounding box pairs by the condition $p_{i,j} < 2$, outliers still exist, which may affect the selection of the optimal bounding box within a cluster. In Confluence, all coordinates of a bounding box pair lie between 0 and 1 after min-max normalization, and the algorithm regards bounding box pairs with $p_{i,j} < 2$ as intersecting; in special cases, however, a bounding box pair with $p_{i,j} < 2$ is in fact disjoint. Based on this, and since the tables in disclosed documents do not overlap one another, the Confluence+ algorithm introduces IoU (intersection over union) in place of the Manhattan distance to solve the above outlier problem of the Confluence algorithm, that is, the screening is performed by using the Confluence algorithm based on IoU.
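The outlier case described above can be illustrated with a small sketch. It assumes boxes are given as (x1, y1, x2, y2) tuples whose coordinates are already normalized to the range 0 to 1, and it assumes the Manhattan-distance measure sums the distance between the two upper left corners and the distance between the two lower right corners. The function names and the example coordinates are hypothetical and are not taken from the application.

```python
def manhattan_proximity(box_i, box_j):
    """Sum of the Manhattan distances between the upper-left corners and
    between the lower-right corners of two boxes (assumed Confluence form)."""
    x1, y1, x2, y2 = box_i
    p1, q1, p2, q2 = box_j
    return abs(x1 - p1) + abs(y1 - q1) + abs(x2 - p2) + abs(y2 - q2)


def iou(box_i, box_j):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box_i
    p1, q1, p2, q2 = box_j
    inter_w = max(0.0, min(x2, p2) - max(x1, p1))
    inter_h = max(0.0, min(y2, q2) - max(y1, q1))
    inter = inter_w * inter_h
    union = (x2 - x1) * (y2 - y1) + (p2 - p1) * (q2 - q1) - inter
    return inter / union if union > 0 else 0.0


# Two side-by-side boxes separated by a horizontal gap (hypothetical values).
left = (0.0, 0.0, 0.45, 1.0)
right = (0.55, 0.0, 1.0, 1.0)

proximity = manhattan_proximity(left, right)
overlap = iou(left, right)
print(f"proximity = {proximity:.2f}, IoU = {overlap:.2f}")
```

In this example the proximity value is well below 2 although the two boxes share no area, whereas the IoU is exactly zero, which is the property the IoU-based variant relies on.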

It can be understood that the Confluence algorithm based on IoU has the following advantages: on the one hand, it effectively ensures that the bounding boxes within a screened cluster intersect pairwise, improves the consistency of the bounding boxes, removes the normalization operation and simplifies the post-processing algorithm; on the other hand, the algorithm does not rely on the confidence score alone but uses the confidence-weighted IoU value to select the optimal bounding box, which effectively reduces false positives and improves detection accuracy.

It should be noted that S101 to S103 above constitute the process of obtaining the table positioning model, which only needs to be executed once. In actual application, the trained table positioning model is called directly to process the document to be processed; the trained model may be pre-stored in a corresponding storage space so that it can be called directly. Of course, in order to ensure the accuracy of the model and of the table positioning result, the model can be optimized at regular or irregular intervals.

Therefore, in the table positioning method provided by the application, after a large amount of sample data is obtained, positive and negative sample equalization processing is first performed on the sample data to obtain high-quality sample data in which positive and negative samples are balanced. Model training is then performed with the high-quality sample data to obtain a high-accuracy table positioning model. In the table positioning process, the table positioning model can therefore be used directly to locate the table prediction boxes in the document to be processed. Finally, the Confluence algorithm based on intersection over union (IoU) is used to screen all the table prediction boxes and obtain the optimal table prediction box, so that the optimal result is selected from the model prediction results. Table positioning based on a neural network model is thereby realized, and the positioning accuracy of document tables is effectively improved.

In an embodiment of the application, the performing positive and negative sample equalization processing on the sample data to obtain the processed sample data may include: generating preset anchor boxes according to the sample data, and calculating the intersection over union (IoU) of each preset anchor box and the ground-truth box; calculating a first IoU threshold according to a positive sample selection formula, wherein the positive sample selection formula is as follows:

IoU₁ = m + ln(1 - v);

wherein IoU₁ is the first IoU threshold, m represents the mean of the IoU values between the ground-truth box and the preset anchor boxes, and v represents the IoU difference between the ground-truth box and the preset anchor boxes;

taking a preset anchor box whose IoU exceeds the first IoU threshold as a positive sample; and taking a preset anchor box whose IoU does not exceed the first IoU threshold as a negative sample.

The embodiment of the application provides a method for realizing positive and negative sample equalization processing. As described above, the positive and negative sample equalization processing can be implemented by using the ATSS algorithm. The ATSS algorithm calculates the sum of the mean and the standard deviation of the IoU values between the preset anchor boxes and the ground-truth box on different feature layers and uses this sum as the IoU threshold for selecting positive samples: preset anchor boxes whose IoU is greater than the threshold are selected as positive samples, and the remaining preset anchor boxes are set as negative samples. An IoU threshold for negative samples is then set, and preset anchor boxes among the negative samples whose IoU is greater than this threshold are changed to ignored samples. The ATSS algorithm therefore does not require a manually specified threshold; it selects positive samples with a dynamically and automatically generated threshold, which realizes adaptive sample equalization to a certain extent.

However, when the IoU values between the ground-truth box and the preset anchor boxes are extremely unbalanced, the sum of the mean and the standard deviation becomes too large, so that no preset anchor box satisfies the condition. Based on this, in order to further increase the number of positive samples, the ATSS algorithm can be optimized with the improved positive sample selection formula given above. Compared with the formula before optimization, it is smoother and handles unbalanced IoU distributions better, so that positive samples are better selected on different feature layers.

Therefore, after the preset anchor boxes are determined based on the sample data and the IoU of each preset anchor box and the ground-truth box is calculated, the first IoU threshold can be calculated based on the above formula; the first IoU threshold is the IoU threshold for selecting positive sample data. When the IoU of a preset anchor box and the ground-truth box exceeds the first IoU threshold, the preset anchor box can be taken as a positive sample; when it does not exceed the first IoU threshold, the preset anchor box can be taken as a negative sample.
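The difference between the two thresholds can be sketched as follows. This is only an illustration under stated assumptions: v in the formula IoU₁ = m + ln(1 - v) is taken here to be the population standard deviation of the IoU values, which the text does not confirm, and the ATSS-style threshold is simplified to the mean plus the same standard deviation. The function names and the IoU values are hypothetical.

```python
import math
import statistics


def atss_threshold(ious):
    """ATSS-style threshold: mean plus standard deviation of the IoU values."""
    return statistics.mean(ious) + statistics.pstdev(ious)


def modified_threshold(ious):
    """Threshold from IoU_1 = m + ln(1 - v).

    Assumption: v is the population standard deviation of the IoU values.
    """
    m = statistics.mean(ious)
    v = statistics.pstdev(ious)
    return m + math.log(1.0 - v)


def split_samples(ious, threshold):
    """Indices of anchors above the threshold (positive) and the rest (negative)."""
    positives = [k for k, value in enumerate(ious) if value > threshold]
    negatives = [k for k, value in enumerate(ious) if value <= threshold]
    return positives, negatives


# Hypothetical IoU values of four preset anchor boxes with one ground-truth box.
anchor_ious = [0.1, 0.8, 0.8, 0.8]

t_atss = atss_threshold(anchor_ious)
t_mod = modified_threshold(anchor_ious)
pos_atss, _ = split_samples(anchor_ious, t_atss)
pos_mod, neg_mod = split_samples(anchor_ious, t_mod)
print(f"ATSS threshold {t_atss:.3f} -> positives {pos_atss}")
print(f"modified threshold {t_mod:.3f} -> positives {pos_mod}")
```

With these unbalanced values, the mean plus the standard deviation exceeds every IoU, so no anchor is selected, while the logarithmic term lowers the threshold and the three well-matched anchors become positive samples.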

In an embodiment of the present application, the table positioning method may further include: determining prediction boxes according to the sample data; and taking a preset number of prediction boxes closest to the center point of the ground-truth box as positive samples.

After the positive and negative sample equalization processing of the previous embodiment, the amount of negative sample data is still far larger than the amount of positive sample data, so the positive sample data can be expanded to obtain more positive samples. Specifically, the prediction boxes in the sample data are determined, the distance between the center point of each prediction box and the center point of the ground-truth box is calculated, and a certain number of prediction boxes whose center points are closest to the center point of the ground-truth box are taken as positive samples, so as to further increase the number of positive samples. It can be understood that the closer the center points are, the more likely the prediction box corresponds to the ground-truth box.
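A minimal sketch of this center-distance expansion is given below. It assumes boxes are (x1, y1, x2, y2) tuples and uses the Euclidean distance between center points, since the text does not name the distance measure; `closest_boxes`, `count` and the coordinates are hypothetical.

```python
import math


def center(box):
    """Center point of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def closest_boxes(gt_box, candidate_boxes, count):
    """Indices of the `count` candidates whose centers are nearest to the
    center of the ground-truth box (Euclidean distance, an assumption)."""
    gx, gy = center(gt_box)

    def distance(index):
        cx, cy = center(candidate_boxes[index])
        return math.hypot(cx - gx, cy - gy)

    ranked = sorted(range(len(candidate_boxes)), key=distance)
    return ranked[:count]


# Hypothetical ground-truth box and prediction boxes.
gt = (10, 10, 50, 30)
candidates = [(12, 11, 52, 31), (40, 40, 80, 60), (9, 10, 49, 30), (0, 0, 20, 10)]
extra_positives = closest_boxes(gt, candidates, count=2)
print(extra_positives)
```

The returned indices can then be added to the positive sample set regardless of whether those boxes passed the first IoU threshold.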

In an embodiment of the application, after the preset anchor box whose intersection over union (IoU) does not exceed the first IoU threshold is taken as a negative sample, the method may further include: deleting, from all the negative samples, the negative samples whose IoU exceeds a second IoU threshold.

Besides expanding the positive sample data to obtain more positive samples, the negative sample data can also be screened with reference to the ATSS algorithm, so as to reduce the number of negative samples and balance the positive and negative samples. Specifically, after the preset anchor boxes whose IoU does not exceed the first IoU threshold are taken as negative samples, a second IoU threshold, namely an IoU threshold for screening negative samples, may further be set for all the negative samples. The negative samples whose IoU exceeds the second IoU threshold are then deleted, that is, they are changed to ignored samples, so as to reduce the number of negative samples and improve their quality.
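The negative sample screening can be sketched as a simple partition. The function name, the threshold value 0.3 and the IoU values are hypothetical; the only behavior taken from the text is that negatives whose IoU exceeds the second IoU threshold are treated as ignored samples.

```python
def filter_negatives(negative_ious, second_threshold):
    """Split negative samples into kept negatives and ignored samples.

    A negative whose IoU with the ground-truth box exceeds the second IoU
    threshold is moved to the ignored list.
    """
    kept, ignored = [], []
    for index, value in enumerate(negative_ious):
        if value > second_threshold:
            ignored.append(index)
        else:
            kept.append(index)
    return kept, ignored


# Hypothetical IoU values of anchors already labeled as negative samples.
negative_ious = [0.02, 0.35, 0.10, 0.41, 0.00]
kept, ignored = filter_negatives(negative_ious, second_threshold=0.3)
print(kept, ignored)
```

Ignored samples in this sketch are simply left out of the negative set; how they are handled during loss computation is not specified in the text.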

In addition, the specific values of the first IoU threshold and the second IoU threshold do not affect the implementation of the technical solution and are set by a technician according to the actual situation, which is not limited in the present application.

In an embodiment of the present application, the screening all the table prediction boxes by using a Confluence algorithm based on intersection over union (IoU) to obtain an optimal table prediction box may include: generating a first prediction box set based on all the table prediction boxes; selecting any table prediction box from the first prediction box set as a first table prediction box, calculating the IoU of the first table prediction box and each of the other table prediction boxes, and deleting the other table prediction boxes whose IoU with the first table prediction box is zero to obtain a second prediction box set; calculating the confidence-weighted IoU of the first table prediction box and each table prediction box in the second prediction box set, and taking the table prediction box whose confidence-weighted IoU exceeds a first threshold as an optimal table prediction box; calculating the IoU of the optimal table prediction box and each of the other table prediction boxes in the second prediction box set, and deleting the other table prediction boxes whose IoU exceeds a second threshold to obtain a third prediction box set; and deleting the first table prediction box from the first prediction box set, and returning to the step of selecting any table prediction box from the first prediction box set as the first table prediction box until the first prediction box set is empty, so as to obtain all the optimal table prediction boxes.

The embodiment of the application provides a screening method for the optimal table prediction box. Specifically, after the table prediction boxes in the document to be processed are obtained based on the table positioning model, a prediction box set, namely the first prediction box set, is generated from the table prediction boxes. Assume that the first prediction box set is $B = \{b_1, b_2, \ldots, b_n\}$. The first prediction box set is traversed, and any table prediction box $b_i$ is selected from it as the first table prediction box. For each pair $(b_i, b_j)$ (j takes values from 1 to n and j ≠ i), the value $IoU_{i,j}$ is calculated; the table prediction boxes with $IoU_{i,j} = 0$ are deleted and the first table prediction box $b_i$ is removed, so as to obtain the prediction box set after preliminary screening, namely the second prediction box set. Further, assuming that the second prediction box set is $B'$, for each prediction box in $B'$, the confidence-weighted IoU of that prediction box and the first table prediction box $b_i$ is calculated, and the confidence-weighted IoU calculation formula is:

wherein $(b_i, b_j)$ represents a pair of prediction boxes, c represents the confidence score, and $WP_{i,j}$ represents the confidence-weighted IoU.

Therefore, after the confidence-weighted IoU is obtained, it is compared with a preset threshold, namely the first threshold. If it exceeds the first threshold, the corresponding table prediction box (a table prediction box in the second prediction box set B', not the first table prediction box) can be regarded as the "optimal" table prediction box. On this basis, the IoU of the optimal table prediction box and each of the other table prediction boxes in the second prediction box set B' is calculated, and the other table prediction boxes whose IoU exceeds a second threshold are deleted, so that table prediction boxes that highly overlap the optimal table prediction box are removed and a third prediction box set is obtained. Further, the first table prediction box is deleted from the first prediction box set, any table prediction box is reselected from the resulting first prediction box set as a new first table prediction box, and the above steps are repeated until the first prediction box set has been traversed, that is, until the first prediction box set is empty, so as to obtain all the optimal table prediction boxes.

Similarly, the specific values of the first threshold and the second threshold do not affect the implementation of the technical scheme, and the setting may be performed by a technician according to the actual situation, which is not limited in the present application.

In an embodiment of the application, the training to obtain the table positioning model by using the processed sample data may include: combining an Involution algorithm and a single-stage feature detector to construct an initial network model; performing model training on the initial network model by using the processed sample data to obtain a table positioning model with a preset evaluation index lower than a preset threshold; the preset evaluation index is specifically the alignment accuracy of the real frame and the prediction frame.

The embodiment of the application provides a specific type of table positioning model, namely a table positioning model based on an Involution algorithm and a single-stage feature detector. Specifically, an initial network model is constructed by using an Involution algorithm and a single-stage feature detector, then the initial network model is trained by using high-quality sample data after sample equalization processing, and a table positioning model meeting requirements, namely the table positioning model with the preset evaluation index lower than a preset threshold value, is obtained by combining preset evaluation indexes.

On the basis, the embodiment of the application provides a form positioning model of a specific framework structure. Referring to fig. 2, fig. 2 is a frame structure diagram of a table positioning model provided in the present application, and the specific content of the network structure is as follows:

1. Backbone part (backbone network):

In the table detection task, besides basic features such as lines and characters, important table features such as row-column relationships, indentation, and cell background must also be captured. However, conventional convolution is spatially invariant, a property that deprives the convolution kernel of the ability to learn different visual information at different spatial locations. In order to fully extract the spatial layout features of tables, an involution operator (a neural network operator) can be introduced into the backbone network, giving the proposed DardNet network. Fig. 3 is a schematic structural diagram of the DardNet network in the table positioning model provided in the present application. The DardNet network consists of two parts, DardNet Conv (the DardNet convolution module) and Involution Residual (the Involution residual module). DardNet Conv is composed of 7 × 7 convolution operators, two 3 × 3 convolutions, a BatchNorm (batch normalization) layer and a ReLU (activation function); Involution Residual is composed of 5 residual modules, each of which is composed of two 7 × 7 convolution operators and one 1 × 1 convolution.

Involution generates an involution kernel by performing convolution on the pixel at a single feature point, and thereby learns the spatial layout features of the table. The process by which Involution generates its kernel is related to self-attention: it lets the pixel of interest interact with the surrounding pixels and capture long-range relationships. In addition, Involution shares kernel parameters across channels, which makes large kernels practical, enlarges the receptive field of the model, and allows it to learn more context information and acquire the overall layout information of the table.
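The two properties described above (a kernel generated from a single pixel, and kernel sharing across channels) can be illustrated with a simplified pure-Python sketch. It is not the operator used in the application: the single channel group, the linear kernel-generation function, and the names `involution2d`, `weight` and `bias` are all assumptions made for illustration.

```python
from typing import List

Feature = List[List[List[float]]]  # H x W x C feature map


def involution2d(x: Feature, weight: List[List[float]],
                 bias: List[float], k: int) -> Feature:
    """Simplified single-group involution.

    weight (k*k rows, C columns) and bias (k*k values) form a hypothetical
    linear kernel-generation function applied to each pixel's feature vector.
    """
    h, w, c = len(x), len(x[0]), len(x[0][0])
    pad = k // 2
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # The kernel for this location is generated from pixel x[i][j] only.
            kernel = [sum(wr[ch] * x[i][j][ch] for ch in range(c)) + b
                      for wr, b in zip(weight, bias)]
            for u in range(k):
                for v in range(k):
                    y, z = i + u - pad, j + v - pad
                    if 0 <= y < h and 0 <= z < w:  # zero padding outside the map
                        kv = kernel[u * k + v]
                        for ch in range(c):
                            # The same kernel value is shared by every channel.
                            out[i][j][ch] += kv * x[y][z][ch]
    return out


x = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
zero_weight = [[0.0, 0.0] for _ in range(9)]
center_bias = [0.0] * 9
center_bias[4] = 1.0  # kernel that only keeps the center pixel
y_out = involution2d(x, zero_weight, center_bias, 3)
```

With zero weights and a bias that selects only the center position, every generated kernel is the identity, so the output reproduces the input; non-zero weights make the kernel vary from pixel to pixel, which ordinary convolution cannot do.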

2. Neck part (the part between the backbone network and the prediction layer):

Since the tables in disclosure documents are medium-to-large objects, a 32× down-sampled feature map is sufficient for table detection. Therefore, the feature pyramid structure network FPN+ of the single-stage feature detector can be selected, which takes the 32× down-sampled feature map as input and outputs a 13 × 13 detection layer, so that the network structure is simplified and model convergence is accelerated. Fig. 4 is a schematic structural diagram of the FPN+ network in the table positioning model provided by the present application. In order to preserve the detection effect while increasing speed, FPN+ adopts a dilated residual module, and the network mainly includes two blocks: a Project layer (projection layer) and a Dilated Residual block (dilated residual block). The Project layer is composed of a 1 × 1 and a 3 × 3 convolutional layer; the Dilated Residual block is composed of 5 residual blocks with different dilation rates.
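The relation between the 32× down-sampling and the 13 × 13 detection layer can be checked with a short calculation. The input size of 416 × 416 is an assumption inferred from 13 × 32 and is not stated in the application; the name `detection_grid` is hypothetical.

```python
def detection_grid(input_size: int, stride: int = 32) -> int:
    """Side length of the detection layer for a square input and a given stride."""
    if input_size % stride != 0:
        raise ValueError("input size must be a multiple of the stride")
    return input_size // stride


grid = detection_grid(416)  # assumed 416 x 416 input, 32x down-sampling
```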

Further, for the table positioning model, the loss function mainly comprises a bounding box regression loss function, a classification loss function and a confidence loss function. GIoU loss may be selected as the bounding box regression loss function. The reason is that the L2 norm of the coordinates is sensitive to the scale of the bounding box: for example, when a large-scale prediction box and a small-scale prediction box have the same coordinate L2 norm, their corresponding IoU values may differ greatly. GIoU, by contrast, not only reflects the actual distance when the prediction box does not intersect the real box, but also addresses the zero-gradient problem of the IoU loss function. It is expressed as follows:

GIoU = |A∩B| / |A∪B| − |C \ (A∪B)| / |C|；

where A and B denote the two boxes, C denotes the smallest box enclosing both box A and box B, and A∩B denotes the intersection of box A and box B.

Then, the GIoU-based bounding box regression loss function is:

L_GIoU＝λ_coord(1-GIoU)；

λ_coord＝2－t_w·t_h；

where L_GIoU denotes the GIoU-based bounding box regression loss function, GIoU denotes the overlap measure, t_w and t_h respectively denote the width and height on the feature map predicted by the model, and λ_coord denotes the weight coefficient of the bounding box regression loss function.

Since λ_coord＝2－t_w·t_h, the smaller t_w·t_h is, the larger the weight coefficient and hence the larger the regression loss of the corresponding small target box; this weight coefficient raises the share of small target boxes by increasing their regression loss value.
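The weighted GIoU loss above can be sketched in Python as follows. The sketch assumes the standard GIoU definition, in which C is the smallest axis-aligned box enclosing both boxes, boxes in (x1, y1, x2, y2) corner format, and t_w and t_h normalized to the range [0, 1]; the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def giou(a: Box, b: Box) -> float:
    """Standard GIoU: IoU minus the share of the enclosing box C not covered by A and B."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    c_area = area((min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])))
    if union <= 0 or c_area <= 0:
        return 0.0  # degenerate boxes
    return inter / union - (c_area - union) / c_area


def giou_loss(pred: Box, truth: Box, t_w: float, t_h: float) -> float:
    """L_GIoU = lambda_coord * (1 - GIoU), with lambda_coord = 2 - t_w * t_h."""
    lambda_coord = 2.0 - t_w * t_h
    return lambda_coord * (1.0 - giou(pred, truth))


a, b = (0, 0, 1, 1), (2, 0, 3, 1)            # two disjoint unit boxes
small = giou_loss(a, b, t_w=0.1, t_h=0.1)    # small box: larger weight
large = giou_loss(a, b, t_w=0.9, t_h=0.9)    # large box: smaller weight
```

For the same GIoU value, the small box receives the larger loss, which is the behavior that the weight coefficient λ_coord is described as producing.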

However, in the form detection task of the disclosure document, detection is mainly performed on a medium-large target frame. In order to increase the weight ratio of the large target frame and decrease the weight ratio of the small target frame, a new weight coefficient can be adopted

The final bounding box regression loss function is then:

In addition, the preset evaluation index is set to effectively ensure the accuracy of the model, and is used to obtain through training a table positioning model whose evaluation index meets the preset condition; specifically, the preset evaluation index may be the alignment accuracy of the real frame and the prediction frame. Unlike a general target detection task, table detection requires accurate boundary segmentation, so the degree to which the prediction frame boundary fits the table data is of greater concern. On this basis, in order to measure the fit of the table boundary more accurately, the EoB evaluation index (a model evaluation index) can be introduced. Its principle is to calculate the accuracy with which the target frame and the prediction frame are aligned, and its calculation formula is as follows:

where i and j denote the prediction box and the real box, respectively, and the remaining symbols denote, in turn, the coordinates of the upper left corner of the real box, the coordinates of the lower right corner of the real box, the coordinates of the upper left corner of the prediction box, and the coordinates of the lower right corner of the prediction box.

In an embodiment of the application, before performing positive and negative sample equalization processing on the sample data to obtain the processed sample data, the method may further include: performing copy and delete operations on the tables in the sample data according to their row and column layout to obtain data-enhanced sample data.

It can be understood that, in a target detection task, a large data set makes it possible to train a network with stronger generalization capability and obtain a model with higher recognition accuracy that is better suited to the application scenario. Because manual labeling is costly, the related art usually expands the data set through data enhancement in order to improve model accuracy; standard data enhancement methods include rotation, scaling, contrast increase and the like. However, for table data sets, standard data enhancement does not take the row and column layout of the table into account, cannot produce representative data, and may even reduce model performance. Therefore, to address the ineffectiveness of standard data enhancement on table data sets, a table-based data enhancement method can be adopted. Fig. 5 is a schematic diagram of the table data enhancement method provided by the present application. The method copies and deletes rows and columns of the table, which changes the local structure of the table while maintaining its overall layout, so that a large amount of effective augmented data can be obtained.
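The row copy and row delete operations can be sketched on an image stored as a list of pixel rows. This is only an illustration under stated assumptions: the vertical extent of each table row (`top`, `bottom`) is assumed to be known and to lie inside the table box, and the names `copy_row` and `delete_row` are hypothetical. Column operations would be analogous along the horizontal axis.

```python
from typing import List, Tuple

Image = List[List[int]]            # list of pixel rows
Box = Tuple[int, int, int, int]    # table box (x1, y1, x2, y2)


def copy_row(image: Image, box: Box, top: int, bottom: int) -> Tuple[Image, Box]:
    """Duplicate the table row occupying pixel rows [top, bottom) directly below itself."""
    band = [list(r) for r in image[top:bottom]]
    new_image = image[:bottom] + band + image[bottom:]
    x1, y1, x2, y2 = box
    return new_image, (x1, y1, x2, y2 + (bottom - top))  # table grows by one row


def delete_row(image: Image, box: Box, top: int, bottom: int) -> Tuple[Image, Box]:
    """Remove the table row occupying pixel rows [top, bottom)."""
    new_image = image[:top] + image[bottom:]
    x1, y1, x2, y2 = box
    return new_image, (x1, y1, x2, y2 - (bottom - top))  # table shrinks by one row


image = [[r] * 4 for r in range(6)]   # toy 6 x 4 image, each row filled with its index
table_box = (0, 1, 4, 5)
copied_img, copied_box = copy_row(image, table_box, 2, 3)
deleted_img, deleted_box = delete_row(image, table_box, 2, 3)
```

Both operations update the table's labeled box together with the image, so the augmented sample remains a valid training example.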

Of course, the data enhancement mode of the copy operation and the delete operation is only one implementation form provided in the embodiment of the present application, and is not unique, and may also be implemented in other modes, and the expansion of the sample data may be implemented, which is not limited in the present application.

Referring to fig. 6, fig. 6 is a schematic structural diagram of a table positioning device provided in the present application, where the table positioning device may include:

the sample acquisition module 1 is used for acquiring sample data containing table information;

the sample balancing module 2 is used for carrying out positive and negative sample balancing processing on the sample data to obtain the processed sample data;

the model training module 3 is used for training by utilizing the processed sample data to obtain a table positioning model;

the table positioning module 4 is used for carrying out table positioning on the document to be processed by utilizing a table positioning model to obtain each table prediction frame;

and the prediction box screening module 5 is used for screening all the table prediction boxes by utilizing a Confluence algorithm based on cross-over ratio to obtain an optimal table prediction box.

Therefore, after obtaining a large amount of sample data, the table positioning device provided by the embodiment of the application performs positive and negative sample equalization processing on it to obtain high-quality sample data with balanced positive and negative samples, and then uses this data for model training to obtain a high-precision table positioning model. In the table positioning process, the table positioning model can thus be used directly to locate each table prediction box in the document to be processed. Finally, the Confluence algorithm based on the intersection ratio is used to screen all the table prediction boxes and obtain the optimal table prediction box, so that the optimal result is selected from the model predictions. Table positioning based on a neural network model is thereby realized, and the positioning precision of document tables is effectively improved.

In an embodiment of the present application, the sample equalization module 2 may be specifically configured to generate each preset anchor frame according to sample data, and calculate an intersection ratio between each preset anchor frame and a real frame; calculating a first cross-over ratio threshold according to a positive sample selection formula, wherein the positive sample selection formula is as follows:

IoU₁＝m+ln(1-v)；

where IoU₁ is the first intersection ratio threshold, m denotes the mean intersection ratio between the real frame and the preset anchor frames, and v denotes the intersection ratio difference between the real frame and the preset anchor frames;

taking a preset anchor frame with the intersection ratio exceeding a first intersection ratio threshold value as a positive sample; and taking a preset anchor frame with the intersection ratio not exceeding the first intersection ratio threshold value as a negative sample.
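The positive sample selection formula can be sketched in Python as follows. The sketch assumes that v is the population variance of the intersection ratios between the real frame and the preset anchor frames; the application only calls it a "difference", so this reading, like the function names, is an assumption.

```python
import math
from statistics import mean, pvariance
from typing import List, Tuple


def first_iou_threshold(ious: List[float]) -> float:
    """IoU_1 = m + ln(1 - v), with m the mean and v (assumed) the population variance."""
    m = mean(ious)
    v = pvariance(ious)
    return m + math.log(1.0 - v)


def split_samples(ious: List[float]) -> Tuple[float, List[int], List[int]]:
    """Return the threshold plus the indices of positive and negative anchor frames."""
    t = first_iou_threshold(ious)
    positives = [k for k, x in enumerate(ious) if x > t]
    negatives = [k for k, x in enumerate(ious) if x <= t]
    return t, positives, negatives


anchor_ious = [0.2, 0.4, 0.6, 0.8]   # IoU of each preset anchor frame with the real frame
threshold, pos_idx, neg_idx = split_samples(anchor_ious)
```

Because ln(1 − v) is never positive, the threshold sits at or below the mean, and it drops further as the intersection ratios become more spread out.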

In an embodiment of the present application, the sample equalization module 2 may be further configured to determine each prediction box according to sample data; and taking the prediction frames with the preset number which are closest to the center point of the real frame as positive samples.
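The center-based selection can be illustrated with a short sketch. Euclidean distance between box centers, the (x1, y1, x2, y2) corner format, and the name `closest_k` are assumptions made for illustration; the application does not specify the distance measure.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def center(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)


def closest_k(pred_boxes: List[Box], truth: Box, k: int) -> List[int]:
    """Indices of the k prediction frames whose centers are closest to the real frame center."""
    cx, cy = center(truth)

    def dist(b: Box) -> float:
        px, py = center(b)
        return math.hypot(px - cx, py - cy)

    order = sorted(range(len(pred_boxes)), key=lambda i: dist(pred_boxes[i]))
    return order[:k]


preds = [(20, 20, 30, 30), (1, 1, 11, 11), (0, 0, 10, 10)]
positive_idx = closest_k(preds, truth=(0, 0, 10, 10), k=2)
```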

In an embodiment of the present application, the sample equalization module 2 may be further configured to delete negative samples having a cross ratio exceeding the second cross ratio threshold from all negative samples after the preset anchor frame having a cross ratio not exceeding the first cross ratio threshold is used as the negative sample.

In an embodiment of the present application, the prediction box filtering module 5 may be specifically configured to generate a first prediction box set based on all table prediction boxes; selecting any table prediction frame from the first prediction frame set as a first table prediction frame, calculating the intersection ratio of the first table prediction frame and other table prediction frames, and deleting the other table prediction frames with the intersection ratio of zero and the first table prediction frame to obtain a second prediction frame set; calculating the weighted intersection ratio of the confidence degrees of the first form prediction frame and each form prediction frame in the second prediction frame set, and taking the form prediction frame with the weighted intersection ratio of the confidence degrees exceeding a first threshold value as an optimal form prediction frame; calculating the intersection ratio of the optimal table prediction frame and other table prediction frames in the second prediction frame set, and deleting other table prediction frames with the intersection ratio exceeding a second threshold value to obtain a third prediction frame set; and deleting the first table prediction frame from the first prediction frame set, and returning to the step of selecting any table prediction frame in the first prediction frame set as the first table prediction frame until the first prediction frame set is empty, so as to obtain all the optimal table prediction frames.

In an embodiment of the present application, the model training module 3 may be specifically configured to combine an Involution algorithm and a single-stage feature detector to construct an initial network model; performing model training on the initial network model by using the processed sample data to obtain a table positioning model with a preset evaluation index lower than a preset threshold; the preset evaluation index is specifically the alignment accuracy of the real frame and the prediction frame.

In an embodiment of the application, the table locating device may further include a sample enhancement module, configured to, before the positive and negative sample equalization processing is performed on the sample data to obtain the processed sample data, perform a copy operation and a delete operation on the table in the sample data according to the row and column layout of the table to obtain the data-enhanced sample data.

For the introduction of the apparatus provided in the present application, please refer to the method embodiments described above, which are not described herein again.

Referring to fig. 7, fig. 7 is a schematic structural diagram of a form positioning apparatus provided in the present application, where the form positioning apparatus may include:

a memory for storing a computer program;

a processor, configured to implement the steps of any of the above table locating methods when executing the computer program.

As shown in fig. 7, which is a schematic diagram of a composition structure of the form locating apparatus, the form locating apparatus may include: a processor 10, a memory 11, a communication interface 12 and a communication bus 13. The processor 10, the memory 11 and the communication interface 12 all communicate with each other through a communication bus 13.

In the embodiment of the present application, the processor 10 may be a Central Processing Unit (CPU), an application specific integrated circuit, a digital signal processor, a field programmable gate array or other programmable logic device, etc.

The processor 10 may call a program stored in the memory 11, and in particular, the processor 10 may perform operations in embodiments of the table locating method.

The memory 11 is used for storing one or more programs, the program may include program codes, the program codes include computer operation instructions, in this embodiment, the memory 11 stores at least the program for implementing the following functions:

acquiring sample data containing table information;

carrying out positive and negative sample equalization processing on the sample data to obtain the processed sample data;

carrying out table positioning on the document to be processed by using a table positioning model to obtain each table prediction frame;

In one possible implementation, the memory 11 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created during use.

Further, the memory 11 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or other volatile solid state storage device.

The communication interface 12 may be an interface of a communication module for connecting with other devices or systems.

Of course, it should be noted that the structure shown in fig. 7 does not constitute a limitation of the table positioning device in the embodiment of the present application, and the table positioning device may include more or less components than those shown in fig. 7 or some components in combination in practical applications.

The present application also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the steps of any of the above-described table locating methods.

The computer-readable storage medium may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

For the introduction of the computer-readable storage medium provided in the present application, please refer to the above method embodiments, which are not described herein again.

The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.

Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components and steps of the various examples have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The technical solutions provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, without departing from the principle of the present application, several improvements and modifications can be made to the present application, and these improvements and modifications also fall into the protection scope of the present application.

Claims

1. A method for locating a form, comprising:

acquiring sample data containing table information;

and screening all the table prediction boxes by using a Confluence algorithm based on cross-over ratio to obtain an optimal table prediction box.

2. The form locating method of claim 1, wherein the performing positive and negative sample equalization processing on the sample data to obtain processed sample data comprises:

IoU₁＝m+ln(1-v)；

and taking a preset anchor frame with the intersection ratio not exceeding the first intersection ratio threshold value as a negative sample.

3. The form locating method of claim 2, further comprising:

determining each prediction box according to the sample data;

4. The method of claim 2, wherein the step of determining the default anchor frame that the intersection ratio does not exceed the first intersection ratio threshold as a negative sample further comprises:

and deleting the negative samples of which the intersection ratio exceeds a second intersection ratio threshold value from all the negative samples.

5. The form locating method according to claim 1, wherein the screening among all the form prediction boxes by using a Confluence algorithm based on cross-over ratio to obtain an optimal form prediction box comprises:

6. The method according to claim 1, wherein training to obtain a table positioning model using the processed sample data comprises:

7. The form locating method of claim 1, wherein before performing the positive and negative sample equalization processing on the sample data and obtaining the processed sample data, the method further comprises:

8. A form positioning apparatus, comprising:

9. A form locating apparatus, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the table locating method as claimed in any one of claims 1 to 7 when executing said computer program.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the table locating method according to one of the claims 1 to 7.