CN116052188A - Form detection method, form detection device, electronic equipment and storage medium - Google Patents
- Publication number: CN116052188A
- Application number: CN202310108917.2A
- Authority: CN (China)
- Legal status: Pending
Classifications
- G—PHYSICS; G06—COMPUTING; G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING; G06V30/00—Character recognition; document-oriented image-based pattern recognition
- G06V30/18—Extraction of features or characteristics of the image
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; block segmentation, e.g. bounding boxes for graphics or text
Abstract
The disclosure provides a form detection method, a form detection apparatus, an electronic device, and a storage medium, relating to the technical field of artificial intelligence, and in particular to deep learning, image processing, and computer vision. The method comprises the following steps: extracting features from the form image to be detected to obtain form feature information; performing dividing-line prediction and vertex prediction on the form image, respectively, according to the form feature information to obtain the dividing lines and table vertices in the form image; correcting the table vertices in the form image according to the dividing lines to obtain corrected table vertices; and matching the corrected table vertices with the dividing lines, and obtaining the cell information in the form image according to the matching relationship. This technical solution improves the accuracy of table detection.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to the technical fields of deep learning, image processing, and computer vision, and more particularly to a method and apparatus for detecting a table, an electronic device, and a storage medium.
Background
With the rapid development of artificial intelligence (AI), it has been widely applied in fields such as computer vision, speech recognition, natural language processing, deep learning, and big-data processing.
Text recognition technology based on artificial intelligence has likewise been applied in many scenarios. Tables organize data with rich structured knowledge; accurately extracting and restoring the structured information of a table can provide more direct decision-making information for business applications. It is therefore important to improve table detection performance.
Disclosure of Invention
The disclosure provides a form detection method, a form detection device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a form detection method including:
extracting features of the form image to be detected to obtain form feature information;
performing dividing-line prediction and vertex prediction on the table image, respectively, according to the table feature information to obtain the dividing lines and table vertices in the table image;
correcting the table vertices in the table image according to the dividing lines in the table image to obtain corrected table vertices;
and matching the corrected table vertex with the dividing line, and obtaining cell information in the table image according to the matching relation.
According to still another aspect of the present disclosure, there is provided a form detection apparatus including:
the feature extraction module is used for extracting features of the form image to be detected to obtain form feature information;
the table prediction module is used for performing dividing-line prediction and vertex prediction on the table image, respectively, according to the table feature information to obtain the dividing lines and table vertices in the table image;
the vertex correction module is used for correcting the table vertices in the table image according to the dividing lines in the table image to obtain corrected table vertices;
and the vertex matching module is used for matching the corrected table vertices with the dividing lines and obtaining cell information in the table image according to the matching relationship.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods provided by any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method provided by any of the embodiments of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1a is a flow chart of a form detection method provided in accordance with an embodiment of the present disclosure;
FIG. 1b is a schematic diagram of a form detection process provided in accordance with an embodiment of the present disclosure;
FIG. 2 is a flow chart of another form detection method provided in accordance with an embodiment of the present disclosure;
FIG. 3 is a flow chart of yet another form detection method provided in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a form detection apparatus according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device used to implement a form detection method of an embodiment of the present disclosure.
Detailed Description
Fig. 1a is a flowchart of a table detection method provided according to an embodiment of the present disclosure. The method is applicable to recognizing the content of a table in an image. It may be performed by a table detection apparatus, which may be implemented in software and/or hardware and integrated in an electronic device. As shown in fig. 1a, the table detection method of this embodiment may include:
s101, extracting features of a form image to be detected to obtain form feature information;
s102, performing dividing-line prediction and vertex prediction on the table image, respectively, according to the table feature information to obtain the dividing lines and table vertices in the table image;
s103, correcting the table vertices in the table image according to the dividing lines in the table image to obtain corrected table vertices;
and S104, matching the corrected table vertex with the dividing line, and obtaining cell information in the table image according to a matching relationship.
In the embodiment of the disclosure, the form image to be detected may be any document image containing a table; the table may be wired, wireless (borderless), inclined, irregular, and so on. The method can be used for optical character recognition (OCR) of document images containing tables in different scenarios.
Fig. 1b is a schematic diagram of a table detection process according to an embodiment of the present disclosure. Referring to fig. 1b, the table image to be detected may be input to a feature extractor to obtain table feature information. The table feature information is input to a dividing-line prediction module and a vertex prediction module, respectively. The dividing-line prediction module determines the dividing lines in the table image, which may include row dividing lines and column dividing lines; the table image is then divided into cells according to those lines. The vertex prediction module predicts the positions of the table vertices of the cells in the table image, yielding the pixel coordinates of the table vertices. A fusion module fuses the cells and the table vertices to obtain the table recognition result. Note that the position information of the table vertices and of the cells may include pixel coordinates and logical coordinates, where the logical coordinates characterize relative positions, such as row and column indices.
The feature extractor may be built on a convolutional neural network (CNN) or a Transformer network. The dividing-line prediction module may be built with a fully convolutional network (FCN), or realized by a split-and-merge method based on the table structure. The vertex prediction module may be built with regression-based detectors such as Faster R-CNN or Cascade R-CNN, or with an FCN.
When vertex prediction is performed on some table types, particularly wireless (borderless) tables, the pixel coordinates of the table vertices are easily mispredicted, so the table recognition result carries a large error; moreover, the logical coordinates of the table vertices are usually obtained by post-processing, which tends to be less accurate. Although dividing-line prediction can divide the table image more accurately and yield the logical coordinates of the cells, the obtained cell coordinates still carry larger errors for inclined or irregular tables.
In the embodiment of the disclosure, the table vertices in the table image are corrected according to the dividing lines in the table image; that is, the dividing lines serve as a reference for correcting the position information of the table vertices, which improves its accuracy. The corrected table vertices are then matched with the dividing lines to obtain the matching relationship between them, i.e. which dividing lines each vertex belongs to. Based on that relationship, the position information of the table vertices and of the cells is fused, giving the logical coordinates of the cells and the pixel coordinates of the cell vertices as the cell information. Correcting the table vertices with the dividing lines improves the accuracy of the vertex positions and of the vertex-to-line matching relationship; fusing the dividing-line and cell position information based on that relationship in turn improves the accuracy of the cell information.
According to the technical solution provided by this embodiment of the disclosure, the dividing lines in the table image are used to correct the table vertices, which improves the accuracy of the vertex positions and of the matching relationship between vertices and dividing lines; the position information of the dividing lines and of the cells is then fused based on that relationship to obtain the cell information, so that table structures in complex scenes can be recognized accurately.
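The later embodiments carry out this correction in feature space; as a purely illustrative geometric analogue of the same idea — using the predicted dividing lines as the reference for the vertex positions — the sketch below snaps noisy vertex predictions to the nearest intersection of the row and column dividing lines (the function names and the `max_dist` tolerance are assumptions, not taken from the disclosure):

```python
import numpy as np

def correct_vertices(vertices, row_lines_y, col_lines_x, max_dist=10.0):
    """Snap predicted table vertices to the nearest intersection of the
    predicted row/column dividing lines; vertices farther than max_dist
    from every intersection are kept unchanged."""
    # Candidate intersections of all row and column dividing lines.
    grid = np.array([(x, y) for y in row_lines_y for x in col_lines_x],
                    dtype=float)
    corrected = []
    for v in np.asarray(vertices, dtype=float):
        d = np.linalg.norm(grid - v, axis=1)
        i = int(np.argmin(d))
        corrected.append(grid[i] if d[i] <= max_dist else v)
    return np.array(corrected)

rows = [0.0, 100.0]            # y-coordinates of row dividing lines
cols = [0.0, 50.0]             # x-coordinates of column dividing lines
noisy = [(1.2, -0.8), (51.0, 99.0)]   # noisy vertex predictions
print(correct_vertices(noisy, rows, cols).tolist())
# → [[0.0, 0.0], [50.0, 100.0]]
```

A vertex far from every intersection is left untouched, mirroring the idea that the dividing lines serve as a reference rather than overriding the vertex prediction outright.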
Fig. 2 is a flowchart of another form detection method provided in accordance with an embodiment of the present disclosure. Referring to fig. 2, the table detection method of the present embodiment may include:
s201, extracting features of a form image to be detected to obtain form feature information;
s202, performing dividing-line prediction and vertex prediction on the table image, respectively, according to the table feature information to obtain the dividing lines and table vertices in the table image;
s203, extracting the feature information of the dividing lines from the table feature information according to the dividing lines in the table image;
s204, extracting the feature information of the table vertices from the table feature information according to the table vertices in the table image;
s205, correcting the feature information of the table vertices according to the feature information of the dividing lines to obtain the corrected vertex feature information;
s206, determining the pixel coordinates of the corrected table vertices according to their corrected feature information;
s207, matching the corrected table vertices with the dividing lines, and obtaining the cell information in the table image according to the matching relationship.
In the embodiment of the present disclosure, the feature information of the dividing lines and the feature information of the table vertices may each be determined from the table feature information. For each dividing line in the table image, the pixel coordinates of its points are averaged to obtain the line's average pixel coordinates, which are used to extract the line's feature information from the table feature information; the feature information of all dividing lines is then combined. For each table vertex in the table image, its pixel coordinates are used to extract the vertex's feature information from the table feature information.
The combined dividing-line feature information can serve as the reference for correcting the feature information of the table vertices, yielding the corrected vertex feature information; the pixel coordinates of each corrected table vertex are then determined from its corrected feature information, for example by feeding it into a first regression unit. Determining the line features and vertex features separately and correcting the vertex features against the line features improves the accuracy of the vertex feature representation and hence of the vertex pixel coordinates.
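The extraction of line and vertex features described above can be sketched as follows; sampling a shared feature map by nearest-neighbour lookup at the averaged line coordinate (and at each vertex coordinate) is an assumption for illustration — the disclosure does not fix a particular sampling scheme:

```python
import numpy as np

def line_feature(feature_map, line_points):
    """Extract one descriptor for a dividing line: average the pixel
    coordinates of its points, then sample the shared feature map at the
    (rounded) average coordinate."""
    fm = np.asarray(feature_map)                         # (H, W, D)
    mean_yx = np.asarray(line_points, dtype=float).mean(axis=0)
    y, x = np.round(mean_yx).astype(int)                 # nearest-neighbour sample
    return fm[y, x]

def vertex_feature(feature_map, yx):
    """Extract a table-vertex descriptor at the vertex's pixel coordinate."""
    fm = np.asarray(feature_map)
    return fm[int(round(yx[0])), int(round(yx[1]))]

H, W, D = 8, 8, 4
fm = np.arange(H * W * D, dtype=float).reshape(H, W, D)
line_feat = line_feature(fm, [(2, 0), (2, 3), (2, 7)])   # a row line through row 2
print(line_feat.shape)  # (4,)
```

Stacking the M row-line and N column-line descriptors gives the (M+N)×D combined result used as the correction reference.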
In an optional implementation, correcting the feature information of the table vertices according to the feature information of the dividing lines to obtain the corrected vertex feature information includes: using the feature information of the dividing lines as the key vectors and value vectors of an attention model, and the feature information of the table vertices as the query vectors, to obtain the output vectors of the attention model; and obtaining the corrected vertex feature information according to the output vectors of the attention model.
In the embodiment of the disclosure, the combined dividing-line feature information may be used as the key and value vectors of the attention model, and the vertex feature information as the query vectors; the corrected vertex feature information is obtained from the model's output vectors. The attention model can be based on a Transformer structure, trained in advance with supervision on table images annotated with dividing lines and table vertices.
Specifically, suppose there are M row dividing lines, N column dividing lines, and K table vertices, and each feature vector is D-dimensional, where M, N, K, and D are natural numbers. The combined dividing-line feature information is then of dimension (M+N)×D and the vertex feature information of dimension K×D. The (M+N)×D line features serve as the keys and values and the K×D vertex features as the queries, producing an output of dimension K×D′, which is taken as the corrected vertex feature information. During correction, the attention mechanism learns an attention distribution over the vertex features and derives the corrected features from it, realizing an accurate correction of the vertex feature information and thereby improving the accuracy of the subsequent table recognition based on the corrected features.
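A minimal single-head sketch of this attention step in plain NumPy, with the dividing-line features as keys and values and the vertex features as queries; the learned query/key/value projection matrices of a real Transformer layer (and the change of dimension from D to D′) are omitted, so this is a structural illustration rather than the trained model:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def correct_vertex_features(vertex_feats, line_feats):
    """Scaled dot-product attention: the dividing-line features act as keys
    and values, the table-vertex features act as queries, so every corrected
    vertex feature is an attention-weighted mixture of line features."""
    Q = np.asarray(vertex_feats)                  # (K, D) queries
    KV = np.asarray(line_feats)                   # (M+N, D) keys = values
    scores = Q @ KV.T / np.sqrt(Q.shape[1])       # (K, M+N)
    return softmax(scores, axis=1) @ KV           # (K, D)

rng = np.random.default_rng(0)
line_feats = rng.normal(size=(5, 8))    # M+N = 5 dividing lines, D = 8
vert_feats = rng.normal(size=(3, 8))    # K = 3 table vertices
out = correct_vertex_features(vert_feats, line_feats)
print(out.shape)  # (3, 8)
```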
In an optional embodiment, after the corrected vertex feature information is obtained, the method further includes: determining the confidence of each corrected table vertex according to its corrected feature information; and filtering the corrected table vertices according to the confidence.
After the vertex feature information has been corrected, vertex prediction can be performed again on the corrected features to obtain each corrected vertex's confidence, for example by feeding the corrected feature information into a second regression unit. Table vertices whose confidence falls below a confidence threshold are filtered out; only vertices whose confidence reaches the threshold are kept.
For example, for a table with X actual vertices, Y candidate table vertices may be retained from the vertex prediction result, where Y is a natural number greater than X; after the Y vertices are corrected, they are filtered according to their corrected feature information. Filtering the vertices on the corrected features further improves the accuracy of vertex prediction and thus of the subsequent fusion between table vertices and dividing lines.
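The filtering itself reduces to a boolean mask over the re-predicted confidences; a sketch, with the threshold value 0.5 chosen purely for illustration:

```python
import numpy as np

def filter_vertices(vertices, confidences, threshold=0.5):
    """Keep only the corrected table vertices whose re-predicted
    confidence reaches the threshold."""
    v = np.asarray(vertices)
    c = np.asarray(confidences)
    keep = c >= threshold
    return v[keep], c[keep]

verts = np.array([[0, 0], [50, 0], [50, 100]])   # Y = 3 candidate vertices
conf = np.array([0.95, 0.30, 0.80])              # re-predicted confidences
kept, kept_conf = filter_vertices(verts, conf)
print(kept.tolist())  # [[0, 0], [50, 100]]
```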
According to the technical solution provided by this embodiment of the disclosure, the feature information of the dividing lines and of the table vertices is determined from the table feature information; the vertex feature information is corrected with the line feature information as the reference; and the vertex pixel coordinates are re-determined and the vertices filtered according to the corrected feature information, which further improves the accuracy of table structure recognition.
Fig. 3 is a flow chart of yet another form detection method provided in accordance with an embodiment of the present disclosure. This embodiment is an alternative to the embodiments described above. Referring to fig. 3, the table detection method of the present embodiment may include:
s301, extracting features of a form image to be detected to obtain form feature information;
s302, performing dividing-line prediction and vertex prediction on the table image according to the table feature information to obtain the dividing lines and table vertices in the table image;
s303, correcting the table vertices in the table image according to the dividing lines to obtain the corrected table vertices;
s304, dividing the table image according to the dividing lines to obtain the logical coordinates of the cells and cell vertices in the table image;
s305, matching the corrected table vertices with the dividing lines according to the corrected vertex feature information and the dividing-line feature information to obtain the logical coordinates of the corrected table vertices;
s306, determining the cell vertices matched with the corrected table vertices according to the logical coordinates of the cell vertices and of the corrected table vertices, and taking the pixel coordinates of the corrected table vertices as the pixel coordinates of the matched cell vertices.
In the embodiment of the disclosure, the table image can be divided into cells according to the row and column dividing lines, yielding the logical coordinates of the cells and the cell vertices; the feature information of the dividing lines and of the table vertices can each be determined from the table feature information; and the vertex feature information can be corrected according to the line feature information to obtain the corrected vertex feature information. After the cells are obtained by dividing the table image, adjacent cells may additionally be merged according to their overlap relationship, feature information, and the like, to improve the accuracy of the cell division.
For each corrected table vertex, its feature information can be matched against the feature information of the row dividing lines and of the column dividing lines, and its logical coordinates obtained from the matching relationship. If the logical coordinates of a corrected table vertex equal the logical coordinates of a cell vertex, the pixel coordinates of the corrected vertex are taken as the pixel coordinates of that cell vertex. Determining the logical coordinates of the corrected vertices and of the cell vertices separately, matching them on those coordinates, and copying the corrected pixel coordinates to the matched cell vertices improves the prediction accuracy of the cell vertices — especially their positions in inclined and irregular tables.
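The fusion rule can be sketched as a lookup on logical coordinates: where a corrected vertex and a grid vertex share the same (row, column) index, the corrected pixel coordinates replace the provisional ones from the dividing-line grid. The dictionary representation is an assumption for illustration:

```python
def fuse_vertex_coords(cell_vertices, corrected_vertices):
    """cell_vertices: {logical (row, col): provisional pixel (x, y)} from the
    dividing-line grid; corrected_vertices: {logical (row, col): refined
    pixel (x, y)}. Where logical coordinates agree, the refined pixel
    coordinates replace the provisional ones."""
    fused = dict(cell_vertices)
    for logical, pixel in corrected_vertices.items():
        if logical in fused:
            fused[logical] = pixel
    return fused

grid = {(0, 0): (0, 0), (0, 1): (48, 0)}   # provisional grid vertices
refined = {(0, 1): (50, 1)}                # corrected table vertex
print(fuse_vertex_coords(grid, refined))   # {(0, 0): (0, 0), (0, 1): (50, 1)}
```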
In an optional implementation, matching the corrected table vertices with the dividing lines according to their feature information to obtain the logical coordinates of the corrected table vertices includes: determining, for each corrected table vertex, the row dividing line and the column dividing line it matches according to the similarity between its feature information and that of each dividing line; and determining the vertex's logical coordinates from the indices of the matched row and column dividing lines.
For each table vertex, the row similarity between its feature information and that of each row dividing line is determined, and the row dividing line it belongs to is selected according to that similarity; likewise, the column similarity against each column dividing line determines the column dividing line it belongs to. The vertex's logical coordinates then follow from its row and column dividing lines. Deriving the row and column membership of each corrected vertex from feature similarity, and its logical coordinates from that membership, improves the accuracy of the logical coordinates and hence of the matching between table vertices and cell vertices.
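A sketch of this membership assignment, assuming cosine similarity as the matching measure (the disclosure does not commit to a particular similarity function):

```python
import numpy as np

def logical_coords(vertex_feat, row_line_feats, col_line_feats):
    """Assign a corrected table vertex the index of the row dividing line
    and the column dividing line whose features it is most similar to;
    the pair of indices is the vertex's logical coordinate."""
    def cos_sim(a, B):
        a = np.asarray(a, dtype=float)
        B = np.asarray(B, dtype=float)
        a = a / np.linalg.norm(a)
        B = B / np.linalg.norm(B, axis=1, keepdims=True)
        return B @ a
    r = int(np.argmax(cos_sim(vertex_feat, row_line_feats)))
    c = int(np.argmax(cos_sim(vertex_feat, col_line_feats)))
    return r, c

rows = [[1.0, 0, 0, 0], [0, 1.0, 0, 0]]    # features of 2 row dividing lines
cols = [[0, 0, 1.0, 0], [0, 0, 0, 1.0]]    # features of 2 column dividing lines
vertex = [0.1, 0.9, 0.8, 0.2]              # most similar to row 1, column 0
print(logical_coords(vertex, rows, cols))  # (1, 0)
```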
According to the technical solution provided by this embodiment of the disclosure, the logical coordinates of the cell vertices and of the corrected table vertices are determined separately, their matching relationship is obtained from those logical coordinates, and the pixel coordinates of the corrected table vertices are used as the pixel coordinates of the corresponding cell vertices, which improves the accuracy of the cell pixel coordinates and allows table structures in complex scenes to be recognized accurately.
Fig. 4 is a schematic structural diagram of a table detection device according to an embodiment of the present disclosure. The embodiment is suitable for the case of content recognition of a table in an image. The apparatus may be implemented in software and/or hardware. As shown in fig. 4, the table detection apparatus 400 of the present embodiment may include:
the feature extraction module 410 is configured to perform feature extraction on a form image to be detected to obtain form feature information;
the table prediction module 420 is configured to perform dividing-line prediction and vertex prediction on the table image, respectively, according to the table feature information, so as to obtain the dividing lines and table vertices in the table image;
the vertex correction module 430 is configured to correct a table vertex in the table image according to the dividing line in the table image, so as to obtain a corrected table vertex;
and the vertex matching module 440 is configured to match the modified table vertex with the dividing line, and obtain cell information in the table image according to the matching relationship.
In an alternative embodiment, the vertex correction module 430 includes:
a line feature extraction unit, configured to extract feature information of a dividing line from the table feature information according to the dividing line in the table image;
a point feature extraction unit, configured to extract feature information of table vertices from the table feature information according to the table vertices in the table image;
the point characteristic correction unit is used for correcting the characteristic information of the table vertex according to the characteristic information of the dividing line to obtain the characteristic information of the corrected table vertex;
and the point coordinate correction unit is used for determining the pixel coordinates of the corrected table vertex according to the characteristic information of the corrected table vertex.
In an alternative embodiment, the point feature correction unit includes:
the attention subunit is used for taking the characteristic information of the dividing lines as the key vectors and value vectors in the attention model, and taking the characteristic information of the table vertex as the query vector in the attention model, to obtain an output vector of the attention model;
and the point characteristic correction subunit is used for obtaining the characteristic information of the corrected table vertex according to the output vector of the attention model.
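A minimal sketch of this attention step, with the dividing-line features serving as both keys and values and the vertex feature as the query. This is a single head with no learned projection matrices, which are omitted for brevity and would be present in a real model.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def correct_vertex_feature(vertex_feat, line_feats):
    """Scaled dot-product attention: one attention score per dividing line,
    then a weighted sum of line features forms the corrected vertex feature."""
    d = vertex_feat.shape[-1]
    scores = line_feats @ vertex_feat / np.sqrt(d)
    weights = softmax(scores)
    return weights @ line_feats

# The vertex feature resembles the third line, so it attends mostly to it.
lines = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
vertex = np.array([4.0, 4.0])
out = correct_vertex_feature(vertex, lines)  # close to [5, 5]
```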
In an alternative embodiment, the vertex correction module 430 further includes a vertex filtering unit, the vertex filtering unit including:
a confidence coefficient subunit, configured to determine a confidence coefficient of the modified table vertex according to the feature information of the modified table vertex;
and the vertex correction subunit is used for filtering the corrected table vertices according to the confidence coefficient.
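The confidence filtering can be sketched as below. The 0.5 threshold is an assumption for illustration; the disclosure does not specify a threshold value.

```python
def filter_vertices(vertices, confidences, threshold=0.5):
    """Keep only corrected vertices whose confidence clears the threshold."""
    return [v for v, c in zip(vertices, confidences) if c >= threshold]

kept = filter_vertices([(10, 10), (55, 10), (99, 10)], [0.9, 0.3, 0.7])
print(kept)  # [(10, 10), (99, 10)]
```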
In an alternative embodiment, the vertex matching module 440 includes:
the table dividing unit is used for dividing the table image according to the dividing lines to obtain the cells in the table image and the logical coordinates of the cell vertices;
the table matching unit is used for matching the corrected table vertices with the dividing lines according to the characteristic information of the corrected table vertices and the characteristic information of the dividing lines, to obtain the logical coordinates of the corrected table vertices;
and the pixel coordinate unit is used for determining the cell vertex matched with each corrected table vertex according to the logical coordinates of the cell vertices and the logical coordinates of the corrected table vertices, and taking the pixel coordinates of the corrected table vertex as the pixel coordinates of the matched cell vertex.
In an alternative embodiment, the table matching unit includes:
a similarity subunit, configured to determine, according to the similarity between the feature information of the corrected table vertex and the feature information of each dividing line, the row dividing line and the column dividing line that match the corrected table vertex;
and the table matching subunit is used for determining the logical coordinates of the corrected table vertex according to the serial numbers of the row dividing line and the column dividing line.
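The final fusion performed by the vertex matching module can be sketched as follows: two maps keyed by logical coordinates, where the refined pixel coordinate of a corrected vertex overwrites the coarse one of its matched cell vertex. All names and data here are illustrative.

```python
def fuse_pixel_coordinates(cell_vertices, corrected_vertices):
    """cell_vertices: {logical (row, col): coarse pixel (x, y)} from grid division.
    corrected_vertices: {logical (row, col): refined pixel (x, y)} from the model.
    Where logical coordinates match, the refined pixel coordinate replaces
    the coarse one; unmatched cell vertices keep their original coordinates."""
    fused = dict(cell_vertices)
    for logical, pixel in corrected_vertices.items():
        if logical in fused:
            fused[logical] = pixel
    return fused

cells = {(0, 0): (0, 0), (0, 1): (100, 0)}
corrected = {(0, 1): (103, 2)}
print(fuse_pixel_coordinates(cells, corrected))
# {(0, 0): (0, 0), (0, 1): (103, 2)}
```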
According to the technical scheme, the table vertices are corrected by means of the dividing lines in the table image, which improves the position accuracy of the table vertices. The corrected table vertices are then matched with the dividing lines, and the position information of the corrected table vertices is fused with the position information of the cells according to the matching relationship, which improves the accuracy of the position information of the cells. The table structure in a complex scene can therefore be accurately identified, making the method suitable for various OCR recognition scenarios such as finance, verification, medical treatment, insurance, office work and government affairs.
In the technical scheme of the present disclosure, the collection, storage, application and other handling of the user personal information involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 5 is a block diagram of an electronic device used to implement a form detection method of an embodiment of the present disclosure.
Fig. 5 illustrates a schematic block diagram of an example electronic device 500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the electronic device 500 includes a computing unit 501 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic device 500 may also be stored. The computing unit 501, ROM 502, and RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A number of components in electronic device 500 are connected to I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the respective methods and processes described above, such as the table detection method. For example, in some embodiments, the table detection method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the table detection method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the table detection method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above can be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
Artificial intelligence is the discipline that studies how to make computers mimic certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, and planning), and it spans both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
Cloud computing refers to a technical system in which an elastically extensible pool of shared physical or virtual resources is accessed through a network; the resources may include servers, operating systems, networks, software, applications, storage devices, and the like, and may be deployed and managed in an on-demand, self-service manner. Cloud computing technology can provide efficient and powerful data processing capability for technical applications such as artificial intelligence and blockchain, and for model training.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (15)
1. A form detection method, comprising:
extracting features of a table image to be detected to obtain table feature information;
respectively carrying out dividing line prediction and vertex prediction on the table image according to the table feature information to obtain dividing lines and table vertices in the table image;
correcting the table vertices in the table image according to the dividing lines in the table image to obtain corrected table vertices;
and matching the corrected table vertex with the dividing line, and obtaining cell information in the table image according to the matching relation.
2. The method of claim 1, wherein the correcting the table vertices in the table image according to the dividing lines in the table image to obtain corrected table vertices comprises:
extracting feature information of a dividing line from the form feature information according to the dividing line in the form image;
extracting characteristic information of table vertices from the table characteristic information according to the table vertices in the table image;
correcting the characteristic information of the table vertex according to the characteristic information of the dividing line to obtain the characteristic information of the corrected table vertex;
and determining pixel coordinates of the corrected table vertex according to the characteristic information of the corrected table vertex.
3. The method of claim 2, wherein the correcting the characteristic information of the table vertex according to the characteristic information of the dividing line to obtain the corrected characteristic information of the table vertex includes:
using the characteristic information of the dividing lines as the key vectors and value vectors in an attention model, and using the characteristic information of the table vertex as the query vector in the attention model, to obtain an output vector of the attention model;
and obtaining the characteristic information of the corrected table vertex according to the output vector of the attention model.
4. The method of claim 2, further comprising, after obtaining the feature information of the corrected table vertex:
determining the confidence coefficient of the corrected table vertex according to the characteristic information of the corrected table vertex;
and filtering the corrected table vertex according to the confidence.
5. The method according to any one of claims 1-4, wherein the matching the corrected table vertices with the dividing lines, and obtaining cell information in the table image according to the matching relationship, includes:
dividing the table image according to the dividing lines to obtain cells in the table image and logical coordinates of cell vertices;
matching the corrected table vertices with the dividing lines according to the characteristic information of the corrected table vertices and the characteristic information of the dividing lines, to obtain the logical coordinates of the corrected table vertices;
and determining the cell vertex matched with each corrected table vertex according to the logical coordinates of the cell vertices and the logical coordinates of the corrected table vertices, and taking the pixel coordinates of the corrected table vertex as the pixel coordinates of the matched cell vertex.
6. The method of claim 5, wherein the matching the corrected table vertices with the dividing lines according to the characteristic information of the corrected table vertices and the characteristic information of the dividing lines to obtain the logical coordinates of the corrected table vertices comprises:
determining the row dividing line and the column dividing line that match each corrected table vertex according to the similarity between the characteristic information of the corrected table vertex and the characteristic information of each dividing line;
and determining the logical coordinates of the corrected table vertex according to the serial numbers of the row dividing line and the column dividing line.
7. A form detection apparatus comprising:
the feature extraction module is used for extracting features of a table image to be detected to obtain table feature information;
the table prediction module is used for respectively carrying out dividing line prediction and vertex prediction on the table image according to the table feature information to obtain dividing lines and table vertices in the table image;
the vertex correction module is used for correcting the table vertices in the table image according to the dividing lines in the table image to obtain corrected table vertices;
and the vertex matching module is used for matching the corrected table vertices with the dividing lines and obtaining cell information in the table image according to the matching relationship.
8. The apparatus of claim 7, wherein the vertex modification module comprises:
a line feature extraction unit, configured to extract feature information of a dividing line from the table feature information according to the dividing line in the table image;
a point feature extraction unit, configured to extract feature information of table vertices from the table feature information according to the table vertices in the table image;
the point characteristic correction unit is used for correcting the characteristic information of the table vertex according to the characteristic information of the dividing line to obtain the characteristic information of the corrected table vertex;
and the point coordinate correction unit is used for determining the pixel coordinates of the corrected table vertex according to the characteristic information of the corrected table vertex.
9. The apparatus of claim 8, wherein the point feature correction unit comprises:
the attention subunit is used for taking the characteristic information of the dividing lines as the key vectors and value vectors in the attention model, and taking the characteristic information of the table vertex as the query vector in the attention model, to obtain an output vector of the attention model;
and the point characteristic correction subunit is used for obtaining the characteristic information of the corrected table vertex according to the output vector of the attention model.
10. The apparatus of claim 8, wherein the vertex correction module further comprises a vertex filtering unit, the vertex filtering unit comprising:
a confidence coefficient subunit, configured to determine a confidence coefficient of the modified table vertex according to the feature information of the modified table vertex;
and the vertex correction subunit is used for filtering the corrected table vertices according to the confidence coefficient.
11. The apparatus of any of claims 7-10, wherein the vertex matching module comprises:
the table dividing unit is used for dividing the table image according to the dividing lines to obtain cells in the table image and logical coordinates of cell vertices;
the table matching unit is used for matching the corrected table vertices with the dividing lines according to the characteristic information of the corrected table vertices and the characteristic information of the dividing lines, to obtain the logical coordinates of the corrected table vertices;
and the pixel coordinate unit is used for determining the cell vertex matched with each corrected table vertex according to the logical coordinates of the cell vertices and the logical coordinates of the corrected table vertices, and taking the pixel coordinates of the corrected table vertex as the pixel coordinates of the matched cell vertex.
12. The apparatus of claim 11, wherein the table matching unit comprises:
a similarity subunit, configured to determine, according to the similarity between the feature information of the corrected table vertex and the feature information of each dividing line, the row dividing line and the column dividing line that match the corrected table vertex;
and the table matching subunit is used for determining the logical coordinates of the corrected table vertex according to the serial numbers of the row dividing line and the column dividing line.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310108917.2A CN116052188A (en) | 2023-01-31 | 2023-01-31 | Form detection method, form detection device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116052188A true CN116052188A (en) | 2023-05-02 |
Family
ID=86121900
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310108917.2A Pending CN116052188A (en) | 2023-01-31 | 2023-01-31 | Form detection method, form detection device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116052188A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||