CN113837151B - Table image processing method and device, computer equipment and readable storage medium - Google Patents

Publication number
CN113837151B
Authority
CN
China
Prior art keywords
matched
selection
text block
image
cell
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111409116.7A
Other languages
Chinese (zh)
Other versions
CN113837151A (en
Inventor
王旭伟
张良友
余家林
成帆
孙志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hundsun Technologies Inc
Original Assignee
Hundsun Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hundsun Technologies Inc
Priority to CN202111409116.7A
Publication of CN113837151A
Application granted
Publication of CN113837151B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Character Input (AREA)

Abstract

The invention provides a table image processing method and device, computer equipment, and a readable storage medium. The method comprises: acquiring a table image to be processed; recognizing the table image to be processed to obtain the position information of all text blocks and the position information of all selection identifiers; for a cell image to be matched in the table image to be processed, constructing a distance matrix corresponding to the cell image to be matched according to the position information and text block numbers of the text blocks in the cell image to be matched and the position information and selection identifier numbers of the selection identifiers; and matching a target text block for each selection identifier in the cell image to be matched based on that distance matrix. The invention provides an effective solution for one-to-one matching of selection identifiers and text blocks, provides a basis for subsequently extracting the text blocks associated with selection identifiers in order to reconstruct the table, and improves the efficiency and accuracy of table image processing.

Description

Table image processing method and device, computer equipment and readable storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a form image processing method, a form image processing device, computer equipment and a readable storage medium.
Background
Table data exists widely across industries. With the continuing spread of Optical Character Recognition (OCR) technology, demand for table recognition keeps growing; intelligently recognizing and extracting the useful information in a table reduces labor cost and improves recognition efficiency.
At present, the table recognition methods proposed in the related art can recognize the text blocks in a table. However, tables often contain a large amount of check information, and this check information is the truly useful information. Because the check marks corresponding to the check information take many forms, recognizing the check marks and matching them with text blocks remains difficult. How to recognize the check marks in a table and match them with text blocks is therefore a problem to be solved.
Disclosure of Invention
One of the objectives of the present invention is to provide a method and an apparatus for processing a table image, a computer device, and a readable storage medium, which can solve the problem of one-to-one matching between a selection identifier and a text block, thereby providing a basis for subsequently extracting the text block with the selection identifier to perform table reconstruction, and improving efficiency and accuracy of table image processing.
In a first aspect, the present invention provides a table image processing method, comprising: acquiring a table image to be processed; recognizing the table image to be processed to obtain the position information of all text blocks and the position information of all selection identifiers; for a cell image to be matched in the table image to be processed, constructing a distance matrix corresponding to the cell image to be matched according to the position information and text block numbers of the text blocks in the cell image to be matched and the position information and selection identifier numbers of the selection identifiers; and matching a target text block for each selection identifier in the cell image to be matched based on the distance matrix corresponding to the cell image to be matched. The cell image to be matched is a cell image in which a text block and a selection identifier exist at the same time; the target text block is one of all the text blocks in the cell image to be matched.
Optionally, the method further comprises: determining a target selection identifier from all the selection identifiers; determining a target text block corresponding to the target selection identifier as a text block to be extracted; and reconstructing the table image to be processed based on the text block to be extracted and the text blocks except the target text block.
Optionally, recognizing the to-be-processed form image to obtain location information of all text blocks and location information of all selection identifiers, including: cell recognition is carried out on the table image to be processed, and the cell image corresponding to each cell is determined; performing text block detection and selection identifier detection on the target cell image, and determining the position information of the text block in the target cell image, or determining the position information of the text block and the selection identifier in the target cell image; wherein the target cell image is any one of all the cell images; and traversing all the cell images to obtain the position information of all the text blocks and the position information of all the selection marks.
Optionally, matching a target text block for each selection identifier in the cell image to be matched based on the distance matrix corresponding to the cell image to be matched includes: determining the target text block matched with each selection identifier in a first cell image to be matched according to the distance matrix corresponding to the first cell image to be matched, where the first cell image to be matched is any one of all the cell images to be matched; and traversing all the cell images to be matched to obtain the target text blocks matched with all the selection identifiers.
Optionally, before the step of constructing, for a to-be-matched cell image in the to-be-processed table image, a distance matrix corresponding to the to-be-matched cell image according to the position information and the text block number of the text block in the to-be-matched cell image, and the position information and the selection identification number of the selection identification, the method further includes: numbering the text blocks and the selection marks in the cell images to be matched according to a preset sequence; and the preset sequence is the horizontal direction or the vertical direction of the cell images to be matched.
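The numbering step can be illustrated with a minimal Python sketch. It assumes each text block and selection identifier is reduced to a single reference point; the `Box` type, the `number_boxes` function, and the tie-breaking on the other axis are hypothetical choices made for illustration and are not specified by the source.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Reference point of a text block or selection identifier (hypothetical type)."""
    x: float  # horizontal coordinate
    y: float  # vertical coordinate
    label: str = ""


def number_boxes(boxes, order="horizontal"):
    """Return {number: box}, numbering from 0 along the preset direction."""
    if order == "horizontal":
        key = lambda b: (b.x, b.y)  # ASSUMPTION: ties broken by vertical position
    elif order == "vertical":
        key = lambda b: (b.y, b.x)  # ASSUMPTION: ties broken by horizontal position
    else:
        raise ValueError("order must be 'horizontal' or 'vertical'")
    return dict(enumerate(sorted(boxes, key=key)))


marks = [Box(120, 10, "mark B"), Box(10, 10, "mark A")]
texts = [Box(150, 10, "text B"), Box(40, 10, "text A")]
numbered_marks = number_boxes(marks)
numbered_texts = number_boxes(texts)
```

Numbering the two groups separately keeps the selection identifier numbers and text block numbers independent, which is what later lets them serve as the two indices of the distance matrix.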
Optionally, for a to-be-matched cell image in the to-be-processed table image, constructing a distance matrix corresponding to the to-be-matched cell image according to the position information and the text block number of the text block in the to-be-matched cell image, and the position information and the selection identifier number of the selection identifier, including: determining the maximum limit horizontal coordinate and the minimum limit horizontal coordinate of each text block according to the position information of all characters in the position information of the text block in the cell image to be matched; calculating the distance between each selection mark and each text block in the cell image to be matched based on the position information of the selection mark, the position information of the text block, the maximum limit horizontal coordinate, the minimum limit horizontal coordinate and a predetermined line feed coefficient in the cell image to be matched; and forming a distance matrix corresponding to the cell image to be matched by all the distances according to the text block number and the selection identification number in the cell image to be matched.
Optionally, the line feed coefficient is predetermined by: determining a preset threshold value as the line feed coefficient; or determining the maximum limit horizontal coordinate and the minimum limit horizontal coordinate of the cell image to be matched according to the maximum limit horizontal coordinate and the minimum limit horizontal coordinate of each text block; and calculating the line feed coefficient according to the maximum limit coordinate and the minimum limit coordinate of the cell image to be matched.
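The two ways of obtaining the line feed coefficient can be sketched as follows. Deriving the cell-level limit coordinates from the per-text-block limits follows the text above; the final computation from those limits is not given in this passage, so the default `rule` below (the horizontal span of the cell) is only a placeholder assumption, and all function names are hypothetical.

```python
def cell_horizontal_extent(block_extents):
    """Return (cell_x_min, cell_x_max) from per-text-block (x_min, x_max) pairs."""
    cell_x_min = min(x_min for x_min, _ in block_extents)
    cell_x_max = max(x_max for _, x_max in block_extents)
    return cell_x_min, cell_x_max


def line_feed_coefficient(block_extents, preset=None, rule=None):
    """Return the line feed coefficient from a preset threshold or from the cell extent."""
    if preset is not None:
        # First option in the text: a preset threshold is used directly.
        return preset
    lo, hi = cell_horizontal_extent(block_extents)
    if rule is None:
        # ASSUMPTION: placeholder rule, not taken from the source.
        rule = lambda x_min, x_max: x_max - x_min
    return rule(lo, hi)


extents = [(40, 110), (150, 230), (40, 95)]
k_preset = line_feed_coefficient(extents, preset=500)
k_from_cell = line_feed_coefficient(extents)
```

Passing the rule as a parameter keeps the sketch usable with whatever formula an implementation actually adopts.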
Optionally, determining a target selection identifier from the all selection identifiers includes: identifying each selection mark in the cell image to be matched; determining the target selection identification in the cell image to be matched according to the recognition result; and traversing all the cell images to be matched, and determining all the target selection identifiers.
Optionally, determining the target selection identifier in the cell image to be matched according to the recognition result includes: if the selection marks are identified to be in an unmarked state, determining each selection mark as the target selection mark; and if at least one selected identifier is identified to be in a marked state, determining the target selected identifier from each selected identifier according to the identified identifier category.
Optionally, if it is recognized that at least one selection identifier exists in a marked state, determining the target selection identifier from each selection identifier according to the recognized identifier category includes: if only the preset information exclusion category exists, determining the selection identifier with the unmarked state as the target selection identifier; wherein the information exclusion category represents that the target text block corresponding to the selection identifier is not selected; if the information exclusion category does not exist, determining the selection identifier corresponding to the marked state as a target selection identifier; and if the information exclusion category and the identification category except the information exclusion category exist at the same time, determining the selection identification corresponding to the identification category except the information exclusion category as the target selection identification.
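The decision rules for choosing target selection identifiers within one cell can be written as a short Python sketch. The input layout (a list of `(marked, category)` pairs indexed by selection identifier number) and the `"exclude"` label are assumptions made for illustration.

```python
EXCLUDE = "exclude"  # hypothetical label for the information exclusion category


def target_selection_ids(marks):
    """Return the numbers of the target selection identifiers in one cell.

    marks: list of (is_marked, category) pairs, indexed by selection
    identifier number; category is None for unmarked identifiers.
    """
    marked = [i for i, (is_marked, _) in enumerate(marks) if is_marked]
    if not marked:
        # All identifiers are unmarked: every identifier is a target.
        return list(range(len(marks)))
    others = [i for i in marked if marks[i][1] != EXCLUDE]
    if not others:
        # Only the information exclusion category is present:
        # the unmarked identifiers are the targets.
        return [i for i, (is_marked, _) in enumerate(marks) if not is_marked]
    # No exclusion category, or exclusion mixed with other categories:
    # the identifiers marked with a non-exclusion category are the targets.
    return others
```

For example, `target_selection_ids([(True, "exclude"), (False, None), (False, None)])` selects identifiers 1 and 2, since the only mark present rules out identifier 0.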
Optionally, after acquiring the table image to be processed, the method further includes: and preprocessing the table image to be processed.
In a second aspect, the present invention provides a table image processing apparatus, comprising: an acquisition module, used for acquiring a table image to be processed; a recognition module, used for recognizing the table image to be processed to obtain the position information of all text blocks and the position information of all selection identifiers; a building module, used for constructing, for a cell image to be matched in the table image to be processed, a distance matrix corresponding to the cell image to be matched according to the position information and text block numbers of the text blocks in the cell image to be matched and the position information and selection identifier numbers of the selection identifiers; and a matching module, used for matching a target text block for each selection identifier in the cell image to be matched based on the distance matrix corresponding to the cell image to be matched. The cell image to be matched is a cell image in which a text block and a selection identifier exist at the same time; the target text block is one of all the text blocks in the cell image to be matched.
Optionally, the form image processing apparatus further includes: a determining module, configured to determine a target selection identifier from all the selection identifiers; determining the target text block matched with the target selection identifier as a text block to be extracted; and the reconstruction module is used for reconstructing the table image to be processed based on the text block to be extracted and the unmatched text block.
Optionally, the identification module further includes: the determining unit is used for carrying out cell identification on the table image to be processed and determining the cell image corresponding to each cell; the detection unit is used for detecting a text block and a selection mark of the target cell image, and determining the position information of the text block in the target cell image, or determining the position information of the text block and the selection mark in the target cell image; wherein the target cell image is any one of all the cell images; and the acquisition unit is used for traversing all the cell images to acquire the position information of all the text blocks and the position information of all the selection marks.
Optionally, the matching module includes: the determining unit is used for determining the target text block matched with each selection identifier in the first cell image to be matched according to the distance matrix corresponding to the first cell image to be matched; the first cell image to be matched is any one of all the cell images to be matched; and the obtaining unit is used for traversing all the cell images to be matched to obtain the target text blocks matched with all the selection identifications.
Optionally, the system further comprises a numbering module, configured to: numbering the text blocks and the selection marks in the cell images to be matched according to a preset sequence; and the preset sequence is the horizontal direction or the vertical direction of the cell images to be matched.
Optionally, the building module is specifically configured to: determining the maximum limit horizontal coordinate and the minimum limit horizontal coordinate of each text block according to the position information of all characters in the position information of the text block in the cell image to be matched; calculating the distance between each selection mark and each text block in the cell image to be matched based on the position information of the selection mark, the position information of the text block, the maximum limit horizontal coordinate, the minimum limit horizontal coordinate and a predetermined line feed coefficient in the cell image to be matched; and forming a distance matrix corresponding to the cell image to be matched by all the distances according to the text block number and the selection identification number in the cell image to be matched.
Optionally, the line feed coefficient is predetermined by: determining a preset threshold value as the line feed coefficient; or determining the maximum limit horizontal coordinate and the minimum limit horizontal coordinate of the cell image to be matched according to the maximum limit horizontal coordinate and the minimum limit horizontal coordinate of each text block; and calculating the line feed coefficient according to the maximum limit coordinate and the minimum limit coordinate of the cell image to be matched.
Optionally, the determining module is specifically configured to: identifying each selection mark in the cell image to be matched; determining the target selection identification in the cell image to be matched according to the recognition result; and traversing all the cell images to be matched, and determining all the target selection identifiers.
Optionally, the determining module is further specifically configured to: if the selection marks are identified to be in an unmarked state, determining each selection mark as the target selection mark; and if at least one selected identifier is identified to be in a marked state, determining the target selected identifier from each selected identifier according to the identified identifier category.
Optionally, the determining module is further specifically configured to: if only the preset information exclusion category exists, determining the selection identifier with the unmarked state as the target selection identifier; wherein the information exclusion category represents that the target text block corresponding to the selection identifier is not selected; if the information exclusion category does not exist, determining the selection identifier corresponding to the marked state as a target selection identifier; and if the information exclusion category and the identification category except the information exclusion category exist at the same time, determining the selection identification corresponding to the identification category except the information exclusion category as the target selection identification.
Optionally, the form image processing apparatus further includes: and the preprocessing module is used for preprocessing the table image to be processed.
In a third aspect, the present invention provides a computer device comprising a processor and a memory, the memory storing a computer program executable by the processor, the processor being operable to execute the computer program to implement the form image processing method of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the form image processing method of the first aspect.
The invention provides a table image processing method and device, computer equipment, and a readable storage medium. The method comprises: acquiring a table image to be processed; recognizing the table image to be processed to obtain the position information of all text blocks and the position information of all selection identifiers; for a cell image to be matched in the table image to be processed, constructing a distance matrix corresponding to the cell image to be matched according to the position information and text block numbers of the text blocks in the cell image to be matched and the position information and selection identifier numbers of the selection identifiers; and matching a target text block for each selection identifier in the cell image to be matched based on that distance matrix. The cell image to be matched is a cell image in which a text block and a selection identifier exist at the same time; the target text block is one of all the text blocks in the cell image to be matched. By recognizing the position information of all text blocks and all selection identifiers in the table image to be processed, and then matching a target text block for each selection identifier according to that position information, the invention provides an effective solution for one-to-one matching of selection identifiers and text blocks. This provides a basis for subsequently extracting the text blocks associated with selection identifiers in order to reconstruct the table, and improves the efficiency and accuracy of table image processing.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a schematic illustration of a form image;
FIG. 2 is a diagram of an application environment of a table image processing method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart diagram of a tabular image processing method provided by an embodiment of the invention;
FIG. 4 is a schematic flowchart of an implementation manner of step S301 provided in the embodiment of the present invention;
FIG. 5 is an exemplary diagram of a cell image to be matched according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart diagram of one implementation of step S302 provided in an embodiment of the present invention;
FIG. 7 is a schematic flow chart diagram of another form image processing method provided by the embodiment of the invention;
FIG. 8 is an exemplary diagram of reconstructed table content provided by an embodiment of the invention;
FIG. 9 is a schematic flowchart of an implementation manner of step S304 provided by the embodiment of the present invention;
FIG. 10 is a schematic diagram illustrating a marking status of a selection identifier according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a selection identifier of an information exclusion category according to an embodiment of the present invention;
FIG. 12 is a functional block diagram of a form image processing apparatus according to an embodiment of the present invention;
FIG. 13 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
First, explanation of related terms involved in the embodiments of the present application is made with reference to fig. 1, and fig. 1 is a schematic view of a form image.
Selection identifier: refers to a check box present in electronic material such as tables and documents, for example the check-box identifiers shown in FIG. 1 [inline images omitted] and the single-selection identifiers [inline images omitted]. The selection identifiers referred to in this application are not limited to these types and may be of other types, such as check boxes in electronic forms and handwritten marks on printed or copied forms.
Text block: refers to a text region composed of a plurality of characters. For example, referring to FIG. 1, taking the cell to which the "Business property" belongs as an example, the area of characters immediately following each selection identifier represents a block of text.
Information exclusion category: the information exclusion category in this embodiment refers to the category of selection identifiers that correspond to information the user has excluded from selection or discarded, such as a selection identifier similar to [inline image omitted]. During selection identifier recognition, if the category of the recognized selection identifier is the information exclusion category, the information corresponding to that selection identifier is discarded.
Line feed coefficient: the line feed coefficient in this embodiment refers to a distance conversion multiple added in the vertical direction (i.e., y-coordinate component) when measuring or calculating the distance between the selection identifier and the text block located in different lines, so as to convert the position represented by the two-dimensional coordinate into a one-dimensional distance.
Referring to fig. 2, fig. 2 is an application environment diagram of a form image processing method according to an embodiment of the present invention, where the application environment diagram includes: database 210, terminal 220, computer device 230, and network 240.
Database 210 may be used to store table images in various forms, whose formats include, but are not limited to, pictures such as jpg, jpeg, ppm, bmp, and png, screenshots, scanned files, and PDF documents. The image content may be, but is not limited to, the large volume of user-authorized form material found in the documents of financial institutions (such as banks, securities, funds, and insurance), businesses, and public institutions, for example receipts, tickets, insurance policies, notices, confirmations, and application forms.
The terminal 220 may create or generate the form image in real-time and upload the form image to a database for storage in real-time or upload the form image to the computer device 230 for form image processing in real-time.
The computer device 230 may be a device for executing the form image processing method, and specifically, the computer device 230 may obtain the form image from the database 210, or the computer device 230 receives the form image uploaded by the terminal 220 in real time and then executes the form image processing method provided in the embodiment of the present invention to achieve a corresponding technical effect.
In some possible embodiments, the computer device 230 may be a stand-alone physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers.
In some possible embodiments, the network 240 may include, but is not limited to, a wired network and a wireless network, where the wired network includes a local area network, a metropolitan area network, and a wide area network, and the wireless network includes Bluetooth, Wi-Fi, and other networks that enable wireless communication.
In some possible embodiments, the terminal 220 may be, but is not limited to, a smart phone, a tablet Computer, a Personal Computer (PC), a smart wearable device, and the like.
Referring again to FIG. 1, in the form image shown in FIG. 1, the text blocks that follow the checked selection identifiers are the truly useful information, and the user's requirements can be analyzed accurately by extracting this checked information.
At present, the related art does provide a method that outputs the content of a text line after recognizing the state of a check box in a document image, but that method is not suitable for table images with complex content and layout. Moreover, the disclosed related art for processing table images can only fill recognized text content into the corresponding cells; it offers no reliable method for matching selection identifiers with text content. Because the selection identifiers in table images take many forms and are acquired in complex ways, matching text content to selection identifiers is difficult, which increases the difficulty of extracting check information and reduces the efficiency and accuracy of table processing.
To solve the above technical problem, and taking the application environment shown in FIG. 2 as an example, the embodiment of the present invention provides a form image processing method, which can be applied to the computer device 230 shown in FIG. 2. Referring to FIG. 3, FIG. 3 is a schematic flowchart of the form image processing method provided in the embodiment of the present invention, and the method can include the following steps:
Step S300, acquiring a form image to be processed.
The form image to be processed acquired in this embodiment is a form image authorized by the user. Its format may include, but is not limited to, pictures such as jpg, jpeg, ppm, bmp, and png, screenshots, scanned files, and PDF documents. The content of the form image to be processed is not limited; it may be, for example, the large volume of form material found in the documents of financial institutions (such as banks, securities, funds, and insurance), businesses, and public institutions, such as receipts, bills, insurance policies, notices, confirmations, and application forms.
Step S301, recognizing the form image to be processed, and obtaining the position information of all text blocks and the position information of all selection marks.
It can be understood that, by identifying the form image to be processed, the cell areas in the form image to be processed can be extracted, and then the position information of the text block and the selection identifier in each cell area is obtained.
In this embodiment, the table image to be processed may be recognized by an existing table recognition model, which may be implemented by, but is not limited to, an image segmentation algorithm such as U-Net, a Fully Convolutional Network (FCN), SegNet, a Recurrent Neural Network (RNN), or a Region-based Convolutional Neural Network (R-CNN), or an object detection algorithm such as the Single Shot MultiBox Detector (SSD), You Only Look Once (YOLO), or MobileNet.
Step S302, aiming at the cell image to be matched in the table image to be processed, according to the position information and the text block number of the text block in the cell image to be matched, and the position information and the selection identification number of the selection identification, a distance matrix corresponding to the cell image to be matched is constructed.
The cell image to be matched is a cell image with a text block and a selection mark existing at the same time.
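The selection of cell images to be matched can be sketched in a few lines of Python. The per-cell dictionary layout with `text_blocks` and `selection_ids` lists is a hypothetical representation of the recognition output, not something the source prescribes.

```python
def cells_to_match(cells):
    """Keep only cells that contain both a text block and a selection identifier."""
    return [c for c in cells if c["text_blocks"] and c["selection_ids"]]


cells = [
    {"name": "header", "text_blocks": ["Business property"], "selection_ids": []},
    {"name": "options", "text_blocks": ["Sole", "Joint"], "selection_ids": ["m0", "m1"]},
    {"name": "empty", "text_blocks": [], "selection_ids": []},
]
to_match = cells_to_match(cells)
```

Cells that hold only text, such as the header cell in this example, skip the distance matrix step entirely.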
In this embodiment, the selection identifiers and the text blocks are numbered separately so that the row index of the finally constructed distance matrix corresponds to the selection identifier number and the column index corresponds to the text block number, or, alternatively, the row index corresponds to the text block number and the column index corresponds to the selection identifier number. The matching procedure differs between the two correspondences, and the specific matching modes are described in the subsequent content.
In order to measure the distance between a given selection identifier and a given text block in a cell image to be matched, the invention proposes a one-dimensional chain distance, which converts two-dimensional coordinates into a one-dimensional distance. The calculation formula (1) is as follows:
$$D_{ij} = \min\bigl(\lvert X_i - (x_{j\min} + K\lvert Y_i - y_j\rvert)\rvert,\ \lvert X_i - (x_{j\max} + K\lvert Y_i - y_j\rvert)\rvert\bigr) \tag{1}$$
where $D_{ij}$ represents the distance between the ith selection identifier and the jth text block in the cell image to be matched; $(X_i, Y_i)$ represents the position information of the ith selection identifier; $(x_j, y_j)$ represents the position information of the jth text block; $x_{j\min}$ represents the minimum limit horizontal coordinate of the jth text block; $x_{j\max}$ represents the maximum limit horizontal coordinate of the jth text block; and K is the line feed coefficient. The matrix formed by the $D_{ij}$ is used as the distance matrix corresponding to the cell image to be matched.
For example, assuming that M text blocks and N selection identifiers exist in the cell image to be matched, after the distance between each text block and each selection identifier is obtained, a distance matrix $D = [D_{ij}]$ may be constructed according to the text block numbers and the selection identifier numbers. In this matrix, each row represents the distances between one selection identifier and all the text blocks, and each column represents the distances between one text block and all the selection identifiers. The transpose of the distance matrix may also be used as the final distance matrix, in which case each row represents the distances between one text block and all the selection identifiers, and each column represents the distances between one selection identifier and all the text blocks.
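Formula (1) and the matrix construction can be sketched in a few lines of Python. This is a minimal illustration and not the patent's implementation: the function names, the dictionary keys and the sample coordinates are assumptions chosen for the example, and the rows follow the selection identifier numbers as in the first orientation described above.

```python
def chain_distance(sel_center, text_y, x_min, x_max, k):
    """One-dimensional chain distance of formula (1)."""
    sx, sy = sel_center
    # The vertical offset is converted into a horizontal shift scaled by K.
    shift = k * abs(sy - text_y)
    return min(abs(sx - (x_min + shift)), abs(sx - (x_max + shift)))


def distance_matrix(selections, text_blocks, k):
    """Rows follow selection identifier numbers, columns follow text block numbers."""
    return [
        [chain_distance(s, t["y"], t["x_min"], t["x_max"], k) for t in text_blocks]
        for s in selections
    ]


# Hypothetical coordinates: two selection identifiers and two text blocks on one line.
selections = [(10, 20), (200, 20)]
text_blocks = [
    {"y": 20, "x_min": 25, "x_max": 120},
    {"y": 20, "x_min": 215, "x_max": 330},
]
D = distance_matrix(selections, text_blocks, k=400)
print(D)
```

With these sample values each selection identifier is closest to the text block immediately to its right, which is the layout the walkthrough later in this section assumes.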
The related parameters of the distance matrix, such as the line feed coefficient, the minimum limit horizontal coordinate and the maximum limit horizontal coordinate, will be described in detail later.
Step S303, based on the distance matrix corresponding to the cell image to be matched, matching a target text block to each selection identifier in the cell image to be matched.
And the target text block is one of all text blocks in the cell image to be matched.
In a possible implementation manner, for step S303, a target text block matching each selection identifier in the first to-be-matched cell image may be determined according to the distance matrix corresponding to the first to-be-matched cell image; the first cell image to be matched is any one of all cell images to be matched; and traversing all the cell images to be matched to obtain target text blocks matched with all the selected identifiers. It can be understood that, because the conditions of the selection identifier and the text block included in each cell image to be matched are different, the embodiment adopts a mode of processing the cell images to be matched one by one, so as to determine the matching relationship between the selection identifier and the text block in each cell image to be matched.
According to the form image processing method provided by the embodiment of the invention, the obtained form image to be processed is first recognized to obtain the position information of all selection identifiers and all text blocks. Then, for each cell image to be matched in the form image to be processed, a distance matrix is constructed from the position information and text block numbers of the text blocks and the position information and selection identifier numbers of the selection identifiers in that cell image. Finally, each selection identifier in the cell image to be matched is matched with a target text block on the basis of the distance matrix. This solves the problem of one-to-one matching between selection identifiers and text blocks, provides a basis for subsequently extracting the text blocks matched with the selection identifiers and reconstructing the table, and improves the efficiency and accuracy of form image processing.
Optionally, after the form image to be processed is obtained, it may first be preprocessed in order to ensure the accuracy and efficiency of the subsequent processing results. The preprocessing methods adopted in this embodiment include, but are not limited to, methods common in the image processing field such as scaling, translation, transposition, mirroring, rotation, normalization, dimension reduction, denoising, equalization and smoothing, so that the image meets the input conditions of the deep learning network model. For example, the input image is reduced or enlarged to a fixed size, such as reducing a 1920 × 1080 original image to a 224 × 224 standard image.
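As a rough illustration of two of the preprocessing operations listed above (scaling and normalization), the following sketch works on a grayscale image stored as nested lists. The nearest-neighbour strategy, the function names and the tiny sample image are assumptions made for the example. The source does not say which interpolation or normalization the embodiment uses.

```python
def resize_nearest(image, out_h, out_w):
    """Scale a 2-D grayscale image to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(image, max_value=255):
    """Map pixel values from [0, max_value] to [0, 1]."""
    return [[px / max_value for px in row] for row in image]


# Hypothetical 4 x 8 image reduced to 2 x 2, standing in for 1920 x 1080 -> 224 x 224.
image = [[(r + c) % 256 for c in range(8)] for r in range(4)]
small = normalize(resize_nearest(image, 2, 2))
```

In practice a library routine would normally be used for this step. Any coordinates detected on the scaled image also have to be interpreted in the scaled coordinate system.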
It can be understood that, in the subsequent processing process, the detection, matching, reconstruction and other processing can be performed on the preprocessed form image to be processed, so that the efficiency of the subsequent processing flow can be improved, and the accuracy of the processing result can be ensured.
Optionally, before performing one-to-one matching between the selection identifier and the text block, position information of the selection identifier and the text block needs to be determined, and then matching is completed according to the position information, so that an implementation manner of obtaining the position information of the selection identifier and the text block is provided in an embodiment of the present invention, please refer to fig. 4, where fig. 4 is a schematic flowchart of an implementation manner of step S301 provided in an embodiment of the present invention, where step S301 may include the following sub-steps:
and a substep S301-1 of identifying cells of the form image to be processed and determining a cell image corresponding to each cell.
In this embodiment, after the cell identification is performed, the position information corresponding to each cell may be determined, and then the cell image corresponding to each cell is captured from the to-be-processed form image according to the position information of each cell.
In some possible embodiments, the position information corresponding to each cell may be determined as follows: a pre-trained model segments the cell areas in the form image to be processed to obtain the row and column dividing lines of the cells and outputs a binary image of the table line segmentation; the obtained table lines are screened and merged using existing connected domain analysis; the row lines and column lines of the table are detected using a common table line detection model and combined into a table frame line graph; and the position information of each cell is determined from this graph.
And a substep S301-2 of detecting a text block and a selection mark of the target cell image, and determining the position information of the text block in the target cell image, or determining the position information of the text block and the selection mark in the target cell image.
Wherein, the target cell image is any one of all cell images.
For text block detection, in some possible embodiments, a pre-trained text detection model may be used to perform text block detection on the target cell image. The text detection model may be, but is not limited to, the Connectionist Text Proposal Network (CTPN), the Efficient and Accurate Scene Text detector (EAST), SegLink, PixelLink, TextBoxes++, textbook, and the like. The specific operation is as follows: the target cell image is input into the text detection model, and the output is the number N1 of text blocks in the target cell image and the position information of the text blocks.
In order to facilitate subsequent calculation of the distance between the text block and the selection mark and matching between the text block and the selection mark, after the text block is detected, all detected text block area images can be input into a pre-trained OCR algorithm model, and all text contents and corresponding coordinates thereof in the text block are recognized.
Illustratively, the OCR algorithm model described above may be implemented using, but is not limited to, the following two approaches. The first is two-stage OCR, which is divided into text detection and text recognition, for example: CTPN + Connectionist Temporal Classification (CTC), CRNN (CNN + RNN + CTC), CNN + RNN + attention, CNN + DenseNet + CTC, and the like. The second is end-to-end OCR, such as STN-OCR, which is based on a Spatial Transformer Network (STN), and the trainable end-to-end network FOTS (Fast Oriented Text Spotting).
For selection identifier detection, in some possible embodiments, the category and position features of all the selection identifiers in the target cell image can be extracted through a pre-trained deep learning detection algorithm.
Illustratively, the deep learning detection algorithm may use a target detection algorithm based on a deep learning technique, such as SSD, YOLOv5, and the like, where the input of the algorithm is a target cell image, and the network model structure and the output parameters are adjusted so that the result of detection output includes: the number, the category, the probability value and the position information of the selection marks in the target cell image.
And a substep S301-3 of traversing all the cell images to obtain the position information of all the text blocks and the position information of all the selection marks.
In this embodiment, for all the obtained cell images, text block detection and selection identifier detection are performed one by one, and a text block and/or a selection identifier in each cell is determined until all the cells are processed, so that position information of all the text blocks and position information of all the selection identifiers in the whole image to be processed are obtained.
It should be noted that, for each cell image, in the process of performing text block detection and selection flag detection, the following situations may occur: in the first case: only the text block exists in the cell, for example, the cell to which "business name" in fig. 1 belongs, and only the position information of the text block is obtained. In the second case: the text block and the selection identifier exist in the cell, for example, the cell to which the "enterprise property" belongs in fig. 1, at this time, the image corresponding to the cell may be the cell image to be matched, and therefore, the respective position information of the text block and the selection identifier needs to be obtained, and in this case, the text block matching needs to be performed on each selection identifier.
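The distinction between the two cases can be expressed as a simple predicate. The sketch below is illustrative only: the dictionary layout and the sample cell contents are assumptions, and real detection results would carry position information instead of plain strings.

```python
def is_cell_to_match(cell):
    """A cell image needs matching only when it holds both text blocks and selection identifiers."""
    return bool(cell["text_blocks"]) and bool(cell["selections"])


# Hypothetical detection results for two cells.
cells = [
    {"name": "enterprise name", "text_blocks": ["Enterprise name"], "selections": []},
    {
        "name": "enterprise property",
        "text_blocks": ["Production enterprise", "Network e-commerce", "Export enterprise"],
        "selections": ["box0", "box1", "box2"],
    },
]
to_match = [c["name"] for c in cells if is_cell_to_match(c)]
print(to_match)
```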
By the method, the position information of all the text blocks and all the selection marks in the to-be-processed form image can be obtained, and further, the target text block can be matched for each selection mark according to a matching mode given subsequently in the embodiment.
Optionally, before step S302, numbering may also be performed on the text block and the selection identifier in the cell to be matched, so as to obtain a text block number and a selection identifier number, and this embodiment further provides a possible implementation manner of numbering the text block and the selection identifier separately: and numbering the text blocks and the selection identifications in the cell images to be matched according to a preset sequence. The preset sequence is the horizontal direction or the vertical direction of the cell images to be matched.
In a possible implementation manner, the text blocks and the selection identifiers may be numbered sequentially from left to right and from top to bottom, the numbered cell images to be matched may be as shown in fig. 5, and fig. 5 is an exemplary diagram of a cell image to be matched according to an embodiment of the present invention.
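One way to obtain a left-to-right, top-to-bottom numbering is to group items into lines by their vertical position and then sort each line horizontally. The sketch below assumes each item is represented by a center point and that items on the same line differ vertically by no more than a tolerance. Both the tolerance value and the function name are assumptions, not details from the source.

```python
def number_items(centers, line_tolerance=10):
    """Return the indices of `centers` ordered top-to-bottom, then left-to-right."""
    order = sorted(range(len(centers)), key=lambda i: centers[i][1])
    lines, current = [], []
    for i in order:
        # Start a new line when the vertical gap to the line's first item is too large.
        if current and centers[i][1] - centers[current[0]][1] > line_tolerance:
            lines.append(current)
            current = []
        current.append(i)
    if current:
        lines.append(current)
    numbered = []
    for line in lines:
        numbered.extend(sorted(line, key=lambda i: centers[i][0]))
    return numbered


# Hypothetical center points: three items on the first line, one on the second.
centers = [(300, 52), (40, 50), (45, 90), (170, 48)]
order = number_items(centers)
print(order)
```

The item at `centers[order[n]]` receives number n, so numbering starts from 0 as in the walkthrough later in this section.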
For example, continuing with the distance matrix D, suppose the row numbers correspond to the text block numbers and the column numbers correspond to the selection identifier numbers, that is, the mth row represents the distances between the mth text block and all the selection identifiers, and the nth column represents the distances between the nth selection identifier and all the text blocks. If the ith row is matched with the jth column, the jth selection identifier is considered to match the ith text block. By analogy, the text block corresponding to each selection identifier can be obtained.
Optionally, this embodiment further provides an implementation manner of constructing a distance matrix corresponding to a to-be-matched cell image, please refer to fig. 6, where fig. 6 is a schematic flowchart of an implementation manner of step S302 provided in the embodiment of the present invention, and step S302 may include:
Substep S302-1, determining the maximum limit horizontal coordinate and the minimum limit horizontal coordinate of each text block according to the position information of all characters in the position information of the text block in the cell image to be matched.
In this embodiment, the maximum limit coordinate and the minimum limit coordinate are determined as follows: suppose that the detection result contains M text blocks in the target cell image to be matched, that the mth text block consists of N characters, and that the coordinate of the center point of the ith character is $(x_i, y_i)$. The y component of the coordinate of the mth text block is then defined as:
[Formula image not reproduced in the source text]
where N denotes the total number of characters and $y_i$ denotes the ordinate of the center point of the ith character.
The leftmost limit value $x_{m\min}$ and the rightmost limit value $x_{m\max}$ of the horizontal coordinate x of the mth text block are then, respectively:
$$x_{m\min} = \min(x_i),\quad i = 1, 2, \dots, N$$
$$x_{m\max} = \max(x_i),\quad i = 1, 2, \dots, N$$
where $\min(x_i)$ denotes the minimum of the horizontal coordinates of the N characters of the mth text block, and $\max(x_i)$ denotes the maximum of the horizontal coordinates of the N characters of the mth text block. Therefore, the minimum limit horizontal coordinate and the maximum limit horizontal coordinate of the mth text block are $(x_{m\min}, y)$ and $(x_{m\max}, y)$, respectively.
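The limit coordinates of a text block can be computed directly from the character center points. In the sketch below the y component is taken as the mean of the character ordinates. This is an assumption, because the defining formula for y is not available in the text; only its inputs (N and the character ordinates) are. The function name and sample data are likewise illustrative.

```python
def text_block_extents(char_centers):
    """Return (x_min, x_max, y) for one text block from its character center points."""
    xs = [x for x, _ in char_centers]
    ys = [y for _, y in char_centers]
    # ASSUMPTION: y is the mean ordinate of the characters.
    y = sum(ys) / len(ys)
    return min(xs), max(xs), y


# Hypothetical center points of the four characters of one text block.
chars = [(30, 51), (50, 49), (70, 50), (90, 50)]
x_min, x_max, y = text_block_extents(chars)
print(x_min, x_max, y)
```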
S302-2, calculating the distance between each selection mark and each text block in the cell image to be matched based on the position information of the selection mark in the cell image to be matched, the position information of the text block, the maximum limit horizontal coordinate, the minimum limit horizontal coordinate and a predetermined line feed coefficient.
In this embodiment, in order to determine the distance matrix, the position information of the selection identifier may be processed first. For example, if the coordinate of the upper left corner of a selection identifier in the detection result is $(x_1, y_1)$ and the coordinate of the lower right corner is $(x_2, y_2)$, then the coordinate $(x, y)$ of the center point of the selection identifier is used as the position of the selection identifier, that is:
$$x = \frac{x_1 + x_2}{2},\qquad y = \frac{y_1 + y_2}{2}$$
The determined center point coordinate is then used as the position information of the selection identifier for the subsequent calculation of its distance to each text block.
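Reducing a detected bounding box to its center point is a one-line computation. The sketch below assumes an axis-aligned box given by its upper left and lower right corners, as described above. The function name and the sample box are illustrative.

```python
def box_center(top_left, bottom_right):
    """Center point of an axis-aligned box given its upper left and lower right corners."""
    (x1, y1), (x2, y2) = top_left, bottom_right
    return ((x1 + x2) / 2, (y1 + y2) / 2)


# Hypothetical selection identifier box.
center = box_center((10, 40), (30, 60))
print(center)
```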
In this embodiment, the line feed coefficient may be predetermined in the following manner:
determining a preset threshold value as a line feed coefficient; or determining the maximum limit coordinate and the minimum limit coordinate in the target cell image to be matched according to the maximum limit coordinate and the minimum limit coordinate of each text block; and calculating a line feed coefficient according to the maximum limit coordinate and the minimum limit coordinate of the target cell image to be matched.
In this embodiment, the minimum limit horizontal coordinate and the maximum limit horizontal coordinate of the cell to be matched may be defined as:
$$X_{\min} = \min(x_{m\min}),\quad m = 1, 2, \dots, M$$
$$X_{\max} = \max(x_{m\max}),\quad m = 1, 2, \dots, M$$
where $\min(x_{m\min})$ denotes the minimum of the minimum limit horizontal coordinates of the M text blocks; $\max(x_{m\max})$ denotes the maximum of the maximum limit horizontal coordinates of the M text blocks; $x_{m\min}$ denotes the minimum limit horizontal coordinate of the mth text block; and $x_{m\max}$ denotes the maximum limit horizontal coordinate of the mth text block. The line feed coefficient can then be expressed as $K = \lvert X_{\max} - X_{\min}\rvert$.
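The second way of obtaining the line feed coefficient, from the horizontal extent of all text blocks in the cell, can be sketched as follows. The dictionary keys and sample values are assumptions used only for illustration.

```python
def line_feed_coefficient(text_blocks):
    """K = |X_max - X_min| over all text blocks of the cell to be matched."""
    x_min = min(tb["x_min"] for tb in text_blocks)
    x_max = max(tb["x_max"] for tb in text_blocks)
    return abs(x_max - x_min)


# Hypothetical limit coordinates of three text blocks in one cell.
blocks = [
    {"x_min": 25, "x_max": 120},
    {"x_min": 215, "x_max": 330},
    {"x_min": 30, "x_max": 140},
]
K = line_feed_coefficient(blocks)
print(K)
```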
And S302-3, forming a distance matrix corresponding to the cell image to be matched by all the distances according to the text block number and the selection identification number in the cell image to be matched.
In this embodiment, assuming that the cell image to be matched has M text blocks and N selection identifiers, and that the one-dimensional chain distances are calculated from the two-dimensional coordinates according to formula (1), the following M × N distance matrix may be formed:
$$D = \begin{bmatrix} D_{11} & D_{12} & \cdots & D_{1N} \\ D_{21} & D_{22} & \cdots & D_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ D_{M1} & D_{M2} & \cdots & D_{MN} \end{bmatrix}$$
It can be seen that the number of rows of the distance matrix equals the number of text blocks and the number of columns equals the number of selection identifiers. Each row represents the one-dimensional chain distances between one text block and all the selection identifiers, and each column represents the one-dimensional chain distances between one selection identifier and all the text blocks. That is, the row number in the distance matrix corresponds to a text block number, and the column number corresponds to a selection identifier number.
For example, continuing with the above distance matrix, row 1 represents the distances between the 1st text block and all the selection identifiers, so row number 1 is consistent with text block number 1. Similarly, column 1 represents the distances between the 1st selection identifier and all the text blocks, so column number 1 is consistent with selection identifier number 1.
Of course, in the process of constructing the distance matrix, the column numbers may instead be consistent with the text block numbers and the row numbers with the selection identifier numbers, which is not limited herein. It should be noted, however, that this embodiment provides a different matching method for each of the two construction methods.
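The two constructions differ only by a transpose, which the following sketch makes concrete. The matrix values are hypothetical and the helper name is an assumption.

```python
def transpose(matrix):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical 3 x 2 matrix: rows are text blocks, columns are selection identifiers.
by_text_block = [
    [15, 14],
    [125, 105],
    [701, 640],
]
# After transposing, rows are selection identifiers and columns are text blocks.
by_selection = transpose(by_text_block)
print(by_selection)
```

Whichever orientation is chosen, the traversal has to run along the axis that carries the selection identifier numbers.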
In a first possible matching manner, for the construction in which the row numbers of the distance matrix correspond to the text block numbers and the column numbers correspond to the selection identifier numbers, the embodiment of the present invention provides the following matching manner:
Step 1, determining the minimum distance in a target column; the target column is any column in the distance matrix.
Step 2, associating the text block number corresponding to the row number of the row where the minimum distance is located with the selection identifier number corresponding to the column number of the target column.
Step 3, deleting from the distance matrix the row where the minimum distance is located and the target column.
Step 4, traversing all columns to obtain the association relationship between each selection identifier number and a text block number, and determining the target text block corresponding to each selection identifier according to the association relationship.
For example, continuing with the distance matrix above, assume that the target column is column 1, which corresponds to the 1st selection identifier. The column vector $[D_{11}, D_{21}, \dots, D_{j1}, \dots, D_{M1}]$ is sorted in ascending order; if $D_{\min} = D_{j1}$, the row number of the jth row is associated with the column number of the 1st column, so that the 1st selection identifier is associated with the jth text block. The column of the matched 1st selection identifier and the row of the jth text block are then deleted. Proceeding in the same way, all N selection identifiers are finally matched, the unmatched text blocks are retained, and the matching of all the selection identifiers is complete.
In a second possible matching method, the transpose of the distance matrix may be used as the final distance matrix for matching. Thus, the row number in the distance matrix corresponds to the selection identifier number, and the column number corresponds to the text block number, so that the corresponding matching manner is different from the first matching manner in that the traversal is performed by rows, and other matching processes are similar, and are not described herein again.
That is, regardless of the first matching manner or the second matching manner, since the number of text blocks is necessarily greater than or equal to the number of selection marks, the text blocks are traversed according to the row or column corresponding to the selection mark number during traversal until each selection mark matches the target text block.
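The column-wise matching can be sketched as a greedy loop. This is a minimal illustration under the first construction (rows are text blocks, columns are selection identifiers), not a definitive implementation. Instead of physically deleting rows and columns it keeps a list of rows that are still available, which has the same effect. The function name and the sample matrix are assumptions.

```python
def match_by_columns(matrix):
    """Greedy matching: for each column, pick the nearest text block still available.

    Returns ({selection identifier number: text block number}, unmatched text block numbers).
    """
    remaining_rows = list(range(len(matrix)))
    n_cols = len(matrix[0]) if matrix else 0
    matches = {}
    for col in range(n_cols):
        if not remaining_rows:
            break
        best_row = min(remaining_rows, key=lambda r: matrix[r][col])
        matches[col] = best_row
        remaining_rows.remove(best_row)  # equivalent to deleting the matched row
    return matches, remaining_rows


# Hypothetical distances: 3 text blocks (rows) and 2 selection identifiers (columns).
matrix = [
    [12, 300],
    [250, 9],
    [400, 180],
]
matches, unmatched = match_by_columns(matrix)
print(matches, unmatched)
```

Because each column is resolved in turn, the result depends on the traversal order. The source does not discuss alternatives such as a globally optimal assignment, so none is assumed here.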
To facilitate understanding of the above matching principle, a specific example is given below for explanation.
Continuing with fig. 5 as an example, assume that 3 selection identifiers, 3 text blocks and their respective position information are detected in step S302. The number of text blocks therefore equals the number of selection identifiers, and the text blocks and the selection identifiers are numbered sequentially from left to right and from top to bottom.
The step S303-2 may include the following steps: firstly, a 3 × 3 distance matrix is constructed, and it can be considered that the row sequence number corresponds to the text block number, and the column sequence number corresponds to the selection identifier number:
[3 × 3 distance matrix image not reproduced in the source text]
Then, following the principle of column-wise traversal, for the 1st column of the distance matrix, that is, selection identifier No. 0, the corresponding 3-dimensional column vector is $[15, 125, 701]^T$. Its minimum value is 15, which is located in the 1st row, that is, text block No. 0, so selection identifier No. 0 matches text block No. 0. Deleting the 1st row and the 1st column from the distance matrix gives the 2 × 2 distance matrix after dimensionality reduction:
[2 × 2 distance matrix image not reproduced in the source text]
Next, for the 2nd column of the original distance matrix, that is, selection identifier No. 1, the corresponding 2-dimensional column vector after dimensionality reduction is $[14, 105]^T$. Its minimum value is 14, which is located in the 2nd row of the original distance matrix and corresponds to text block No. 1, so selection identifier No. 1 matches text block No. 1. Deleting the 2nd row and the 2nd column from the distance matrix gives the distance matrix after dimensionality reduction:
[1 × 1 distance matrix image not reproduced in the source text]
Finally, for the 3rd column of the original distance matrix, that is, selection identifier No. 2, the distance matrix after dimensionality reduction contains only one distance value, so selection identifier No. 2 is matched directly with text block No. 2. As a result, the empty selection identifier is matched with "production enterprise", the checked selection identifier with "network e-commerce", and the crossed selection identifier with "export enterprise".
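The walkthrough can be replayed step by step with explicit row and column deletion. In the sketch below only the values 15, 125, 701, 14 and 105 come from the text; the remaining matrix entries are not available and are filled with hypothetical placeholders, marked in the comments. The function name is also an assumption.

```python
def reduce_and_match(matrix, text_numbers, selection_numbers):
    """Match the first column of the current matrix, then delete its row and column."""
    matrix = [row[:] for row in matrix]
    text_numbers = list(text_numbers)
    selection_numbers = list(selection_numbers)
    steps = []
    while matrix and matrix[0]:
        column = [row[0] for row in matrix]
        r = column.index(min(column))
        steps.append((selection_numbers[0], text_numbers[r], column))
        del matrix[r]                         # delete the matched row
        del text_numbers[r]
        matrix = [row[1:] for row in matrix]  # delete the matched column
        del selection_numbers[0]
    return steps


matrix = [
    [15, 130, 720],   # 130 and 720 are hypothetical
    [125, 14, 110],   # 110 is hypothetical
    [701, 105, 16],   # 16 is hypothetical
]
steps = reduce_and_match(matrix, text_numbers=[0, 1, 2], selection_numbers=[0, 1, 2])
for selection, text, column in steps:
    print(f"selection identifier No. {selection} -> text block No. {text}, column {column}")
```

Each recorded step holds the selection identifier number, the matched text block number and the column vector that was examined, which corresponds to the three stages of the walkthrough.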
In order to verify the accuracy of the matching method provided by the embodiment of the present invention, a test set of 100 table pictures was established, containing 535 selection identifiers and 546 text blocks. The test set covers the most common scenes, such as single-line text with selection identifiers, multi-line text with selection identifiers, text that wraps across lines, equal and unequal numbers of selection identifiers and text blocks, selection identifiers placed before text blocks, and selection identifiers placed after text blocks. Matching tests were performed on the test set using the above method for matching selection identifiers and text blocks. As a result, 533 selection identifiers were correctly matched with text blocks, and the overall matching accuracy reached 99.6%, as shown in the following table:
TABLE 1 Matching test results for selection identifiers and text blocks
Scene type | Number of pictures | Number of selection identifiers | Number of text blocks | Correct matches | Matching accuracy
Single line | 52 | 191 | 202 | 191 | 100%
Multiple lines | 48 | 344 | 344 | 342 | 99.4%
Total | 100 | 535 | 546 | 533 | 99.6%
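The accuracy figures in Table 1 follow from dividing the number of correct matches by the number of selection identifiers. The short check below recomputes them from the table values. The variable names are illustrative.

```python
# (number of selection identifiers, number of correct matches) per scene type, from Table 1.
results = {
    "single line": (191, 191),
    "multiple lines": (344, 342),
}


def accuracy(correct, total):
    """Matching accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


per_scene = {name: accuracy(correct, total) for name, (total, correct) in results.items()}
total_identifiers = sum(total for total, _ in results.values())
total_correct = sum(correct for _, correct in results.values())
overall = accuracy(total_correct, total_identifiers)
print(per_scene, overall)
```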
By combining the matching results of the selection identifiers and the text blocks, the useful information in the form image can then be extracted for display.
Optionally, after the problem of one-to-one matching between selection identifiers and text blocks is solved, the embodiment of the present invention further provides a method for extracting the target text blocks matched with the selection identifiers. The table information can then be reconstructed from the extracted target text blocks and the other, unmatched text blocks, so that only the information useful to the user is shown in the table. Referring to fig. 7, fig. 7 is a schematic flowchart of another form image processing method according to an embodiment of the present invention, where the method may further include:
in step S304, a target selection flag is determined from all the selection flags.
Step S305, determining the target text block matched with the target selection identifier as a text block to be extracted.
It can be understood that the target text block corresponding to the target selection identifier is information useful to the user. Reconstructing the table information from this extracted information simplifies the table and avoids redundant information that would degrade the user's viewing experience.
Step S306, based on the text block to be extracted and the unmatched text block, reconstructing the table content corresponding to the table image to be processed.
It can be understood that the text blocks that are not matched are also key information that constitutes the table content, and therefore, after extracting the target text block corresponding to the target selection identifier and the text block that is not matched, the table content is reconstructed, and the obtained table content only retains the key information of the table and information useful to the user, which is convenient for the user to view.
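Steps S305 and S306 amount to a filter over the text blocks of a cell: keep the blocks matched with a target selection identifier, keep the blocks matched with no selection identifier, and drop the rest. The sketch below assumes the matching result is a dictionary from selection identifier number to text block number. The function name and sample data are hypothetical.

```python
def blocks_to_keep(text_blocks, matches, target_selections):
    """Keep text blocks matched with a target selection identifier, plus unmatched text blocks.

    matches maps selection identifier number -> text block number.
    """
    matched = set(matches.values())
    extracted = {matches[s] for s in target_selections}
    return [t for i, t in enumerate(text_blocks) if i in extracted or i not in matched]


# Hypothetical cell content: block 0 is a label, blocks 1 to 3 each follow a selection identifier.
text_blocks = ["Enterprise property:", "Production enterprise", "Network e-commerce", "Export enterprise"]
matches = {0: 1, 1: 2, 2: 3}
kept = blocks_to_keep(text_blocks, matches, target_selections=[1])
print(kept)
```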
For example, in conjunction with fig. 1, by performing the above steps, the finally reconstructed table content is as shown in fig. 8, and fig. 8 is an exemplary diagram of the reconstructed table content provided by the embodiment of the present invention. As can be seen by comparing fig. 1 and fig. 8, the reconstructed table content retains only the text blocks matched with the target selection identifiers in fig. 1, together with the text blocks that are not matched with any selection identifier, that is, the key information constituting the table, for example, "patent type:" in fig. 8 and all text blocks in the first column of the table. The reconstructed table content therefore shows only the information useful to the user, which avoids redundancy in the table content and makes it convenient for the user to view.
Optionally, in order to determine all target selection identifiers, this implementation also provides a possible implementation manner, please refer to fig. 9, where fig. 9 is a schematic flow chart of an implementation manner of step S304 provided in the embodiment of the present invention, where step S304 may include the following sub-steps:
and a substep S304-1, identifying each selection mark in the cell image to be matched.
It can be understood that selection identifiers and text blocks exist at the same time only in the cell images to be matched, and the text blocks of interest to the user in those cell images need to be retained in the reconstructed table. Therefore, target selection identifiers are identified only in the cell images to be matched in order to determine the text blocks to be extracted, and cell images that do not need matching are not processed, which simplifies the processing and improves the identification efficiency.
And a substep S304-2, determining a target selection identifier in the cell image to be matched according to the recognition result.
It can be understood that the recognition result may be, but is not limited to, whether the selection identifier is in a marked state, the category of the mark, the number of marks, and the like, and the target selection identifier may be determined according to this result.
And a substep S304-3, traversing all the cell images to be matched, and determining all the target selection identifiers.
In this embodiment, since the identifiers and the mark states of the selection identifiers in each to-be-matched cell image are different, the ways of identifying the target selection identifiers are also different, and therefore, it is necessary to identify each to-be-matched cell image one by one.
For the above substep S304-2, the embodiment of the present invention further provides an implementation manner of determining the target selection identifier. For example, substep S304-2 may include the following steps:
step 1, if all the selection marks are identified to be in an unmarked state, all the selection marks are determined to be target selection marks.
It can be understood that, if all the selection identifiers are in an unmarked state, the text blocks corresponding to all the selection identifiers are retained so as to be subsequently filled into the reconstructed form image, which is convenient for the user to further select.
For example, with reference to fig. 1, taking the cell to which the "enterprise property" belongs as an example, if the marking states of the 3 selection identifiers corresponding to the 3 text blocks are as shown in fig. 10, and fig. 10 is a schematic diagram of the marking states of the selection identifiers provided by the embodiment of the present invention, then all of the "production enterprise", "network e-commerce", and "export enterprise" need to be retained, and in the reconstructed table content shown in fig. 8, the cell should show the 3 text blocks, which is convenient for the user to select again.
And 2, if at least one selected identifier is identified to be in a marked state, determining a target selected identifier from all selected identifiers according to the identified identifier category.
It can be understood that, in the actual implementation process, because some of the marks correspond to the contents that are excluded from being selected by the user and some of the marks correspond to the contents that are selected by the user, the target selection identifiers may be determined for different types of identifiers.
In some possible embodiments, step 2 may be performed as follows:
and 2-1, if only the preset information exclusion category exists, determining other selection marks except the selection mark corresponding to the mark with the information exclusion category as target selection marks.
Wherein the information excludes categories, similar to
Figure 42782DEST_PATH_IMAGE010
The selection marks represent that the corresponding target text block is not kept in the reconstructed form image.
In a possible implementation, the information exclusion category may be, but is not limited to, the form shown in fig. 11, which is a schematic diagram of a selection identifier of the information exclusion category provided by an embodiment of the present invention. It is to be understood that identifier categories other than the information exclusion category may serve as target selection identifiers. Examples include single selection marks, such as the selection identifiers corresponding to "yes" and "no" in the "patent application for product" row in fig. 1, and color covering marks, such as the selection identifiers corresponding to "invention patent", "utility model" and "appearance patent" in the "patent type" row in fig. 1, or the selection identifiers in the "enterprise property" and "sales channel" rows in fig. 1.
Step 2-2: if the information exclusion category does not exist, the selection identifier in the marked state is determined as the target selection identifier.
For example, continuing to refer to fig. 1, take the cell to which "product petition or not" belongs. The selection identifier in the marked state, that is, the selection identifier corresponding to "yes", does not belong to the information exclusion category, so the selection identifier corresponding to "yes" is determined as the target selection identifier, and the cell to which "product petition or not" belongs in the table image shown in fig. 8 retains the text block "yes".
As another example, in the cell to which "patent type" belongs, the selection identifier corresponding to "invention patent" is in a marked state and does not belong to the information exclusion category, so the selection identifier corresponding to "invention patent" is the target selection identifier.
Step 2-3: if the information exclusion category and an identifier category other than the information exclusion category exist at the same time, the selection identifier corresponding to the identifier category other than the information exclusion category is determined as the target selection identifier.
For example, continuing with the cell to which "enterprise property" belongs in fig. 1, if the selection identifiers corresponding to "network e-commerce" and "export enterprise" are both in the marked state, but the selection identifier corresponding to "export enterprise" belongs to the information exclusion category, then only the selection identifier corresponding to "network e-commerce" is determined as the target selection identifier.
The target selection identifier in each cell image to be matched can be obtained through the above steps.
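The decision rules of step 1 and steps 2-1 to 2-3 can be sketched for a single cell as follows. This is a minimal illustration rather than the implementation of the embodiment: the `SelectionIdentifier` class, the `EXCLUDE` label and the function name are hypothetical, and the recognition of the marked state and mark category is assumed to have been done already.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical label for the information exclusion category.
EXCLUDE = "exclude"


@dataclass
class SelectionIdentifier:
    number: int                      # selection identifier number within the cell
    marked: bool                     # recognized marking state
    category: Optional[str] = None   # recognized mark category, None when unmarked


def target_selection_identifiers(ids: List[SelectionIdentifier]) -> List[SelectionIdentifier]:
    """Return the target selection identifiers of one cell image to be matched."""
    marked = [s for s in ids if s.marked]
    # Step 1: nothing is marked, so every identifier is kept.
    if not marked:
        return list(ids)
    non_excluding = [s for s in marked if s.category != EXCLUDE]
    # Step 2-1: only exclusion marks exist, so keep everything not excluded,
    # which here is exactly the set of unmarked identifiers.
    if not non_excluding:
        return [s for s in ids if not s.marked]
    # Steps 2-2 and 2-3: keep the marked identifiers that are not exclusion marks.
    return non_excluding
```

For the "enterprise property" example, with "network e-commerce" marked by an ordinary mark and "export enterprise" marked by an exclusion mark, the function returns only the identifier of "network e-commerce".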
To implement the steps in the foregoing embodiments and achieve the corresponding technical effects, the form image processing method provided in the embodiments of the present invention may be implemented in a hardware device or in the form of software modules. When the method is implemented in the form of software modules, an embodiment of the present invention further provides a form image processing apparatus. Referring to fig. 12, which is a functional block diagram of the form image processing apparatus provided in the embodiments of the present invention, the form image processing apparatus 400 may include:
the obtaining module 410, configured to obtain a table image to be processed;
the recognition module 420, configured to recognize the table image to be processed and obtain position information of all text blocks and position information of all selection identifiers;
the constructing module 430, configured to construct, for a cell image to be matched in the table image to be processed, a distance matrix corresponding to the cell image to be matched according to the position information and the text block number of the text block in the cell image to be matched, and the position information and the selection identifier number of the selection identifier;
the matching module 440, configured to identify, based on the distance matrix corresponding to the cell image to be matched, the target text block matched with the selection identifier in the cell image to be matched;
wherein the cell image to be matched is a cell image in which a text block and a selection identifier exist at the same time, and the target text block is one of all text blocks in the cell image to be matched.
It can be understood that the obtaining module 410, the recognition module 420, the constructing module 430, and the matching module 440 described above may cooperatively perform the steps of fig. 3 to achieve the corresponding technical effect.
In some possible implementations, the form image processing apparatus 400 further includes: a determining module, configured to determine a target selection identifier from all the selection identifiers, and to determine the target text block matched with the target selection identifier as a text block to be extracted; and a reconstruction module, configured to reconstruct the table image to be processed based on the text block to be extracted and the unmatched text block.
In some possible embodiments, the recognition module 420 further includes: a determining unit, configured to perform cell recognition on the table image to be processed and determine the cell image corresponding to each cell; a detection unit, configured to perform text block detection and selection identifier detection on a target cell image, and determine the position information of the text block in the target cell image, or determine the respective position information of the text block and the selection identifier in the target cell image, wherein the target cell image is any one of all the cell images; and an acquisition unit, configured to traverse all the cell images to obtain the position information of all the text blocks and the position information of all the selection identifiers.
In some possible embodiments, the matching module 440 includes: a determining unit, configured to determine, according to the distance matrix corresponding to a first cell image to be matched, the target text block matched with each selection identifier in the first cell image to be matched, wherein the first cell image to be matched is any one of all the cell images to be matched; and an acquisition unit, configured to traverse all the cell images to be matched and obtain the target text blocks matched with all the selection identifiers.
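As an illustration of matching from a distance matrix, the sketch below picks, for each selection identifier, the text block with the smallest distance. Both the nearest-distance rule and the layout of the matrix (rows indexed by selection identifier number, columns by text block number, 0-based) are assumptions made for this example, and the function name is hypothetical.

```python
from typing import Dict, List


def match_text_blocks(distance_matrix: List[List[float]]) -> Dict[int, int]:
    """Map each selection identifier (row index) to its nearest text block (column index).

    Assumed rule: the matched target text block is the one with the minimum distance.
    """
    matches: Dict[int, int] = {}
    for sel_no, row in enumerate(distance_matrix):
        if not row:
            continue  # no text block in this cell, nothing to match
        matches[sel_no] = min(range(len(row)), key=row.__getitem__)
    return matches
```

Applied to every cell image to be matched in turn, this yields the target text blocks for all selection identifiers.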
In some possible embodiments, the apparatus further includes a numbering module, configured to number the text blocks and the selection identifiers in the cell image to be matched according to a preset sequence, wherein the preset sequence is the horizontal direction or the vertical direction of the cell image to be matched.
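Numbering in a preset sequence can be sketched as a sort over bounding boxes. The box layout `(x_min, y_min, x_max, y_max)`, the 1-based numbering and the tie-breaking by the other axis are assumptions for illustration; the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def number_boxes(boxes: List[Box], direction: str = "horizontal") -> Dict[int, Box]:
    """Assign numbers 1..n to boxes along the horizontal or vertical direction."""
    if direction == "horizontal":
        order = sorted(boxes, key=lambda b: (b[0], b[1]))  # left to right
    elif direction == "vertical":
        order = sorted(boxes, key=lambda b: (b[1], b[0]))  # top to bottom
    else:
        raise ValueError("direction must be 'horizontal' or 'vertical'")
    return {number: box for number, box in enumerate(order, start=1)}
```

The same function can be applied separately to the text blocks and to the selection identifiers of one cell image, so that both sets of numbers follow the same direction.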
In some possible embodiments, the constructing module 430 is specifically configured to: determine the maximum limit horizontal coordinate and the minimum limit horizontal coordinate of each text block according to the position information of all characters in the position information of the text block in the cell image to be matched; calculate the distance between each selection identifier and each text block in the cell image to be matched based on the position information of the selection identifier in the cell image to be matched, the position information of the text block, the maximum limit horizontal coordinate, the minimum limit horizontal coordinate and a predetermined line feed coefficient; and form the distance matrix corresponding to the cell image to be matched from all the distances, arranged according to the text block numbers and the selection identifier numbers in the cell image to be matched.
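The three operations of the constructing module can be sketched as below. The limit horizontal coordinates and the assembly of the matrix by number follow the text; the distance itself is only a placeholder, since this passage does not restate the distance formula. `placeholder_distance` (horizontal gap plus the line feed coefficient when the identifier is not on the text block's line), the box layout and all function names are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def horizontal_limits(char_boxes: List[Box]) -> Tuple[float, float]:
    """Minimum and maximum limit horizontal coordinates of one text block."""
    return min(b[0] for b in char_boxes), max(b[2] for b in char_boxes)


def placeholder_distance(sel: Box, char_boxes: List[Box], line_feed_coeff: float) -> float:
    """Stand-in distance (an assumption, not the formula of the embodiment)."""
    min_x, max_x = horizontal_limits(char_boxes)
    sel_cx = (sel[0] + sel[2]) / 2
    sel_cy = (sel[1] + sel[3]) / 2
    # Horizontal gap between the identifier centre and the text block extent.
    gap = max(min_x - sel_cx, sel_cx - max_x, 0.0)
    top = min(b[1] for b in char_boxes)
    bottom = max(b[3] for b in char_boxes)
    same_line = top <= sel_cy <= bottom
    return gap if same_line else gap + line_feed_coeff


def build_distance_matrix(
    selections: Dict[int, Box],
    text_blocks: Dict[int, List[Box]],
    line_feed_coeff: float,
) -> List[List[float]]:
    """Rows follow selection identifier numbers, columns follow text block numbers."""
    return [
        [placeholder_distance(selections[s], text_blocks[t], line_feed_coeff)
         for t in sorted(text_blocks)]
        for s in sorted(selections)
    ]
```

Keeping the distance in its own function means the placeholder can be swapped for the actual distance of the embodiment without touching the matrix assembly.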
In some possible embodiments, the line feed coefficient is predetermined in one of the following ways: a preset threshold value is determined as the line feed coefficient; or the maximum limit horizontal coordinate and the minimum limit horizontal coordinate of the cell image to be matched are determined according to the maximum limit horizontal coordinate and the minimum limit horizontal coordinate of each text block, and the line feed coefficient is calculated according to the maximum limit horizontal coordinate and the minimum limit horizontal coordinate of the cell image to be matched.
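The two ways of fixing the line feed coefficient can be sketched as follows. The text only states that the coefficient is calculated from the cell's maximum and minimum limit horizontal coordinates; taking their difference (the horizontal extent of the text in the cell) is an assumption made for this example, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def line_feed_coefficient(
    block_limits: List[Tuple[float, float]],
    preset_threshold: Optional[float] = None,
) -> float:
    """Return the line feed coefficient for one cell image to be matched.

    block_limits holds (min_x, max_x) for each text block in the cell.
    """
    # Option 1: a preset threshold is used directly.
    if preset_threshold is not None:
        return preset_threshold
    if not block_limits:
        raise ValueError("at least one text block is required")
    # Option 2: derive the cell's limits from the limits of its text blocks.
    cell_min_x = min(lo for lo, _ in block_limits)
    cell_max_x = max(hi for _, hi in block_limits)
    # Assumed calculation: the coefficient is the cell's horizontal text extent.
    return cell_max_x - cell_min_x
```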
In some possible embodiments, the determining module is specifically configured to: identify each selection identifier in a target cell image to be matched; determine the target selection identifier in the target cell image to be matched according to the recognition result; and traverse all the cell images to be matched to determine all the target selection identifiers.
In some possible embodiments, the determining module is further specifically configured to: identify each selection identifier in the cell image to be matched; determine the target selection identifier in the cell image to be matched according to the recognition result; and traverse all the cell images to be matched to determine all the target selection identifiers.
In some possible embodiments, the determining module is further specifically configured to determine each selection identifier as the target selection identifier if it is identified that each selection identifier is in an unmarked state; and if at least one selected identifier is identified to be in a marked state, determining a target selected identifier from each selected identifier according to the identified identifier category.
In some possible embodiments, the determining module is further specifically configured to: if only the preset information exclusion category exists, determine the selection identifier in the unmarked state as the target selection identifier, wherein the information exclusion category represents that the target text block corresponding to the selection identifier is not selected; if the information exclusion category does not exist, determine the selection identifier in the marked state as the target selection identifier; and if the information exclusion category and an identifier category other than the information exclusion category exist at the same time, determine the selection identifier corresponding to the identifier category other than the information exclusion category as the target selection identifier.
In some possible embodiments, the form image processing apparatus further includes: and the preprocessing module is used for preprocessing the form image to be processed.
It should be noted that each functional module of the form image processing apparatus 400 provided by the embodiment of the present invention may be stored in a memory in the form of software or firmware, or solidified in an Operating System (OS) of a computer device, and may be executed by a processor in the computer device. Meanwhile, the data, program code, and the like required to execute the above modules may be stored in the memory. Therefore, the embodiment of the present invention further provides a computer device, which may be the computing device 230 shown in fig. 2 or another computer device with a data processing function, and the present invention is not limited thereto.
Referring to fig. 13, fig. 13 is a block diagram of a computer device according to an embodiment of the present invention. The computer device 230 includes a communication interface 231, a processor 232, and a memory 233. The processor 232, memory 233, and communication interface 231 are electrically connected to one another, directly or indirectly, to enable the transfer or interaction of data. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory 233 may be used to store software programs and modules, such as program instructions/modules corresponding to the form image processing method provided by the embodiment of the present invention, and the processor 232 executes various functional applications and data processing by executing the software programs and modules stored in the memory 233. The communication interface 231 may be used for communicating signaling or data with other node devices. The computer device 230 may have a plurality of communication interfaces 231 in the present invention.
The memory 233 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
Processor 232 may be an integrated circuit chip having signal processing capabilities. The Processor may be a general-purpose Processor including a Central Processing Unit (CPU), a Network Processor (NP), etc.; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc.
Embodiments of the present invention also provide a readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the form image processing method according to any of the foregoing embodiments. The computer readable storage medium may be, but is not limited to, any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a PROM, an EPROM, an EEPROM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (13)

1. A method of form image processing, the method comprising:
acquiring a form image to be processed;
recognizing the form image to be processed to obtain the position information of all text blocks and the position information of all selection identifiers;
aiming at a cell image to be matched in the table image to be processed, constructing a distance matrix corresponding to the cell image to be matched according to the position information and the text block number of a text block in the cell image to be matched, and the position information and the selection identifier number of a selection identifier;
based on the distance matrix corresponding to the cell image to be matched, identifying the target text block matched with the selection identifier in the cell image to be matched;
the cell image to be matched is a cell image with a text block and a selection identifier existing at the same time; the target text block is one of all text blocks in the cell image to be matched;
aiming at the cell image to be matched in the table image to be processed, according to the position information and the text block number of the text block in the cell image to be matched, the position information and the selection identifier number of the selection identifier, constructing a distance matrix corresponding to the cell image to be matched, comprising the following steps:
determining the maximum limit horizontal coordinate and the minimum limit horizontal coordinate of each text block according to the position information of all characters in the position information of the text block in the cell image to be matched;
calculating the distance between each selection mark and each text block in the cell image to be matched based on the position information of the selection mark, the position information of the text block, the maximum limit horizontal coordinate, the minimum limit horizontal coordinate and a predetermined line feed coefficient in the cell image to be matched;
and forming a distance matrix corresponding to the cell image to be matched by all the distances according to the text block number and the selection identification number in the cell image to be matched.
2. A form image processing method according to claim 1, characterized in that the method further comprises:
determining a target selection identifier from all the selection identifiers;
determining the target text block matched with the target selection identifier as a text block to be extracted;
and reconstructing the table content corresponding to the table image to be processed based on the text block to be extracted and the unmatched text block.
3. The form image processing method according to claim 1 or 2, wherein recognizing the form image to be processed to obtain position information of all text blocks and position information of all selection marks comprises:
cell recognition is carried out on the table image to be processed, and the cell image corresponding to each cell is determined;
performing text block detection and selection identifier detection on a target cell image, and determining position information of a text block in the target cell image, or determining respective position information of the text block and the selection identifier in the target cell image;
wherein the target cell image is any one of all the cell images;
and traversing all the cell images to obtain the position information of all the text blocks and the position information of all the selection marks.
4. The form image processing method of claim 1, wherein identifying the target text block matched with the selection identifier in the cell image to be matched based on the distance matrix corresponding to the cell image to be matched comprises:
determining the target text block matched with each selection identifier in the first cell image to be matched according to the distance matrix corresponding to the first cell image to be matched; the first cell image to be matched is any one of all the cell images to be matched;
and traversing all the cell images to be matched to obtain the target text blocks matched with all the selection identifications.
5. The form image processing method according to claim 1, wherein before the step of constructing, for a cell image to be matched in the table image to be processed, a distance matrix corresponding to the cell image to be matched according to the position information and the text block number of a text block in the cell image to be matched, and the position information and the selection identifier number of a selection identifier, the method further comprises:
numbering the text blocks and the selection marks in the cell images to be matched according to a preset sequence; and the preset sequence is the horizontal direction or the vertical direction of the cell images to be matched.
6. The form image processing method of claim 1, wherein the line feed coefficient is predetermined by:
determining a preset threshold value as the line feed coefficient; or,
determining the maximum limit horizontal coordinate and the minimum limit horizontal coordinate of the cell image to be matched according to the maximum limit horizontal coordinate and the minimum limit horizontal coordinate of each text block;
and calculating the line feed coefficient according to the maximum limit coordinate and the minimum limit coordinate of the cell image to be matched.
7. The form image processing method of claim 2, wherein determining a target selection identifier from the total selection identifiers comprises:
identifying each selection mark in the cell image to be matched;
determining the target selection identification in the cell image to be matched according to the recognition result;
and traversing all the cell images to be matched, and determining all the target selection identifiers.
8. The form image processing method of claim 7, wherein determining the target selection identifier within the cell image to be matched based on the recognition result comprises:
if the selection marks are identified to be in an unmarked state, determining each selection mark as the target selection mark;
and if at least one selected identifier is identified to be in a marked state, determining the target selected identifier from each selected identifier according to the identified identifier category.
9. The form image processing method of claim 8, wherein determining the target selection identifier from each of the selection identifiers based on the identified identifier category if at least one selection identifier is identified as being in a marked state comprises:
if only the preset information exclusion category exists, determining the selection identifier with the unmarked state as the target selection identifier;
wherein the information exclusion category represents that the target text block corresponding to the selection identifier is not selected;
if the information exclusion category does not exist, determining the selection identifier corresponding to the marked state as a target selection identifier;
and if the information exclusion category and the identification category except the information exclusion category exist at the same time, determining the selection identification corresponding to the identification category except the information exclusion category as the target selection identification.
10. The form image processing method of claim 1, wherein after obtaining the form image to be processed, the method further comprises:
and preprocessing the table image to be processed.
11. A form image processing apparatus characterized by comprising:
the acquisition module is used for acquiring a form image to be processed;
the recognition module is used for recognizing the table image to be processed to obtain the position information of all the text blocks and the position information of all the selection marks;
the building module is used for building a distance matrix corresponding to the cell image to be matched according to the position information and the text block number of the text block in the cell image to be matched and the position information and the selection identification number of the selection identification aiming at the cell image to be matched in the table image to be processed;
the matching module is used for identifying, based on the distance matrix corresponding to the cell image to be matched, the target text block matched with the selection identifier in the cell image to be matched;
the cell image to be matched is a cell image with a text block and a selection identifier existing at the same time; the target text block is one of all text blocks in the cell image to be matched;
the building module is specifically configured to: determining the maximum limit horizontal coordinate and the minimum limit horizontal coordinate of each text block according to the position information of all characters in the position information of the text block in the cell image to be matched; calculating the distance between each selection mark and each text block in the cell image to be matched based on the position information of the selection mark, the position information of the text block, the maximum limit horizontal coordinate, the minimum limit horizontal coordinate and a predetermined line feed coefficient in the cell image to be matched; and forming a distance matrix corresponding to the cell image to be matched by all the distances according to the text block number and the selection identification number in the cell image to be matched.
12. A computer device comprising a processor and a memory, the memory storing a computer program executable by the processor, the processor being operable to execute the computer program to implement the form image processing method of any of claims 1-10.
13. A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the form image processing method according to any one of claims 1 to 10.
CN202111409116.7A 2021-11-25 2021-11-25 Table image processing method and device, computer equipment and readable storage medium Active CN113837151B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111409116.7A CN113837151B (en) 2021-11-25 2021-11-25 Table image processing method and device, computer equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111409116.7A CN113837151B (en) 2021-11-25 2021-11-25 Table image processing method and device, computer equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN113837151A CN113837151A (en) 2021-12-24
CN113837151B true CN113837151B (en) 2022-02-22

Family

ID=78971708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111409116.7A Active CN113837151B (en) 2021-11-25 2021-11-25 Table image processing method and device, computer equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113837151B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114220103B (en) * 2022-02-22 2022-05-06 成都明途科技有限公司 Image recognition method, device, equipment and computer readable storage medium
CN115273113B (en) * 2022-09-27 2022-12-27 深圳擎盾信息科技有限公司 Table text semantic recognition method and device
CN115690823B (en) * 2022-11-01 2023-11-10 南京云阶电力科技有限公司 Table information extraction method and device with burr characteristics in electrical drawing
CN115618836B (en) * 2022-12-15 2023-03-31 杭州恒生聚源信息技术有限公司 Wireless table structure restoration method and device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201617971A (en) * 2014-11-06 2016-05-16 Alibaba Group Services Ltd Method and apparatus for information recognition
CN109902673A (en) * 2019-01-28 2019-06-18 北京明略软件系统有限公司 Table Header information identification and method for sorting, system, terminal and storage medium in table
CN110659574A (en) * 2019-08-22 2020-01-07 北京易道博识科技有限公司 Method and system for outputting text line contents after status recognition of document image check box
CN113297975A (en) * 2021-05-25 2021-08-24 新东方教育科技集团有限公司 Method and device for identifying table structure, storage medium and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060142993A1 (en) * 2004-12-28 2006-06-29 Sony Corporation System and method for utilizing distance measures to perform text classification

Also Published As

Publication number Publication date
CN113837151A (en) 2021-12-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant