CN116151202B - Form filling method, device, electronic equipment and storage medium - Google Patents

Publication number
CN116151202B
Authority
CN
China
Prior art keywords
cells
text
cell
blank
valued
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310155415.5A
Other languages
Chinese (zh)
Other versions
CN116151202A (en
Inventor
刘树衎
冯杭
李震宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Naval University of Engineering PLA
Original Assignee
Naval University of Engineering PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Naval University of Engineering PLA
Priority to CN202310155415.5A
Publication of CN116151202A
Application granted
Publication of CN116151202B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/174 Form filling; Merging
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a form filling method, a form filling device, electronic equipment and a storage medium. The method comprises: identifying cells and text in a target form image to convert the target form into a spreadsheet; predicting the membership of adjacent valued cells and blank cells through a graph convolutional network and the cells; matching the text in a valued cell with the headers in a database to obtain the text to be filled in for the blank cell that has a membership relationship with that valued cell; and generating the corresponding text in the blank cell based on the text to be filled in, so as to complete the filling of the electronic form. The valued cells are cells of the target form that are filled with content, and the blank cells are cells of the target form that are not filled with content. By converting the target form into an electronic form through target detection technology and a graph convolutional network, the method and device fill the form automatically and reduce the workload of staff.

Description

Form filling method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a method and apparatus for filling a form, an electronic device, and a storage medium.
Background
With the popularization of electronic office work, forms can be managed by means of information technology. Managing forms in this way improves the efficiency of form management, makes forms quick and convenient to search, protects the original forms, raises file utilization through the sharing of form information, and yields clear economic benefits. It is therefore necessary to convert large numbers of paper forms, image forms, and the like into electronic forms so that the forms can be managed digitally.
Currently, common automated form processing converts paper forms into electronic forms, mainly through form detection and cell recognition. Although the paper form can be converted into an electronic form, the form itself is still filled in manually, which greatly increases the workload of staff.
Disclosure of Invention
In view of this, an object of an embodiment of the present application is to provide a form filling method, apparatus, electronic device, and storage medium. The target form can be converted into the electronic form, automatic filling of the form is realized, and the workload of staff is reduced.
In a first aspect, an embodiment of the present application provides a form filling method, including: identifying cells and text in a target form image to convert the target form into a spreadsheet; predicting the membership of adjacent valued cells and blank cells through a graph convolutional network and the cells; matching the text in the valued cell with the text in the database to obtain the text to be filled in for the blank cell that has a membership relationship with the valued cell; and generating the corresponding text in the blank cell based on the text to be filled in, so as to complete the filling of the electronic form. The valued cells are cells of the target form that are filled with content, and the blank cells are cells of the target form that are not filled with content.
In the implementation process, the cells and text in the target form image are determined through image recognition, so that the target form in the image is converted into a fillable electronic form. The membership between valued cells and blank cells establishes an association between them, so that the content corresponding to each valued cell is filled into the blank cell that belongs to it, which prevents filling errors. In addition, the text in the valued cell is matched with the text in the database to obtain the text to be filled in for the blank cell that has a membership relationship with the valued cell, so that the electronic form is filled in automatically. By converting the target form into an electronic form through target detection technology and a graph convolutional network, the form is filled in automatically and the workload of staff is reduced.
In one embodiment, identifying the cells and text in the target form image includes: extracting the cells in the target form image through a Swin Transformer and an R-FPN, and acquiring the cell attributes corresponding to the cells; and recognizing the text in the target form image through PaddleOCR. The R-FPN is obtained by adding the residual structure of a ResNet network to the FPN structure, and is used to increase the weight of the high-resolution feature maps.
In the implementation process, the cells in the target form image are extracted through the Swin Transformer and the R-FPN. The Swin Transformer takes both local and global features into account during feature extraction, and the R-FPN directly fuses the original feature maps that have not been merged and up-sampled with the feature maps that have, which strengthens the feature representation and improves the accuracy of the extracted cell position information. In addition, the text is recognized with PaddleOCR, which accurately locates each text region and then recognizes the corresponding text. Through the cooperation of the Swin Transformer and PaddleOCR, the cells and text in the target form can be identified both globally and locally, which improves the accuracy with which they are extracted.
In one embodiment, the cell attributes include the position information of a cell, and after identifying the cells and text in the target form image, the method further includes: numbering the cells according to a preset numbering rule; and storing the numbered cells, the cell attributes and the text.
In the implementation process, the cells are numbered and stored, which facilitates further operations on the cells and makes converting the target form more convenient.
In one embodiment, the storing the numbered cells, the cell attributes, and the text includes: according to the cell attribute, arranging the numbered cells according to the format of the target table; and storing the text in the corresponding cell so as to enable the text and the cell to be stored according to the format of the target table.
In the implementation process, when the cells and the texts are stored, the cells can be stored according to the cell attributes of the cells and the formats of the target tables, and then the texts are stored in the corresponding cells, so that the formats of the target tables can be completely restored, the accuracy of electronic table conversion is improved, and the restoration degree of the electronic table is increased.
In one embodiment, predicting the membership of adjacent valued cells and blank cells through the graph convolutional network and the cells includes: constructing an adjacency matrix according to the cell attributes; and aggregating the features of adjacent nodes in the adjacency matrix, and judging the membership of the adjacent nodes after feature aggregation, so as to determine the membership of the valued cells and blank cells corresponding to the adjacent nodes.
In the implementation process, the membership between cells is judged through the graph convolutional network to determine which valued cells and blank cells have a membership relationship. As a result, the correct blank cells can be filled when the blank cells are filled in later, which improves the accuracy of form filling.
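By way of illustration only, the adjacency matrix construction and neighbor feature aggregation described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the network of the present application: cells are assumed to be axis-aligned boxes `(x1, y1, x2, y2)`, two cells are assumed adjacent when they share an edge, the aggregation uses the commonly used symmetric normalization with self-loops, and the learned weights and the final membership classifier are omitted. The names `shares_edge`, `build_adjacency`, and `gcn_aggregate` are hypothetical.

```python
import math


def shares_edge(a, b, tol=1.0):
    """Return True when boxes a and b (x1, y1, x2, y2) touch along an edge."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    y_overlap = min(ay2, by2) - max(ay1, by1)
    x_overlap = min(ax2, bx2) - max(ax1, bx1)
    side_by_side = (abs(ax2 - bx1) <= tol or abs(bx2 - ax1) <= tol) and y_overlap > tol
    stacked = (abs(ay2 - by1) <= tol or abs(by2 - ay1) <= tol) and x_overlap > tol
    return side_by_side or stacked


def build_adjacency(cells, tol=1.0):
    """Build a symmetric 0/1 adjacency matrix from cell boxes (assumed cell attributes)."""
    n = len(cells)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if shares_edge(cells[i], cells[j], tol):
                adj[i][j] = adj[j][i] = 1
    return adj


def gcn_aggregate(adj, feats):
    """One aggregation step: D^-1/2 (A + I) D^-1/2 X, without learned weights."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    out = []
    for i in range(n):
        row = [0.0] * len(feats[0])
        for j in range(n):
            if a_hat[i][j]:
                weight = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for k, value in enumerate(feats[j]):
                    row[k] += weight * value
        out.append(row)
    return out


# A 2 x 2 grid of cells; the single feature marks whether a cell holds a value.
cells = [(0, 0, 10, 5), (10, 0, 20, 5), (0, 5, 10, 10), (10, 5, 20, 10)]
adjacency = build_adjacency(cells)
aggregated = gcn_aggregate(adjacency, [[1.0], [0.0], [1.0], [0.0]])
```

In a full model, the aggregated node features would be passed through learned layers and a classifier that judges, for each pair of adjacent nodes, whether a membership relationship exists.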
In one embodiment, generating the corresponding text in the blank cell based on the text to be filled in, so as to complete the filling of the electronic form, includes: determining text skeleton points according to the text to be filled in and the text track file; determining a writing device operation instruction according to the text skeleton points and the three-dimensional list; and controlling, through the writing device operation instruction, the writing device to write the corresponding text to be filled in into the blank cell, so as to generate the corresponding text in the blank cell.
In the implementation process, the text skeleton points are generated according to the text track file and the text to be filled, so that corresponding operation instructions are generated according to the text skeleton points, and the writing device is controlled to perform writing operation according to the corresponding operation instructions, so that the text to be filled is written into the corresponding blank cells. By means of the automatic operation of the writing device, a bridge between the target form and the formatted database can be built, filling of large-scale data is achieved, and the diversity of filling content is increased.
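As a purely hypothetical illustration of turning skeleton points into writing device operation instructions, the sketch below assumes that the skeleton points of one character are given as a list of strokes, each stroke being a list of `(x, y)` points in a unit box, and that the writing device accepts `pen_up`, `pen_down`, and `move` instructions. None of these formats is specified in this section; the function name `strokes_to_instructions`, the instruction names, and the sample strokes are assumptions made for the example.

```python
def strokes_to_instructions(strokes, cell_origin, scale=1.0):
    """Convert strokes (lists of (x, y) skeleton points) into pen instructions.

    cell_origin is the top-left corner of the blank cell, and scale maps the
    unit character box onto the cell. Both are assumptions of this sketch.
    """
    ox, oy = cell_origin
    instructions = []
    for stroke in strokes:
        if not stroke:
            continue
        x0, y0 = stroke[0]
        # Lift the pen, travel to the start of the stroke, then lower the pen.
        instructions.append(("pen_up",))
        instructions.append(("move", ox + x0 * scale, oy + y0 * scale))
        instructions.append(("pen_down",))
        # Draw the rest of the stroke point by point.
        for x, y in stroke[1:]:
            instructions.append(("move", ox + x * scale, oy + y * scale))
    instructions.append(("pen_up",))
    return instructions


# Hypothetical three-stroke square shape used only to exercise the function.
sample_strokes = [
    [(0, 0), (0, 1)],
    [(0, 0), (1, 0), (1, 1)],
    [(0, 1), (1, 1)],
]
program = strokes_to_instructions(sample_strokes, cell_origin=(10, 20), scale=5)
```

Keeping the instructions as plain tuples makes it straightforward to translate them into whatever command format a particular writing device expects.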
In one embodiment, matching the text in the valued cell with the text in the database to obtain the text to be filled in for the blank cell that has a membership relationship with the valued cell includes: performing synonym matching between the text in the valued cell and the text in the database; and determining, according to the matching result, the text to be filled in for the blank cell that has a membership relationship with the valued cell.
In the implementation process, synonym matching is performed between the text in the valued cells and the headers in the database, so that when the text in a valued cell does not match the text of any header in the database, synonym matching can further be performed, which improves the flexibility of matching. A dedicated database table therefore does not need to be built for each target form, which reduces the workload of creating database tables.
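The fallback from exact matching to synonym matching can be sketched as follows. This is a minimal sketch under stated assumptions: the synonym groups in `SYNONYM_GROUPS` are invented for the example, and the application does not state how synonyms are obtained; a lexicon or an embedding model could equally be used. The function name `match_header` is hypothetical.

```python
# Hypothetical synonym groups; in practice these would come from a lexicon.
SYNONYM_GROUPS = [
    {"name", "full name"},
    {"sex", "gender"},
    {"age"},
]


def match_header(cell_text, headers, synonym_groups=SYNONYM_GROUPS):
    """Return the database header matching cell_text, or None if there is none."""
    text = cell_text.strip().lower()
    lowered = {header.lower(): header for header in headers}
    # First try an exact (case-insensitive) match.
    if text in lowered:
        return lowered[text]
    # Otherwise fall back to any header in the same synonym group.
    for group in synonym_groups:
        if text in group:
            for member in sorted(group):
                if member in lowered:
                    return lowered[member]
    return None


headers = ["Name", "Age", "Sex"]
exact = match_header("Age", headers)
by_synonym = match_header("Gender", headers)
missing = match_header("Address", headers)
```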
In a second aspect, an embodiment of the present application further provides a form filling apparatus, including: an identification module, used for identifying cells and text in the target form image so as to convert the target form into a spreadsheet; a prediction module, used for predicting the membership of adjacent valued cells and blank cells through the graph convolutional network and the cells; a matching module, used for matching the text in the valued cell with the text in the database to obtain the text to be filled in for the blank cell that has a membership relationship with the valued cell; and a filling module, used for generating a Chinese character track in the blank cell based on the text to be filled in, so as to complete the filling of the electronic form. The valued cells are cells of the target form that are filled with content, and the blank cells are cells of the target form that are not filled with content.
In a third aspect, embodiments of the present application further provide an electronic device, including: a processor, a memory storing machine-readable instructions executable by the processor, which when executed by the processor, perform the steps of the method of the first aspect, or any of the possible implementations of the first aspect.
In a fourth aspect, embodiments of the present application further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the form filling method of the first aspect, or any of the possible implementations of the first aspect.
In order to make the above objects, features and advantages of the present application more comprehensible, embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a form filling method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a target table according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of an FPN according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an R-FPN structure according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of cell extraction in a target table according to an embodiment of the present application;
FIG. 6 is a diagram of numbered cells, cell attributes, and text storage provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of a specific flow of GCN operation provided in an embodiment of the present application;
FIG. 8 is a schematic diagram of a storage form of text skeleton points according to an embodiment of the present application;
FIG. 9 is a schematic diagram of an example of generating operation instructions for the character "口" ("mouth") provided in an embodiment of the present application;
FIG. 10 is a schematic diagram of the functional modules of a form filling device according to an embodiment of the present application;
FIG. 11 is a block schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
Current handwriting robot applications mostly trace existing documents and cannot handle certain practical problems automatically. In form filling applications, a form file usually has to be provided and the electronic form completed manually, so the data in a database cannot be directly associated with paper files.
In view of this, the inventors of the present application propose a form filling method that combines target detection technology with a graph convolutional network: after the target form is converted into an electronic form, the text to be filled in is determined based on the position attributes and the membership of the cells. An operation instruction is then generated from the text and the corresponding text track skeleton points, so that the writing device operates autonomously, a bridge is built between the paper file and the formatted database, and large-scale data is filled in automatically.
Referring to fig. 1, a flowchart of a form filling method according to an embodiment of the present application is shown. The specific flow shown in fig. 1 will be described in detail.
Step 201, identifying cells and text in the target form image to convert the target form into a spreadsheet.
The target form herein may be a paper form, a screenshot of a spreadsheet, or the like. The target form image may be acquired with a camera, a mobile phone, a tablet computer, a screenshot tool, or another device.
The cells include a valued cell and a blank cell (as shown in fig. 2), the valued cell is a cell in which contents are filled in the target form (a cell in which texts are filled in fig. 2), and the blank cell is a cell in which contents are not filled in the target form (a cell in which texts are not filled in fig. 2). The text may be the title of the form, the content filled in the valued cell, etc.
The text may include Chinese characters, Arabic numerals, English, etc., and the text type may be adjusted according to the actual situation, which is not particularly limited in the present application.
In some embodiments, prior to step 201, the method further comprises: performing Gaussian blur and a dilation operation on the target form image to enhance the distinction between cells and text; and cropping the main contour of the form using the contour recognition functions of OpenCV and applying a perspective transformation to obtain a regularized image of the target form.
Step 202, predicting membership of adjacent valued cells and blank cells through a graph convolution network and cells.
It will be appreciated that in a form such as the one shown in FIG. 2, valued cells and blank cells typically appear in pairs. When the form is filled in, each valued cell poses a question, and the answer to that question is filled into the corresponding blank cell.
However, because the cells are obtained by image recognition of the target form image, the valued cells and blank cells may be stored in an arbitrary order for various reasons (for example, the storage order being random, or the recognition order of the cells being inconsistent with their order in the form). When the form is filled in, the valued cell corresponding to each blank cell must be known, so that the answer to the question posed by the valued cell is filled into that blank cell and the contents of the two cells match. By predicting the membership between an adjacent valued cell and blank cell, it can be determined whether the two correspond; when they do, the blank cell that has a membership relationship with the valued cell can be processed according to the content of the valued cell.
It will be appreciated that predicting membership of adjacent valued cells and blank cells is but one embodiment. Membership of non-adjacent valued cells and blank cells in the cell can also be randomly predicted. The prediction order may also be set, for example, the membership of the adjacent valued cells and blank cells is predicted first, and if the membership of the adjacent valued cells and blank cells does not exist, the membership of the non-adjacent valued cells and blank cells is further predicted, so that all the valued cells and blank cells that can be matched are matched. The predicted membership of the valued cells and the blank cells in the cells can be adjusted according to actual conditions, and the method is not particularly limited.
The prediction of the membership of the valued cell and the blank cell can be performed in a mathematical mode such as a coordinate value and the like, and also can be performed through a neural network, and the prediction of the membership of the valued cell and the blank cell can be adjusted according to actual conditions, so that the method and the device are not particularly limited.
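As an illustration of the coordinate-based alternative and of the prediction order mentioned above (adjacent cells first, then non-adjacent cells), a minimal sketch follows. It assumes that each cell is an axis-aligned box `(x1, y1, x2, y2)` with its recognized text, that a blank cell belongs to the valued cell directly to its left or directly above it, and that the fallback is the nearest valued cell to the left in the same row. These rules, the tolerance `tol`, and the function name `pair_blank_cells` are assumptions of the example and are not taken from the application.

```python
def _same_row(a, b, tol):
    """Boxes share the same top and bottom coordinates within tol."""
    return abs(a[1] - b[1]) <= tol and abs(a[3] - b[3]) <= tol


def _same_col(a, b, tol):
    """Boxes share the same left and right coordinates within tol."""
    return abs(a[0] - b[0]) <= tol and abs(a[2] - b[2]) <= tol


def pair_blank_cells(cells, tol=2.0):
    """Map each blank cell id to the id of the valued cell it belongs to (or None)."""
    valued = {cid: c["box"] for cid, c in cells.items() if c["text"]}
    blanks = {cid: c["box"] for cid, c in cells.items() if not c["text"]}
    pairs = {}
    for bid, b in blanks.items():
        match = None
        # Pass 1: an adjacent valued cell, directly left of or directly above the blank.
        for vid, v in valued.items():
            left = _same_row(v, b, tol) and abs(v[2] - b[0]) <= tol
            above = _same_col(v, b, tol) and abs(v[3] - b[1]) <= tol
            if left or above:
                match = vid
                break
        # Pass 2: the nearest non-adjacent valued cell to the left in the same row.
        if match is None:
            candidates = [(b[0] - v[2], vid) for vid, v in valued.items()
                          if _same_row(v, b, tol) and v[2] <= b[0] + tol]
            if candidates:
                match = min(candidates)[1]
        pairs[bid] = match
    return pairs


cells = {
    1: {"box": (0, 0, 10, 5), "text": "Name"},
    2: {"box": (10, 0, 20, 5), "text": ""},
    3: {"box": (20, 0, 30, 5), "text": "Age"},
    4: {"box": (30, 0, 40, 5), "text": ""},
    5: {"box": (0, 5, 10, 10), "text": "Remarks"},
    6: {"box": (10, 5, 25, 10), "text": ""},
    7: {"box": (25, 5, 40, 10), "text": ""},
}
pairs = pair_blank_cells(cells)
```

In this example, cell 7 has no adjacent valued cell, so it is assigned to cell 5 by the second pass.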
Step 203, matching the text in the valued cell with the text in the database to obtain the text to be filled in for the blank cell that has a membership relationship with the valued cell.
The database stores all the contents that need to be filled into the blank cells of the target form, and these contents are stored by category. For example, if information such as name, age, and sex is stored in the database, the names, ages, and sexes may each be stored separately by category; that is, the name storage portion may store Zhang San, Li Si, Wang Wu, and so on; the age storage portion may store 20, 18, 25, and so on; and the sex storage portion may store male, female, and so on.
It can be understood that the contents stored in the database may be stored according to a table, may be stored according to a data packet, may be randomly stored, and an association relationship may be established between the randomly stored contents.
For example, if the contents stored in the database are stored as a table, all the contents that need to be filled into the blank cells may be stored as shown in Table 1 below.
Table 1
Name        Age    Sex
Zhang San   20     Female
Li Si       18     Male
Wang Wu     25     Male
If the contents stored in the database are stored as data packets, the database may include a plurality of data packets, each corresponding to one set of data. For example, the first data packet includes Zhang San, 20, female; the second data packet includes Li Si, 18, male; and the third data packet includes Wang Wu, 25, male. Optionally, a data packet may be named after its contents: the first data packet may be named Zhang San, the second Li Si, the third Wang Wu, and so on.
Before filling in the blank cell, matching is performed according to the text in the valued cell having the membership with the blank cell and the text in the database, so as to determine the text to be filled corresponding to the blank cell.
In some embodiments, the text in the valued cell may be matched with the text in the database, or with feature text in the database. Feature text is text in one data set that distinguishes it from the text in other data sets, for example the name in the example above.
For example, when the contents stored in the database are stored as a table, the text in the valued cell may be matched with the table header to determine the text to be filled in for the blank cell that has a membership relationship with the valued cell.
When the contents stored in the database are stored as data packets, the text in the valued cell may be matched with the data packet header to determine the text to be filled in for the blank cell that has a membership relationship with the valued cell.
It can be understood that the manner in which contents are stored in the database, and the manner in which the text of the valued cell is matched with the text in the database, can be adjusted according to the actual situation, and the present application is not particularly limited in this respect.
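For the table storage case, the lookup from header to text to be filled in can be sketched as follows, using the example data of Table 1. This is only a sketch under stated assumptions: the database is modeled as an in-memory header list and row list, the membership pairs map each blank cell number to its valued cell number, and the names `texts_to_fill`, `HEADERS`, and `ROWS` are hypothetical.

```python
HEADERS = ["Name", "Age", "Sex"]
ROWS = [
    ["Zhang San", 20, "Female"],
    ["Li Si", 18, "Male"],
    ["Wang Wu", 25, "Male"],
]


def texts_to_fill(pairs, cell_texts, headers, row):
    """Return {blank cell number: text to fill} for one database row.

    pairs maps each blank cell number to the valued cell number it belongs to;
    cell_texts maps valued cell numbers to their recognized text.
    """
    by_header = dict(zip(headers, row))
    filled = {}
    for blank_id, valued_id in pairs.items():
        key = cell_texts.get(valued_id)
        # Only fill blanks whose valued cell text matches a database header.
        if key in by_header:
            filled[blank_id] = str(by_header[key])
    return filled


pairs = {2: 1, 4: 3, 6: 5}
cell_texts = {1: "Name", 3: "Age", 5: "Remarks"}
result = texts_to_fill(pairs, cell_texts, HEADERS, ROWS[1])
```

Blank cell 6 is left unfilled here because "Remarks" matches no header; a synonym matching step could be applied to such cells before giving up.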
Step 204, generating corresponding text in the blank cell based on the text to be filled in, so as to complete filling of the electronic form.
It can be appreciated that after the text to be filled in for a blank cell is determined, the corresponding text in the database can be copied directly into the blank cell to complete the filling of that cell. Alternatively, a text track can be generated from the text to be filled in, and the corresponding text generated in the blank cell through that text track to complete the filling of that cell.
In the implementation process, the cells and text in the target form image are determined through image recognition, so that the target form in the image is converted into a fillable electronic form. The membership between valued cells and blank cells establishes an association between them, so that the content corresponding to each valued cell is filled into the blank cell that belongs to it, which prevents filling errors. In addition, the text in the valued cell is matched with the text in the database to obtain the text to be filled in for the blank cell that has a membership relationship with the valued cell, so that the electronic form is filled in automatically. By converting the target form into an electronic form through target detection technology and a graph convolutional network, the form is filled in automatically and the workload of staff is reduced.
In one possible implementation, step 201 includes: extracting the cells in the target form image through the Swin Transformer and the R-FPN, and acquiring the cell attributes corresponding to the cells; and recognizing the text in the target form image through PaddleOCR.
The R-FPN is obtained by adding the residual structure of a ResNet network to the FPN structure, and is used to increase the weight of the high-resolution feature maps.
As shown in fig. 3, fig. 3 is an original FPN structure in which a low resolution feature map exists. Because the table has few characteristic elements and small variability, the representation capability of the characteristics can be further improved by adding a structure similar to residual connection on the high-resolution image of the original FPN structure. Further, an R-FPN structure as shown in FIG. 4 was obtained. As shown in fig. 4, the low resolution feature map P6 is removed from the R-FPN structure.
The R-FPN has the characteristic of multi-scale feature fusion, and the original feature map which is not combined with up-sampling in the FPN and the feature map which is combined with up-sampling are directly fused, so that the feature representation capacity is enhanced, the precision of a target detection task is further improved, and the positions of all cells in a target table are determined.
The Swin Transformer has a hierarchical structure and can extract visual features at different levels, so it is well suited to tasks such as segmentation and detection.
The cell attributes include the coordinates of the cell vertices, the text, the cell size, and the like.
It will be appreciated that the extraction of the cells may be achieved by:
IoU (Intersection over Union, which measures the degree of similarity between a cell and the detection frame that selects the cell) is used as the evaluation index in cell detection, and is computed as follows:
$$\mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}$$
where IoU is the degree of similarity between the cell and the detection frame, Area of Overlap is the overlapping area between the cell and the detection frame, and Area of Union is the union area between the cell and the detection frame.
After the IoU value is determined, it is compared with an IoU threshold to determine the size relationship between the cell and the detection frame. When the cell is larger or smaller than the detection frame, two opposite vertices of the merged cell need to be determined and the cell needs to be re-framed to extract the accurate cell, as shown in FIG. 5.
Calculating this degree of similarity helps to mitigate interference from inconsistent cell sizes and noise in the form, and improves the cell detection results.
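The IoU computation can be illustrated with a short sketch for axis-aligned boxes. The box format `(x1, y1, x2, y2)` and the function name `iou` are assumptions of the example; the threshold comparison and re-framing steps are not shown.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping region (zero when the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    overlap = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - overlap
    return overlap / union if union > 0 else 0.0


cell = (0, 0, 10, 10)
detection_frame = (5, 0, 15, 10)
similarity = iou(cell, detection_frame)
```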
The above-described PaddleOCR is a framework capable of automatically recognizing text in an image. Recognition of text by PaddleOCR can be divided into text detection and text recognition. The task of text detection is to locate the text regions in the image, and the task of text recognition is to identify the text content in the image.
In the implementation process, the cells in the target form image are extracted through the Swin Transformer and the R-FPN. The Swin Transformer takes both local and global features into account during feature extraction, and the R-FPN directly fuses the original feature maps that have not been merged and up-sampled with the feature maps that have, which strengthens the feature representation and improves the accuracy of the extracted cell position information. In addition, the text is recognized with PaddleOCR, which accurately locates each text region and then recognizes the corresponding text. Through the cooperation of the Swin Transformer and PaddleOCR, the cells and text in the target form can be identified both globally and locally, which improves the accuracy with which they are extracted.
In one possible implementation, after step 201, the method further includes: numbering the cells according to a preset numbering rule; and storing the numbered cells, the cell attributes and the text.
The preset numbering rules can be in the sequence from top to bottom and from left to right, in the sequence from left to right and from top to bottom, in the sequence from bottom to top and from right to left, and the like, and the preset numbering sequence can be adjusted according to actual conditions, so that the method is not particularly limited.
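A numbering rule of this kind can be sketched as a sort over cell coordinates. The sketch assumes that cells are boxes `(x1, y1, x2, y2)` whose top and left coordinates are exactly aligned within a row or column (for example after the image has been regularized); the rule names `row_major` and `column_major` and the function name `number_cells` are hypothetical.

```python
RULES = {
    # Left to right within a row, rows taken from top to bottom.
    "row_major": lambda box: (box[1], box[0]),
    # Top to bottom within a column, columns taken from left to right.
    "column_major": lambda box: (box[0], box[1]),
}


def number_cells(boxes, rule="row_major"):
    """Return {number: box}, numbering from 1 in the order given by the rule."""
    ordered = sorted(boxes, key=RULES[rule])
    return {number: box for number, box in enumerate(ordered, start=1)}


boxes = [(10, 5, 20, 10), (0, 0, 10, 5), (10, 0, 20, 5), (0, 5, 10, 10)]
row_numbering = number_cells(boxes)
column_numbering = number_cells(boxes, rule="column_major")
```

Other orders, such as bottom to top and right to left, can be added as further entries in `RULES` by negating the corresponding coordinate in the sort key.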
As shown in fig. 6, the numbered cells, cell attributes, and text may be stored as shown in fig. 6. Of course, the data may be stored in other formats, or may be stored in the form of a data packet. The numbered cells, cell attributes and text storage modes can be adjusted according to actual conditions, and the method is not particularly limited.
In the implementation process, the cells are numbered and stored, so that the further operation of the cells is facilitated, and the convenience of converting the target form into the cells is improved.
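The numbering step can be sketched as follows. This is only a minimal illustration that assumes a top-to-bottom, then left-to-right rule; the `Cell` class, the `number_cells` function, and the `row_tolerance` parameter are hypothetical names introduced here and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Bounding box of a detected cell (hypothetical representation).
    x0: float
    y0: float
    x1: float
    y1: float
    number: int = -1


def number_cells(cells, row_tolerance=5.0):
    """Number cells top to bottom, then left to right.

    Cells whose top edges differ by at most `row_tolerance` from the first
    cell of the current row are treated as belonging to the same row.
    """
    ordered = sorted(cells, key=lambda c: (c.y0, c.x0))
    rows = []
    for cell in ordered:
        if rows and abs(cell.y0 - rows[-1][0].y0) <= row_tolerance:
            rows[-1].append(cell)
        else:
            rows.append([cell])

    numbered = []
    for row in rows:
        for cell in sorted(row, key=lambda c: c.x0):
            cell.number = len(numbered) + 1
            numbered.append(cell)
    return numbered


# Example: two cells in the first row (slightly misaligned), one in the second.
example_cells = [Cell(100, 2, 200, 30), Cell(0, 0, 100, 30), Cell(0, 31, 100, 60)]
numbered_cells = number_cells(example_cells)
```

The tolerance absorbs small vertical misalignments in the detected boxes; a different numbering rule would only change the sort keys.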
In one possible implementation, storing the numbered cells, cell attributes, and text includes: according to the cell attribute, arranging the numbered cells according to the format of the target table; and storing the text in the corresponding cells so that the text and the cells are stored according to the format of the target table.
It will be appreciated that when the R-FPN and the Swin Transformer extract the position information of a cell, they mainly extract the coordinate values of each vertex of the cell. When the cells are stored, they are ordered according to their coordinate values, so that the target table can be restored.
After the text is recognized by PaddleOCR, the text can be matched with the position information of the corresponding cell, so that the text is fused with the cell attributes of that cell. The text then either establishes an association with the corresponding cell or is stored directly as one of that cell's attributes.
After the cells are arranged according to the format of the target table, the text is stored in the corresponding cells, and then the text and the cells are stored according to the format of the target table.
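The matching of recognized text to cells can be illustrated with a short sketch. It assumes that each cell and each recognized text region is described by an axis-aligned bounding box, and that a text region belongs to the cell containing its center point; this matching criterion, the function name `assign_text_to_cells`, and the data layout are assumptions made for illustration only.

```python
def assign_text_to_cells(cells, text_boxes):
    """Attach recognized text to the cell that contains the text box center.

    cells:      dict mapping cell number -> (x0, y0, x1, y1)
    text_boxes: list of ((x0, y0, x1, y1), text) pairs from the OCR step
    Returns a dict mapping cell number -> {"bbox": ..., "text": ...}.
    """
    table = {num: {"bbox": bbox, "text": ""} for num, bbox in cells.items()}
    for (tx0, ty0, tx1, ty1), text in text_boxes:
        cx, cy = (tx0 + tx1) / 2, (ty0 + ty1) / 2
        for num, (x0, y0, x1, y1) in cells.items():
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                # The text becomes one of the attributes of the matching cell.
                table[num]["text"] += text
                break
    return table


# Example: one valued cell ("Name") followed by one blank cell.
stored = assign_text_to_cells(
    {1: (0, 0, 100, 30), 2: (100, 0, 200, 30)},
    [((10, 5, 60, 25), "Name")],
)
```

Cells whose `text` attribute stays empty after this step correspond to the blank cells described above.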
In some embodiments, the spreadsheet converted from the target form does not need to fully restore the format of the target form, in which case the cells need not be arranged according to that format when they are stored. The storage mode of the cells can be adjusted according to actual conditions, and the application is not particularly limited in this respect.
In the implementation process, when the cells and the text are stored, the cells can be stored according to their cell attributes and the format of the target table, and the text is then stored in the corresponding cells. In this way the format of the target table can be completely restored, which improves the accuracy of the spreadsheet conversion and the fidelity of the resulting spreadsheet.
In one possible implementation, step 202 includes: constructing an adjacency matrix according to the cell attributes; and aggregating the features of adjacent nodes in the adjacency matrix, and judging the membership of the adjacent nodes after feature aggregation to determine the membership of the valued cells and the blank cells corresponding to the adjacent nodes.
Fig. 7 shows the specific operation flow of the graph convolutional network (GCN). As can be seen from fig. 7, after the graph input to the graph convolutional network passes through several GCN layers, each node has aggregated the features of its neighboring nodes, and the node features change from X to Z. The connection relationships between the nodes remain unchanged; that is, the node adjacency matrix is shared across the GCN layers.
Further, the membership of valued cells and blank cells in the target table image can be extracted through the following steps:
All cells in the target table are numbered sequentially to form the input vertex set $S = \{s_1, s_2, \ldots, s_N\}$ of the graph convolutional network, where N represents the number of nodes in the graph. All "title cell-content cell" pairs are connected, and "content cell-content cell" and "title cell-title cell" pairs are connected at random, so as to construct the adjacency matrix A.
A corresponding table graph is drawn according to the adjacency matrix A and the position distribution of the cells in the target table, and is used as the input data of the graph convolutional network.
Adding the information of each node itself to the original adjacency matrix:
$$\tilde{A} = A + I$$
where $I$ represents the self-loop information (the identity matrix), $\tilde{A}$ is the adjacency matrix after the self-loops are added, and $A$ is the original adjacency matrix.
The graph convolution layers are then applied to realize feature aggregation, and the membership between nodes is judged according to the aggregated features, where 1 indicates that a membership exists between two cells and 0 indicates that it does not. The specific process of feature aggregation in the GCN can be expressed as:
$$H_l = f(H_{l-1}, \tilde{A})$$
where $H_{l-1}$ represents the output of the previous convolutional layer, $H_l$ represents the output of the current convolutional layer, the input features of the nodes serve as the input of the first layer, and $\tilde{A}$ is the adjacency matrix after the self-loops are added.
The node features of each layer are expressed as a weighted sum of the features of the neighboring nodes and the node's own features, namely:
$$H_l = \sigma\left(\hat{A} H_{l-1} W_l\right)$$
where $\hat{A}$ represents the normalized $\tilde{A}$, $\sigma$ represents the sigmoid activation function, $W_l$ represents a trainable weight matrix, and $H_l$ represents the output of the current convolutional layer.
In some embodiments, after the features of the adjacent nodes are aggregated, the edge formed by the nodes with membership may be weighted 1, and the edge formed by the nodes without membership may be weighted 0, so as to differentially mark the membership of each node.
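The adjacency construction and one layer of feature aggregation described in the steps above can be sketched with NumPy. This is only an illustration under stated assumptions: the text says that the adjacency matrix with self-loops is normalized but does not say how, so the symmetric degree normalization used here is an assumption, and the function names, the edge list, and the feature sizes are all hypothetical.

```python
import numpy as np


def build_adjacency(num_nodes, edges):
    """Build a symmetric adjacency matrix A from undirected cell pairs."""
    A = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A


def normalize_with_self_loops(A):
    """Add self-loops (A + I), then normalize.

    Assumption: symmetric degree normalization D^{-1/2} (A + I) D^{-1/2};
    the source only states that the matrix is normalized.
    """
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gcn_layer(A_hat, H_prev, W):
    """One aggregation step: sigmoid(A_hat @ H_prev @ W)."""
    return sigmoid(A_hat @ H_prev @ W)


# Example: 4 cells, where cells 0 and 2 are titles of cells 1 and 3.
A = build_adjacency(4, [(0, 1), (2, 3)])
A_hat = normalize_with_self_loops(A)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))   # hypothetical 8-dimensional node features
W1 = rng.normal(size=(8, 16))  # weights that would be learned in training
H1 = gcn_layer(A_hat, X, W1)   # aggregated node features, shape (4, 16)
```

Because the same `A_hat` is passed to every layer, the connection relationships stay fixed while only the node features change, matching the description of fig. 7.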
In the implementation process, the membership between cells is judged through the graph convolutional network to determine the valued cells and blank cells that have a membership. As a result, the correct blank cells can be filled when the blank cells are filled in later, which improves the accuracy of form filling.
In one possible implementation, step 204 includes: determining text skeleton points according to the text to be filled in and a text track file; determining writing device operation instructions according to the text skeleton points and the three-dimensional list; and controlling the writing device, through the writing device operation instructions, to write the corresponding text to be filled in into the blank cells, so as to generate the corresponding text in the blank cells.
The writing device here may be a mechanical arm, a light pen, a mouse, etc.
The text track file is a file stored in a database that contains texts and the skeleton points corresponding to each text. After the text to be filled in is determined, it is matched against the texts in the text track file to determine the skeleton points corresponding to the text.
Further, after the skeleton points corresponding to the text are determined, the three-dimensional list of skeleton points is converted into operation instructions that can control the action of the writing device. The operation instructions are sent to the corresponding writing device through a communication interface, so that the writing device writes the text according to the received operation instructions.
Optionally, when the operation instructions are sent to the corresponding writing device through the communication interface, they may be sent line by line, sent all at once, or sent skeleton point by skeleton point. The manner of sending the operation instructions to the writing device may be adjusted according to the actual situation, and the application is not specifically limited in this respect.
For a better understanding of the present embodiment, the Chinese character "口" (mouth) is taken as an example below to further describe the specific implementation of step 204 of the present application:
As shown in fig. 8, the text skeleton points are stored in a json file in the form of a three-dimensional list. Each stroke corresponds to one two-dimensional list within the three-dimensional list. The character "口" has three strokes in total, each stroke consists of a number of skeleton points, and each two-dimensional list consists of one-dimensional lists that store the coordinates of the points. When writing, the strokes are read in sequence: the pen is put down at the first coordinate point of a stroke and the following points are connected in order, which completes the writing of one stroke; the pen is then lifted and moved to the starting point of the next stroke, and the operation is repeated.
If the font size needs to be adjusted or several characters need to be written in succession, it is only necessary to transform the coordinates according to the preset font size and the position at which the text is to appear. Following this idea, pen lifting, pen lowering, and movement are realized through operation instructions. Fig. 9 gives an example of the operation instructions generated for the character "口", in which lines 1 to 4 initialize the feed amount, the size, and so on, line 6 lowers the pen, and lines 8 to 11 draw the first stroke. Each time a stroke is completed, the system pauses for 0.2 seconds; the subsequent instructions follow the same pattern, and the last line returns the pen to the initial position, which completes the writing of the character "口".
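The conversion from a three-dimensional skeleton list to pen operations can be sketched as follows. The instruction names (`MOVE`, `PEN_DOWN`, `LINE`, `PEN_UP`, `PAUSE`), the sample coordinates for "口", and the assumption that the initial position is the origin are all hypothetical; they illustrate the idea and are not the instruction format of fig. 9.

```python
# Hypothetical skeleton points for "口": three strokes, each a list of [x, y].
KOU_STROKES = [
    [[0, 0], [0, 10]],             # left vertical stroke
    [[0, 0], [10, 0], [10, 10]],   # top horizontal and right vertical stroke
    [[0, 10], [10, 10]],           # bottom horizontal stroke
]


def strokes_to_instructions(strokes, scale=1.0, offset=(0.0, 0.0), pause_s=0.2):
    """Convert a three-dimensional stroke list into a list of pen operations.

    `scale` adjusts the font size and `offset` places the character inside
    the target blank cell (a simple coordinate transformation).
    """
    ox, oy = offset
    instructions = []
    for stroke in strokes:
        if not stroke:
            continue
        points = [(x * scale + ox, y * scale + oy) for x, y in stroke]
        instructions.append(("MOVE", *points[0]))  # pen is up while moving
        instructions.append(("PEN_DOWN",))
        for x, y in points[1:]:
            instructions.append(("LINE", x, y))    # connect the following points
        instructions.append(("PEN_UP",))
        instructions.append(("PAUSE", pause_s))    # pause after each stroke
    instructions.append(("MOVE", 0.0, 0.0))        # return to the initial position
    return instructions


kou_instructions = strokes_to_instructions(KOU_STROKES, scale=2, offset=(5, 5))
```

Writing several characters in a row then amounts to calling the function once per character with an offset that advances along the cell.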
In the implementation process, the text skeleton points are generated according to the text track file and the text to be filled, so that corresponding operation instructions are generated according to the text skeleton points, and the writing device is controlled to perform writing operation according to the corresponding operation instructions, so that the text to be filled is written into the corresponding blank cells. By means of the automatic operation of the writing device, a bridge between the target form and the formatted database can be built, filling of large-scale data is achieved, and the diversity of filling content is increased.
In one possible implementation, step 203 includes: performing synonym matching between the text in the valued cell and the headers in the database; and determining, according to the matching result, the text to be filled in for the blank cell that has a membership with the valued cell.
Because target tables are diverse, a term in the target table may be worded differently from the term with a similar meaning in the database. For example, "incumbent role" in the target table corresponds to "current role" in the database, "home address" in the target table corresponds to "residence" in the database, and so on. Although the wording in the target form and in the database is not identical, the meaning is the same. Obviously, it is not practical to build a dedicated database form for each target form; the text matching therefore needs stronger adaptability, and texts with similar meanings can be matched through synonym detection.
For example, if the text in the valued cell is "birth time" but the database has no header named "birth time", synonym matching can be performed to find that "birth year and month" in the database is very close in meaning to "birth time". The value under "birth year and month" is then used as the value for "birth time" and filled into the blank cell that has a membership with the valued cell.
In the implementation process, synonym matching is performed between the text in the valued cells and the headers in the database, so that when the text in a valued cell does not exactly match any header in the database, a synonym match can still be found, which improves the flexibility of matching. A dedicated database form does not need to be built for each target form, which reduces the workload of creating database forms.
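The matching logic can be sketched as an exact match followed by a synonym lookup. The application does not specify how synonyms are detected, so the hand-written `SYNONYMS` table below is an assumption standing in for whatever synonym detection is actually used (for example a thesaurus or a semantic similarity model); the function names and the sample record are likewise hypothetical.

```python
# Hypothetical synonym table: target-table wording -> database wordings.
SYNONYMS = {
    "birth time": ["birth year and month", "date of birth"],
    "incumbent role": ["current role"],
}


def match_header(cell_text, db_headers):
    """Return the database header matching `cell_text`, or None.

    An exact (case-insensitive) match is tried first; if it fails,
    the synonym table is consulted.
    """
    key = cell_text.strip().lower()
    normalized = {h.strip().lower(): h for h in db_headers}
    if key in normalized:
        return normalized[key]
    for candidate in SYNONYMS.get(key, []):
        if candidate in normalized:
            return normalized[candidate]
    return None


def text_to_fill(cell_text, record):
    """Look up the value to write into the blank cell tied to `cell_text`."""
    header = match_header(cell_text, record.keys())
    return record[header] if header is not None else None


record = {"Birth year and month": "1990-05", "Current role": "Engineer"}
birth_value = text_to_fill("birth time", record)
```

Returning `None` for unmatched text lets the caller leave the corresponding blank cell empty instead of filling it with an unrelated value.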
Based on the same inventive concept, the embodiment of the present application further provides a form filling device corresponding to the form filling method. Since the principle by which the device solves the problem is similar to that of the foregoing form filling method embodiment, the implementation of the device may refer to the description of the foregoing method embodiment, and repeated details are omitted.
Fig. 10 is a schematic functional block diagram of a form filling device according to an embodiment of the present application. The respective modules in the form filling apparatus in the present embodiment are for performing the respective steps in the above-described method embodiment. The form filling device comprises an identification module 301, a prediction module 302, a matching module 303 and a filling module 304; wherein,
the recognition module 301 is configured to recognize cells and text in the target form image to convert the target form into a spreadsheet.
The prediction module 302 is configured to predict the membership between adjacent valued cells and blank cells through a graph convolutional network and the cells.
The matching module 303 is configured to match the text in a valued cell with the headers in a database, so as to obtain the text to be filled in for the blank cell that has a membership with the valued cell.
The filling module 304 is configured to generate a Chinese character track in the blank cell based on the text to be filled in, so as to complete filling of the electronic form; the valued cells are cells filled with contents in the target form, and the blank cells are cells not filled with contents in the target form.
In a possible implementation, the identification module 301 is further configured to: extract cells in the target table image through a Swin Transformer and an R-FPN, and acquire cell attributes corresponding to the cells; and recognize text in the target form image through PaddleOCR; wherein the R-FPN is obtained by adding the residual structure of a ResNet network to the FPN structure, and the R-FPN is used for increasing the proportion of the high-resolution feature maps.
In a possible embodiment, the form filling device further comprises a storage module for: numbering the cells according to a preset numbering rule; and storing the numbered cells, the cell attributes and the text.
In a possible implementation manner, the storage module is specifically configured to: according to the cell attribute, arranging the numbered cells according to the format of the target table; and storing the text in the corresponding cell so as to enable the text and the cell to be stored according to the format of the target table.
In a possible implementation, the prediction module 302 is further configured to: construct an adjacency matrix according to the cell attributes; and aggregate the features of adjacent nodes in the adjacency matrix, and judge the membership of the adjacent nodes after feature aggregation to determine the membership of the valued cells and the blank cells corresponding to the adjacent nodes.
In a possible implementation, the filling module 304 is further configured to: determine text skeleton points according to the text to be filled in and the text track file; determine writing device operation instructions according to the text skeleton points and the three-dimensional list; and control the writing device, through the writing device operation instructions, to write the corresponding text to be filled in into the blank cell, so as to generate the corresponding text in the blank cell.
In a possible implementation, the matching module 303 is further configured to: perform synonym matching between the text in the valued cell and the headers in the database; and determine, according to the matching result, the text to be filled in for the blank cell that has a membership with the valued cell.
In order to facilitate understanding of the present embodiment, an electronic device that performs a form filling method disclosed in the embodiments of the present application will be described in detail below.
As shown in fig. 11, a block schematic diagram of an electronic device is provided. The electronic device 100 may include a memory 111, a processor 113. It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 11 is merely illustrative and is not intended to limit the configuration of the electronic device 100. For example, the electronic device 100 may also include more or fewer components than shown in fig. 11, or have a different configuration than shown in fig. 11.
The memory 111 and the processor 113 are electrically connected directly or indirectly to each other to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The processor 113 is used to execute executable modules stored in the memory.
The Memory 111 may be, but is not limited to, a random access Memory (Random Access Memory, RAM), a Read Only Memory (ROM), a programmable Read Only Memory (Programmable Read-Only Memory, PROM), an erasable Read Only Memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable Read Only Memory (Electric Erasable Programmable Read-Only Memory, EEPROM), etc. The memory 111 is configured to store a program, and the processor 113 executes the program after receiving an execution instruction, and a method executed by the electronic device 100 defined by the process disclosed in any embodiment of the present application may be applied to the processor 113 or implemented by the processor 113.
The processor 113 may be an integrated circuit chip having signal processing capabilities. The processor 113 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; but also digital signal processors (digital signal processor, DSP for short), application specific integrated circuits (Application Specific Integrated Circuit, ASIC for short), field Programmable Gate Arrays (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The electronic device 100 in the present embodiment may be used to perform each step in each method provided in the embodiments of the present application.
Furthermore, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor performs the steps of the form filling method described in the above method embodiments.
The computer program product of the form filling method provided in the embodiments of the present application includes a computer readable storage medium storing program codes, where the program codes include instructions for executing the steps of the form filling method described in the method embodiments, and the specific reference may be made to the method embodiments, which are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners as well. The apparatus embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes. It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the same, but rather, various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A form filling method, comprising:
identifying cells and text in a target form image to convert the target form into a spreadsheet;
predicting the membership between adjacent valued cells and blank cells through a graph convolutional network and the cells;
matching the text in the valued cell with the text in the database to obtain a text to be filled in of the blank cell with a membership relationship with the valued cell;
generating corresponding texts in the blank cells based on the texts to be filled in so as to finish filling of the electronic form;
the valued cells are cells filled with contents in the target form, and the blank cells are cells not filled with contents in the target form;
the identifying the cells and the text in the target form image comprises the following steps:
extracting cells in the target table image through a Swin Transformer and an R-FPN, and acquiring cell attributes corresponding to the cells;
recognizing text in the target form image through PaddleOCR;
wherein the R-FPN is obtained by adding the residual structure of a ResNet network to the FPN structure, and the R-FPN is used for increasing the proportion of the high-resolution feature maps.
2. The method of claim 1, wherein the cell attributes include location information of cells, and wherein after identifying cells and text in the target form image, the method further comprises:
numbering the cells according to a preset numbering rule;
and storing the numbered cells, the cell attributes and the text.
3. The method of claim 2, wherein storing the numbered cells, the cell attributes, and the text comprises:
according to the cell attribute, arranging the numbered cells according to the format of the target table;
and storing the text in the corresponding cell so as to enable the text and the cell to be stored according to the format of the target table.
4. The method of claim 1, wherein predicting the membership between the adjacent valued cells and blank cells through a graph convolutional network and the cells comprises:
constructing an adjacency matrix according to the cell attributes;
and aggregating the features of adjacent nodes in the adjacency matrix, and judging the membership of the adjacent nodes after feature aggregation to determine the membership of the valued cells and the blank cells corresponding to the adjacent nodes.
5. The method of any of claims 1-4, wherein the generating corresponding text within the blank cells based on the text to be filled in to complete filling of the electronic form comprises:
determining text skeleton points according to the text to be filled in and the text track file;
determining a writing device operation instruction according to the text skeleton points and the three-dimensional list;
and controlling the writing device to write the corresponding text to be filled in the blank cell through the writing device operation instruction so as to generate the corresponding text in the blank cell.
6. The method according to any one of claims 1-4, wherein said matching the text in the valued cell with the text in the database to obtain the text to be filled in for the blank cell having a membership to the valued cell comprises:
carrying out synonym matching on the text in the valued cell and the text in the database;
and determining, according to the matching result, the text to be filled in of the blank cell with the membership with the valued cell.
7. A form filling apparatus, comprising:
the identification module is used for identifying cells and texts in the target form image so as to convert the target form into a spreadsheet;
the prediction module is used for predicting the membership between adjacent valued cells and blank cells through a graph convolutional network and the cells;
the matching module is used for matching the text in the valued cell with the text in the database to obtain the text to be filled in of the blank cell with the membership with the valued cell;
the filling module is used for generating a Chinese character track in the blank cell based on the text to be filled so as to finish filling of the electronic form;
the valued cells are cells filled with contents in the target form, and the blank cells are cells not filled with contents in the target form;
the identification module is further used for extracting cells in the target table image through a Swin Transformer and an R-FPN, and acquiring cell attributes corresponding to the cells; recognizing text in the target form image through PaddleOCR; wherein the R-FPN is obtained by adding the residual structure of a ResNet network to the FPN structure, and the R-FPN is used for increasing the proportion of the high-resolution feature maps.
8. An electronic device, comprising: a processor, a memory storing machine-readable instructions executable by the processor, which when executed by the processor perform the steps of the method of any of claims 1 to 6 when the electronic device is run.
9. A computer-readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, performs the steps of the method according to any of claims 1 to 6.
CN202310155415.5A 2023-02-21 2023-02-21 Form filling method, device, electronic equipment and storage medium Active CN116151202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310155415.5A CN116151202B (en) 2023-02-21 2023-02-21 Form filling method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310155415.5A CN116151202B (en) 2023-02-21 2023-02-21 Form filling method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116151202A CN116151202A (en) 2023-05-23
CN116151202B true CN116151202B (en) 2024-04-02

Family

ID=86354093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310155415.5A Active CN116151202B (en) 2023-02-21 2023-02-21 Form filling method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116151202B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859876A (en) * 2019-04-21 2020-10-30 桂林电子科技大学 Automatic form entering method and system
CN114973282A (en) * 2022-05-09 2022-08-30 深圳市商汤科技有限公司 Table identification method and device, electronic equipment and storage medium
KR20220133434A (en) * 2021-03-25 2022-10-05 네이버 주식회사 Method and system for recognizing tables
CN115331245A (en) * 2022-10-12 2022-11-11 中南民族大学 Table structure identification method based on image instance segmentation
CN115546813A (en) * 2022-10-09 2022-12-30 科大讯飞股份有限公司 Document analysis method and device, storage medium and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657274B (en) * 2021-08-17 2022-09-20 北京百度网讯科技有限公司 Table generation method and device, electronic equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859876A (en) * 2019-04-21 2020-10-30 桂林电子科技大学 Automatic form entering method and system
KR20220133434A (en) * 2021-03-25 2022-10-05 네이버 주식회사 Method and system for recognizing tables
CN114973282A (en) * 2022-05-09 2022-08-30 深圳市商汤科技有限公司 Table identification method and device, electronic equipment and storage medium
CN115546813A (en) * 2022-10-09 2022-12-30 科大讯飞股份有限公司 Document analysis method and device, storage medium and equipment
CN115331245A (en) * 2022-10-12 2022-11-11 中南民族大学 Table structure identification method based on image instance segmentation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
基于图卷积网络的表格隶属关系抽取;张宇童 等;《北京航空航天大学学报》;第1-10页 *
表格检测与结构识别综述;张宇童 等;《计算机工程与应用》;第58卷(第22期);第1-10页 *

Also Published As

Publication number Publication date
CN116151202A (en) 2023-05-23

Similar Documents

Publication Publication Date Title
Altwaijry et al. Arabic handwriting recognition system using convolutional neural network
RU2699687C1 (en) Detecting text fields using neural networks
CN112949415B (en) Image processing method, apparatus, device and medium
WO2019238063A1 (en) Text detection and analysis method and apparatus, and device
US9910842B2 (en) Interactively predicting fields in a form
US20190294921A1 (en) Field identification in an image using artificial intelligence
CN111488826A (en) Text recognition method and device, electronic equipment and storage medium
CN110874618B (en) OCR template learning method and device based on small sample, electronic equipment and medium
CN111615702A (en) Method, device and equipment for extracting structured data from image
CN113742483A (en) Document classification method and device, electronic equipment and storage medium
CN112949476B (en) Text relation detection method, device and storage medium based on graph convolution neural network
US11232299B2 (en) Identification of blocks of associated words in documents with complex structures
US11972625B2 (en) Character-based representation learning for table data extraction using artificial intelligence techniques
CN112308946A (en) Topic generation method and device, electronic equipment and readable storage medium
US20150139547A1 (en) Feature calculation device and method and computer program product
US11881044B2 (en) Method and apparatus for processing image, device and storage medium
CN113255767A (en) Bill classification method, device, equipment and storage medium
CN116151202B (en) Form filling method, device, electronic equipment and storage medium
CN114842482B (en) Image classification method, device, equipment and storage medium
CN115880702A (en) Data processing method, device, equipment, program product and storage medium
CN114120305A (en) Training method of text classification model, and recognition method and device of text content
CN113128496B (en) Method, device and equipment for extracting structured data from image
CN116259050B (en) Method, device, equipment and detection method for positioning and identifying label characters of filling barrel
CN114780773B (en) Document picture classification method and device, storage medium and electronic equipment
CN114821603B (en) Bill identification method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant