CN109344831B - Data table identification method and device and terminal equipment - Google Patents

Info

Publication number
CN109344831B
Authority
CN
China
Prior art keywords
data table
data
character
image
field value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810963099.3A
Other languages
Chinese (zh)
Other versions
CN109344831A (en
Inventor
李亚宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN201810963099.3A
Publication of CN109344831A
Application granted
Publication of CN109344831B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Input (AREA)

Abstract

The invention provides a data table identification method, a data table identification device and terminal equipment, which are applicable to the technical field of data processing. The method comprises the following steps: performing character recognition on a data table image to determine the table name of the data table in the image, and selecting a table template whose table name matches; dividing the data table image into at least one data table area image according to the field value cells; performing character recognition on the data table area image, and judging whether the character recognition result meets the corresponding character string format requirement; if so, matching the character recognition result against the candidate character string library corresponding to the field value cell, and filling the matched character string data into the field value cell of the table template to obtain a data table recognition result corresponding to the data table image. The embodiment of the invention can ensure the accuracy of table data recognition when a data table is recognized and entered.

Description

Data table identification method and device and terminal equipment
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a data table identification method and terminal equipment.
Background
In the prior art, when the data of a paper data table is entered into a computer, every part of the table is typed in manually, which consumes a great deal of time and is very inefficient. After optical character recognition (OCR) technology appeared, people began to use it to recognize and enter paper data tables: OCR recognizes each part of the data table and the characters in it, and the result is stored in the computer. In practice, however, OCR technology does not parse and recognize data tables well, so the accuracy of table data recognition is low when the prior art recognizes and enters a data table.
Disclosure of Invention
In view of the above, the embodiment of the invention provides a data table identification method, a data table identification device and terminal equipment, so as to solve the problem of low accuracy of table data identification when the data table is identified and recorded in the prior art.
A first aspect of an embodiment of the present invention provides a data table identifying method, including:
performing character recognition on the data table image, determining the table names of the data tables in the data table image, and selecting a table template matched with the table names from a preset table template library, wherein the table template comprises the table names, filled field name cells and blank field value cells;
dividing the data table image into at least one data table area image according to a field value cell;
for any data table area image of the data table image, performing character recognition on that data table area image, and judging whether the obtained character recognition result meets the character string format requirement corresponding to the field value cell;
if the character recognition result meets the character string format requirement, matching the character recognition result against a candidate character string library corresponding to the field value cell, and filling the character string data matched from the candidate character string library into the field value cell of the table template to obtain the recognition result of the data table area image;
and obtaining a data table identification result corresponding to the data table image according to the identification result of the at least one data table area image.
A second aspect of an embodiment of the present invention provides a data table identifying apparatus, including:
the template matching module is used for carrying out character recognition on the data table image, determining the table name of the data table in the data table image, and selecting a table template matched with the table name from a preset table template library, wherein the table template comprises the table name, filled field name cells and blank field value cells;
the region image segmentation module is used for segmenting the data table image into at least one data table region image according to a field value cell;
the region image recognition module is used for, for any data table area image of the data table image, performing character recognition on that data table area image and judging whether the obtained character recognition result meets the character string format requirement corresponding to the field value cell;
the first data filling module is used for, if the character recognition result meets the character string format requirement, matching the character recognition result against a candidate character string library corresponding to the field value cell and filling the character string data matched from the candidate character string library into the field value cell of the table template to obtain the recognition result of the data table area image;
and the result generation module is used for obtaining a data table identification result corresponding to the data table image according to the identification result of the at least one data table area image.
A third aspect of the embodiment of the present invention provides a data table identification terminal device, which includes a memory and a processor, where the memory stores a computer program that can run on the processor, and the processor implements the following steps when executing the computer program:
performing character recognition on the data table image, determining the table name of the data table in the data table image, and selecting a table template matched with the table name from a preset table template library, wherein the table template comprises the table name, filled field name cells and blank field value cells;
dividing the data table image into at least one data table area image according to a field value cell;
for any data table area image of the data table image, performing character recognition on that data table area image, and judging whether the obtained character recognition result meets the character string format requirement corresponding to the field value cell;
if the character recognition result meets the character string format requirement, matching the character recognition result against a candidate character string library corresponding to the field value cell, and filling the character string data matched from the candidate character string library into the field value cell of the table template to obtain the recognition result of the data table area image;
and obtaining a data table identification result corresponding to the data table image according to the identification result of the at least one data table area image.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the data table identification method described above.
Compared with the prior art, the embodiment of the invention has the following beneficial effects. Considering that prior-art OCR technology supports data table recognition poorly, the embodiment of the invention sets a corresponding table template for each data table to be recognized. The template contains the table frame, including the table name, the field name cells and the field value cells, and the field names in the field name cells are filled in advance, so that only the data of the field value cells needs to be filled in to complete recognition of the data table. Specifically, since there may be more than one type of data table to be recognized, simple character recognition is first performed on the data table image to determine the table name and screen out the corresponding table template. In practice, the table names of the data tables to be recognized form a limited and known set, and the position of the table name is generally fixed, so recognizing the table name is relatively simple and highly accurate, and the table template corresponding to the data table image can be determined very accurately. After the table template is determined, the data table image is segmented to determine the area image of each cell it contains, and character recognition is performed on the field value cells to obtain a preliminary recognition result for each field value cell.
After the preliminary recognition result of a field value cell is obtained, it is considered that, in practical applications, the contents of many cells are character strings with a certain format requirement, and the data that can be filled in belongs to a fixed, limited set. For example, the field value cell corresponding to the household registration type can only contain the Chinese character string for "town" or "rural", and the date of birth can only be an 8-digit string in a fixed format. Therefore, on the basis of the preliminary recognition result, the result is first screened using the preset character string format requirement of the field value cell, which ensures the reliability of the recognition result. The result is then matched against the candidate character string library corresponding to the field value cell. Since the candidate character strings are limited in number and known, determining the final recognition result by directly matching the candidate character string library greatly improves the accuracy of the field value cell data, and thereby the accuracy of table data recognition when the data table is recognized and entered.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an implementation flow of a data table identification method according to a first embodiment of the present invention;
Fig. 2 is a schematic diagram of an implementation flow of a data table identification method according to a second embodiment of the present invention;
Fig. 3 is a schematic diagram of an implementation flow of a data table identification method according to a third embodiment of the present invention;
Fig. 4 is a schematic diagram of an implementation flow of a data table identification method according to a fourth embodiment of the present invention;
Fig. 5 is a schematic diagram of an implementation flow of a data table identification method according to a fifth embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a data table identification device according to a sixth embodiment of the present invention;
Fig. 7 is a schematic diagram of a data table identification terminal device according to a seventh embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
For the convenience of understanding the technical solution of the present application, embodiments of the present invention will be briefly described herein:
A standard data table comprises at least three parts: a table name, field name cells and field value cells. The field name cells hold the field names of fixed attributes of the table, and the field value cells hold the field value content corresponding to those attributes. Within a data table, the positional relationship between the field name cells and the field value cells is fixed, as shown in Table 1 below:
TABLE 1
Here, "staff basic information table" is the table name of Table 1. The cells filled with the field names of fixed attributes, such as "name", "gender", "ethnicity", "phone", "mailbox" and "age", are the field name cells, and the cells filled with the field value content corresponding to those attributes are the field value cells; for example, the cell containing "Zhang Saner", which corresponds to "name", is a field value cell. The positional relationship is that each field value cell is the cell immediately to the right of its field name cell.
In practical applications, once the content of a field name cell is fixed for a data table, the content of the corresponding field value cell is subject to a certain character string format requirement, so that the data in the field value cell fits the field name. For example, the field value cell corresponding to the field name cell "age" contains a numeric string of 1 to 3 digits, and the field value cell corresponding to the field name cell "mobile phone number" contains an 11-digit numeric string. In addition, for some field value cells the content that can be filled in belongs to a fixed, limited set, that is, the cell has a corresponding candidate character string library. For example, the field value cell corresponding to the field name cell "ethnicity" can only contain one of a limited number of ethnic groups, and the field value cell corresponding to the field name cell "gender" can only contain "male" or "female". For such field value cells, the content filled in by the user must therefore both meet the corresponding character string format requirement and be one of the entries in the corresponding candidate character string library.
To improve the efficiency of entering paper data tables, the embodiment of the invention creates a table template in advance for each kind of paper data table that needs to be entered. The table name and the field name cells of the template are pre-filled, and the character string format requirement of each field value cell is preset. If a field value cell has a candidate character string library, that library is configured as well, so that it can be called during recognition. On the basis of the created table templates, the embodiment recognizes the table name in the data table image obtained by scanning the paper data table, so as to determine and select the corresponding table template. After the table template is determined, the area of each field value cell is located in the data table image and character recognition is performed on the located data table area image. The recognition result is checked against the character string format requirement to ensure that the obtained data is valid, and the final recognition result is then matched from the candidate character string library corresponding to the field value cell. In this way the data content of field value cells that have a candidate character string library can be recognized accurately, which improves the recognition accuracy when paper data tables are entered.
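The table template described above can be pictured as a small data structure. The following is a minimal sketch under stated assumptions: the class names `FieldCell` and `TableTemplate`, the use of regular expressions to express the character string format requirement, and the English candidate values are all illustrative and are not specified by the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FieldCell:
    """One pre-filled field name cell plus the constraints on its value cell."""
    name: str                                      # pre-filled field name
    pattern: str                                   # format requirement (regex, assumed)
    candidates: Optional[Tuple[str, ...]] = None   # candidate string library, if any
    value: Optional[str] = None                    # blank until recognition fills it


@dataclass
class TableTemplate:
    """A table name together with its field cells, keyed by field name."""
    table_name: str
    fields: Dict[str, FieldCell] = field(default_factory=dict)

    def add(self, cell: FieldCell) -> None:
        self.fields[cell.name] = cell


# Hypothetical template mirroring the "staff basic information table" example.
staff = TableTemplate("staff basic information table")
staff.add(FieldCell("age", r"[0-9]{1,3}"))
staff.add(FieldCell("mobile phone number", r"[0-9]{11}"))
staff.add(FieldCell("gender", r"male|female", ("male", "female")))
```

Keeping the format requirement and the candidate library on the same record means that later steps can look up both constraints from the field name alone.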
Fig. 1 shows a flowchart of an implementation of a data table identification method according to an embodiment of the present invention, which is described in detail below:
S101, performing character recognition on the data table image, determining the table name of the data table in the data table image, and selecting a table template with a matching table name from a preset table template library, wherein the table template comprises the table name, filled field name cells and blank field value cells.
To determine the table template corresponding to the data table image to be recognized, the embodiment of the invention first recognizes the table name contained in the image and uses it to search for a matching table template. In practice, the table name is generally at a fixed position, such as above the table or in its first row. Character recognition can therefore proceed line by line from the top of the data table image, with table name matching performed as it goes. The table template can then be determined without recognizing the characters of the whole image, which improves both the speed and the accuracy of template matching.
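The top-down, stop-early matching of S101 can be sketched as follows. This is only an illustration: the OCR engine is abstracted as an iterable of already recognized text lines, and `TEMPLATE_LIBRARY`, its entries and `select_template` are hypothetical names introduced here.

```python
from typing import Iterable, Optional

# Hypothetical template library: normalized table name -> template identifier.
TEMPLATE_LIBRARY = {
    "staff basic information table": "tpl_staff_basic",
    "leave application form": "tpl_leave",
}


def select_template(ocr_lines: Iterable[str]) -> Optional[str]:
    """Scan recognized lines from top to bottom and stop at the first known table name."""
    for line in ocr_lines:
        name = line.strip().lower()
        if name in TEMPLATE_LIBRARY:
            return TEMPLATE_LIBRARY[name]   # later lines are never requested
    return None
```

If `ocr_lines` is a generator that runs recognition lazily, one line at a time, returning on the first hit means the rest of the image is never recognized, which is the saving the paragraph above describes.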
S102, dividing the data table image into at least one data table area image according to the field value cells.
After the table template corresponding to the data table image is determined, the field value cells in the table template need to be filled in to complete recognition of the data table. To fill in the field value cells, the data table area image corresponding to each field value cell must first be determined from the data table image, which provides the basis for subsequent character recognition and filling.
In the embodiment of the invention, to determine the data table area image corresponding to each field value cell, the image area of the data table image is first divided into cells. The specific dividing method is not limited here and includes, but is not limited to, dividing the data table image along the row and column borders in the image. The division may take a single cell as the unit, in which case each data table area image corresponds to exactly one field value cell or one field name cell. Alternatively, it may take a field name cell plus its corresponding field value cell as the unit, in which case each data table area image corresponds to one such pair.
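Division along row and column borders, with a single cell as the unit, can be sketched as below. The sketch assumes that the border coordinates have already been detected by some other means and that the table is a regular grid without merged cells; the function name and the `Box` layout are illustrative choices, not part of the patent.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def split_into_cells(row_borders: List[int], col_borders: List[int]) -> List[List[Box]]:
    """Build one bounding box per cell from detected border coordinates."""
    rows = sorted(row_borders)
    cols = sorted(col_borders)
    grid = []
    # Each pair of neighbouring borders encloses one row (or one column).
    for top, bottom in zip(rows, rows[1:]):
        grid.append([(left, top, right, bottom)
                     for left, right in zip(cols, cols[1:])])
    return grid
```

For example, three horizontal borders and three vertical borders yield a 2 x 2 grid of boxes, each of which can then be cropped out as one data table area image.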
After the data table area image corresponding to each cell is determined, the field value cells are located among the cells, so that the data table area image corresponding to each field value cell is determined one by one and provides data for the subsequent checking and filling of recognition results. The specific cell positioning method is not limited here and may be set by a skilled person. For example, the data table image may be aligned with the overall table frame of the table template, after which the data table area image corresponding to each cell, and hence to each field value cell, is determined in sequence. Alternatively, the field names pre-filled in the field name cells of the table template may be matched against the character recognition results of the data table area images to locate the field name cells, and the data table area image corresponding to each field value cell is then determined from the positional relationship between the field name cells and the field value cells in the table template.
S103, aiming at any data table area image of the data table images, character recognition is carried out on the data table area images, and whether the obtained character recognition result meets the character string format requirement corresponding to the field value cell is judged.
After the data table area image corresponding to each field value cell is determined, character recognition is performed on that image to determine the specific data content of the field value cell. The data to be filled into a field value cell has a certain format requirement. For example, the field value cell corresponding to the mobile phone number can only contain an 11-digit numeric string, and the field value cell corresponding to the identity card number can only contain a 17-digit numeric string plus a 1-character check code. Therefore, after the character recognition result of a field value cell is obtained, the embodiment of the invention verifies it against the character string format requirement of that cell, to judge whether the requirement is met and thereby ensure that the recognition result is accurate and valid.
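The format check of S103 can be sketched with regular expressions. Everything here is an assumption made for illustration: the `FORMAT_RULES` table, the decision to strip whitespace before matching, and the choice to accept "X" as the identity card check character (which is how such numbers are written in practice) are not taken from the patent.

```python
import re

# Hypothetical format requirements, keyed by field name.
FORMAT_RULES = {
    "mobile phone number": re.compile(r"[0-9]{11}"),
    "identity card number": re.compile(r"[0-9]{17}[0-9Xx]"),
}


def meets_format(field_name: str, ocr_text: str) -> bool:
    """Return True if the OCR result satisfies the field's format requirement."""
    rule = FORMAT_RULES[field_name]
    cleaned = "".join(ocr_text.split())   # drop whitespace (an assumed cleanup step)
    return rule.fullmatch(cleaned) is not None
```

A result such as "1380013800O", where the last digit was misread as the letter O, fails the check even though it has the right length, which is the kind of error this step is meant to catch.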
S104, if the character recognition result meets the character string format requirement, matching the character recognition result against the candidate character string library corresponding to the field value cell, and filling the character string data matched from the candidate character string library into the field value cell of the table template to obtain the recognition result of the data table area image.
S105, obtaining a data table identification result corresponding to the data table image according to the identification result of at least one data table area image.
When the character recognition result of a field value cell meets the corresponding character string format requirement, the data content of the cell has in theory been obtained. In practice, however, meeting the format requirement alone cannot guarantee that the character recognition result is correct; errors such as wrongly recognized characters may remain. To further improve recognition accuracy, the embodiment of the invention therefore matches the character recognition result against the candidate character string library corresponding to the field value cell. Because every candidate character string in the library is a standard result preset by a technician according to the actual conditions of the field value cell, matching against the library can greatly improve the recognition accuracy of the field value cell and hence the accuracy of recognizing and entering the data table.
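The patent does not fix a particular matching algorithm for S104. One plausible reading is a closest-string lookup; the sketch below uses `difflib.get_close_matches` from the Python standard library as a stand-in, and the function name, the candidate values and the 0.6 similarity cutoff are assumptions.

```python
import difflib
from typing import Optional, Sequence


def match_candidate(ocr_text: str, candidates: Sequence[str],
                    cutoff: float = 0.6) -> Optional[str]:
    """Return the candidate closest to the OCR result, or None if none is similar enough."""
    if ocr_text in candidates:
        return ocr_text                      # exact hit, nothing to repair
    close = difflib.get_close_matches(ocr_text, candidates, n=1, cutoff=cutoff)
    return close[0] if close else None
```

With candidates "male" and "female", a misrecognized "femaie" is mapped back to "female", while a result with no resemblance to any candidate returns `None` and can be handled separately.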
After the recognition result of each data table area image is obtained, the recognition results are filled into the corresponding cells of the table template to obtain the final recognition result of the data table.
In the embodiment of the invention, considering that prior-art OCR technology supports data table recognition poorly, a corresponding table template is first set for each data table to be recognized, so that only the data of the field value cells needs to be filled in to complete recognition of the data table. Specifically, since there may be more than one type of data table to be recognized, simple character recognition is first performed on the data table image to determine the table name and screen out the corresponding table template. In practice, the table names of the data tables to be recognized form a limited and known set, and the position of the table name is generally fixed, so recognizing the table name is relatively simple and highly accurate, and the table template corresponding to the data table image can be determined very accurately. After the table template is determined, the data table image is segmented to determine the area image of each cell it contains, and character recognition is performed on the field value cells to obtain a preliminary recognition result for each field value cell.
After the preliminary recognition result of a field value cell is obtained, it is considered that, in practical applications, the contents of many cells have a certain format requirement and the data that can be filled in is a character string from a fixed, limited set. Therefore, on the basis of the preliminary recognition result, the result is first screened using the preset character string format requirement of the field value cell, which ensures the reliability of the recognition result, and is then matched against the candidate character string library corresponding to the field value cell. This greatly improves the accuracy of the field value cell data, and thereby the accuracy of table data recognition when the data table is recognized and entered.
As a specific implementation for locating the data table area image of a field value cell: the table template already provides the overall frame of the data table, and this frame contains the specific positional relationship between each field value cell and its field name cell (in Table 1 above, each field value cell is on the right side of its field name cell). The table template also contains the field name character string of each field name cell. Therefore, to determine the data table area image corresponding to each field value cell one by one, as shown in Fig. 2, the embodiment of the invention first locates the field name cells and then derives the data table area image of each field value cell from the positional relationship between the two, as detailed below:
S201, performing character recognition on the data table area images, and matching the area character string obtained for each data table area image with the field name character strings contained in the field name cells of the table template, to determine the data table area image corresponding to each field name cell in the table template.
To locate the field name cells in this first step, the embodiment of the invention performs character recognition on the data table area images obtained by division, obtaining a character recognition result for each data table area image. These results are then matched against the field name character string of each field name cell in the table template, which determines the data table area image in the data table image that corresponds to each field name cell and thus locates the field name cells.
S202, determining the data table area image corresponding to each field value cell in the table template based on the positional relationship between the field name cells and the field value cells in the table template and on the data table area images corresponding to the field name cells.
After the field name cells are located, the field value cell corresponding to each field name cell is determined in the table template from the positional relationship recorded there, and the data table area image corresponding to that field value cell is determined by applying the same positional relationship in the data table image. Each field value cell can be located in this way. For example, assume that in the table template each field value cell is on the right side of its field name cell, and that field name cell A corresponds to field value cell a. If field name cell A is determined to correspond to data table area image A, then field value cell a, the cell to the right of field name cell A, is found in the table template according to the positional relationship; data table area image a, the area image to the right of data table area image A, is found in the data table image according to the same relationship; and data table area image a is determined to be the data table area image corresponding to field value cell a, which completes the positioning of field value cell a.
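Steps S201 and S202 can be sketched on a grid of per-cell recognition results. The sketch assumes the cells have already been segmented into rows and columns and that the positional relationship is expressed as a (row, column) offset, with (0, 1) meaning "the cell immediately to the right"; `locate_value_cells` and its parameters are hypothetical names.

```python
from typing import Dict, List, Tuple


def locate_value_cells(grid_text: List[List[str]],
                       field_names: List[str],
                       offset: Tuple[int, int] = (0, 1)) -> Dict[str, Tuple[int, int]]:
    """Map each field name to the (row, col) position of its field value cell."""
    wanted = set(field_names)
    d_row, d_col = offset
    located = {}
    for r, row in enumerate(grid_text):
        for c, text in enumerate(row):
            name = text.strip()
            if name in wanted:                       # S201: a field name cell is found
                vr, vc = r + d_row, c + d_col        # S202: apply the positional relationship
                if 0 <= vr < len(grid_text) and 0 <= vc < len(grid_text[vr]):
                    located[name] = (vr, vc)
    return located
```

Only the field name cells need to be recognized reliably here; the value cells are found purely by position, so their contents can be processed afterwards with the stricter format and candidate checks.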
As shown in fig. 3, if the character recognition result meets the character string format requirement, the third embodiment of the present invention includes:
S301, if the field value cell has no corresponding candidate character string library and the number of characters in the character recognition result is greater than 1, performing text error correction processing on the character recognition result.
In practice, not every field value cell has a corresponding candidate character string library. For example, in the field value cell corresponding to the field name cell "self-evaluation", the user may fill in arbitrary content, so the data content of that cell is neither restricted nor known in advance and no candidate character string library can exist for it. Compared with a standard result obtained by matching against a candidate character string library, the recognition result of a field value cell without such a library is strongly affected by the user: the user may miswrite or omit characters, and such errors become more likely as the number of characters to be filled in grows. The accuracy of the recognized data content of such a field value cell is therefore difficult to guarantee.
Therefore, to improve the accuracy of the character recognition result for the data content of a field value cell, the embodiment of the invention performs text error correction on the character recognition result of any field value cell that has no candidate character string library and whose recognition result contains more than one character. The specific text error correction method is not limited here and may be chosen by a skilled person; it includes, but is not limited to, a decision table method, a Bayesian learning method, and a window learning method. To ensure the effect of text error correction, a text error correction algorithm that supports more types of error correction is preferably selected.
S302, filling the character recognition result after the text error correction processing into a field value cell of the form template to obtain a data sheet recognition result corresponding to the data sheet image.
After the error correction is completed, the obtained result is directly filled into the field value cell, and filling of the field value cell can be completed, so that a final filled data sheet identification result is obtained.
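The branch described in S301 and S302 can be sketched as below. This is a minimal sketch under stated assumptions: `fill_field_value` is a hypothetical helper, and `correct_text` stands in for whichever text error correction algorithm a skilled person selects; it is passed in as a callable because the description does not fix a particular algorithm.

```python
def fill_field_value(ocr_text, candidate_library, correct_text):
    """Return the text to write into a field value cell.

    candidate_library: collection of allowed strings, or None if the cell has none.
    correct_text: hypothetical callable implementing some text error correction.
    """
    # S301: correct only when there is no candidate library and more than one character.
    if candidate_library is None and len(ocr_text) > 1:
        return correct_text(ocr_text)
    # Cells with a candidate library are handled by candidate matching instead,
    # so this sketch returns their recognition result unchanged.
    return ocr_text
```

For example, with a toy corrector that replaces "teh" with "the", the call `fill_field_value("teh plan", None, corrector)` would return the corrected string, while a single-character result would be returned unchanged.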
Although the first to third embodiments of the present invention provide processing methods for recognizing a data table, in practice the data content filled in by the user may not satisfy the character string format requirement corresponding to the field value cell, or a character recognition error may cause the recognition result to violate that requirement. In either case the character recognition result does not satisfy the character string format requirement, and the first to third embodiments cannot recognize the data table normally. To ensure the accuracy of the recognized data content as far as possible, the fourth embodiment of the present invention, as shown in fig. 4, includes:
S401, if the character recognition result does not meet the character string format requirement, updating the total number of times that the character recognition result of the data table area image corresponding to the field value cell has failed to meet the character string format requirement.
S402, if the total number of times is less than or equal to a preset error threshold, returning to the operation of performing character recognition on the data table area image corresponding to the field value cell and judging whether the obtained character recognition result meets the preset character string format requirement corresponding to the field value cell.
S403, if the total number of times is greater than the error threshold, filling the most recent character recognition result of the data table area image corresponding to the field value cell into the field value cell of the table template to obtain the data table recognition result corresponding to the data table image.
A failure of the character string format check may be caused either by a character recognition error or by an error in the data content filled in by the user. A character recognition error can be corrected by recognizing again, so after a check fails the embodiment of the invention performs character recognition and the character string format check on the field value cell again. However, if the data content filled in by the user is itself wrong, the check fails no matter how many times recognition is repeated. The embodiment of the invention therefore also sets a limit on the number of retries after check failures, namely the error threshold, and records the number of check failures for each field value cell during character recognition and character string format checking. For example, when character recognition and the format check are performed on field value cell a, processing continues to the next step if the check succeeds; if the check fails, the total number of check failures of field value cell a is updated. Once the total number of check failures of a field value cell exceeds the error threshold, the embodiment of the invention stops character recognition and format checking for that cell and fills the most recent character recognition result into the field value cell as the final data content. This prevents excessive retries from placing unnecessary load on the processor and from making recognition of the data table take too long. The specific value of the error threshold can be set by a skilled person.
As an embodiment of the present invention, to help the user understand how recognition of the data table went, after step S403 an annotation indicating that the character string format check failed may be added to each field value cell whose total number of format check failures is greater than the error threshold. The annotation may be added directly to the finally recognized data table, or recorded and output in another file or as other information.
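The retry logic of steps S401 to S403 can be sketched as follows. This is an illustrative sketch only: it assumes the character string format requirement can be expressed as a regular expression and that character recognition is available as a zero-argument callable; `recognize_with_retry` and its parameters are hypothetical names.

```python
import re


def recognize_with_retry(recognize, pattern, error_threshold):
    """Recognize a field value cell, retrying while the format check fails.

    recognize: zero-argument callable returning the recognized text (assumed interface).
    pattern: regular expression standing in for the character string format requirement.
    Returns (text, passed), where passed is False if the error threshold was exceeded.
    """
    failures = 0
    while True:
        text = recognize()
        if re.fullmatch(pattern, text):
            return text, True
        failures += 1                    # S401: update the total failure count
        if failures > error_threshold:   # S403: give up and keep the most recent result
            return text, False
        # S402: failures <= threshold, so recognize and check again
```

The returned flag could be used to attach the format check failure annotation described above to the affected field value cell.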
As a specific implementation manner of performing text error correction processing on a character recognition result, as shown in fig. 5, a fifth embodiment of the present invention includes:
S501, processing the character recognition result by using a preset text error correction algorithm, and determining the character to be corrected in the character recognition result and the N candidate replacement characters with the highest replacement probabilities for the character to be corrected, wherein N is a natural number.
S502, performing character image segmentation on the data table area image, and determining the character image corresponding to the character to be corrected.
S503, performing font structure analysis on the character recognition result of the character image, and determining the font structure information corresponding to the character image.
S504, among the N candidate replacement characters, selecting the candidate replacement character whose font structure information best matches the font structure information of the character image and whose replacement probability is the largest, and replacing the corresponding character to be corrected with it, thereby obtaining the character recognition result after the text error correction processing.
The character to be corrected is the character with errors identified in the text error correction processing, and the font structure information refers to specific font structure types of the character, such as a single character, a left-right structure, an up-down structure, a mosaic structure and the like.
In practical application, due to the influence of environmental factors and personal factors of users, such as insufficient light, blocked or partially erased words when images are acquired, unclear words of users, and the like, words in the finally obtained data sheet images are difficult to accurately identify, so that character errors are easy to occur in the results obtained by character identification.
To reduce character recognition errors, the third embodiment of the invention proposes text error correction processing to improve the accuracy of the final character recognition result. However, text error correction algorithms in the prior art, such as the decision table method, the Bayesian learning method, and the window learning method, analyze the character string according to grammar rules, semantic analysis, and the like, find the erroneous character, determine several candidate replacement characters with differing replacement probabilities, and directly replace the erroneous character with the candidate having the largest replacement probability. Although this corrects the character recognition result to a certain extent, the algorithm sometimes produces several candidates whose replacement probabilities differ only slightly, and directly substituting the candidate with the largest replacement probability then makes it difficult to guarantee error correction accuracy.
Therefore, to improve the accuracy of text error correction and of the character recognition result, the embodiment of the invention builds on an existing text error correction algorithm. It relies on the observation that, in practice, even when a character is unclear or partly blocked, character recognition on the image can still capture the character's font structure fairly accurately (that is, the font structure information of the whole character is relatively easy to obtain). The erroneous character is therefore analyzed separately for its font structure, and the N candidate replacement characters with the highest replacement probabilities are screened using the font structure information obtained for the erroneous character, so that the candidate that satisfies the font structure information and has the largest replacement probability is chosen as the replacement. In practice, if characters in the data table image are blocked or partly missing, their font structure may appear changed, so it may be difficult to determine the exact font structure type of a character directly, and only the probability of each font structure type can be estimated. The specific value of N can be set by a skilled person as required.
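The screening in step S504 can be sketched as below. This is one possible reading, stated as an assumption: candidates are ranked first by how well their font structure matches the structure estimated from the character image, and ties are broken by replacement probability. The function name, the dictionary inputs and the sample probabilities are all hypothetical and only illustrate the idea.

```python
def pick_replacement(candidates, structure_probs, structure_of):
    """Choose a replacement character among the N best candidates.

    candidates: list of (character, replacement_probability) pairs from text error correction.
    structure_probs: font structure type -> probability estimated from the character image.
    structure_of: candidate character -> its known font structure type.
    """
    def rank(item):
        char, replacement_prob = item
        # Matching degree: probability that the image has this candidate's structure.
        match = structure_probs.get(structure_of.get(char), 0.0)
        # Assumed ordering: structure match first, replacement probability as tie-break.
        return (match, replacement_prob)

    return max(candidates, key=rank)[0]


# Illustrative data: the image most likely shows a left-right structure.
candidates = [("明", 0.40), ("月", 0.45), ("朋", 0.38)]
structure_of = {"明": "left-right", "月": "single", "朋": "left-right"}
structure_probs = {"left-right": 0.8, "single": 0.2}
chosen = pick_replacement(candidates, structure_probs, structure_of)
```

In this example the candidate with the highest replacement probability has a single-character structure and is passed over in favor of the left-right candidate with the larger replacement probability. Other ways of combining the two scores, such as a weighted product, would also be consistent with the description.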
Corresponding to the method of the above embodiment, fig. 6 shows a block diagram of the data table identifying apparatus provided in the embodiment of the present invention, and for convenience of explanation, only the portion relevant to the embodiment of the present invention is shown. The data table identifying apparatus illustrated in fig. 6 may be an execution subject of the data table identifying method provided in the first embodiment.
Referring to fig. 6, the data table identifying apparatus includes:
the template matching module 61 is configured to perform character recognition on a data table image, determine a table name of a data table in the data table image, and select a table template matched with the table name from a preset table template library, where the table template includes the table name, filled field name cells and blank field value cells.
The area image segmentation module 62 is configured to segment the data table image into at least one data table area image according to field value cells.
The region image recognition module 63 is configured to, for any one of the data table area images, perform character recognition on the data table area image and determine whether the obtained character recognition result meets the character string format requirement corresponding to the field value cell.
And a first data filling module 64, configured to perform string matching on the character recognition result and a candidate string library corresponding to the field value cell if the character recognition result meets the character string format requirement, and fill the string data matched from the string library into the field value cell of the form template, so as to obtain a recognition result of the data table region image.
And the result generating module 65 is configured to obtain a data table identification result corresponding to the data table image according to the identification result of the at least one data table area image.
Further, the area image segmentation module 62 includes:
and the image character string matching module is used for carrying out character recognition on the data table area image, matching the area character string corresponding to the data table area image obtained by character recognition with the field name character string contained in the field name cell in the table template, and determining the data table area image corresponding to the field name cell in the table template.
The image position matching module is used for determining the data table area image corresponding to the field value cell in the table template based on the position relation between the field name cell and the field value cell in the table template and the data table area image corresponding to the field name cell in the table template.
Further, the data table identifying device further includes:
A text error correction module, configured to perform text error correction processing on the character recognition result if the field value cell has no corresponding candidate character string library and the number of characters in the character recognition result is greater than 1.
And the second data filling module is used for filling the character recognition result after the text error correction processing into the field value cells of the form template to obtain a data table recognition result corresponding to the data table image.
Further, the data table identifying device further includes:
and the error times recording module is used for updating the total times that the character recognition result of the data table area image corresponding to the field value cell does not meet the character string format requirement if the character recognition result does not meet the character string format requirement.
And the error retry module is used for returning to execute the operation of carrying out character recognition on the data table area image corresponding to the field value cell if the total times are smaller than or equal to a preset error threshold value, and judging whether the obtained character recognition result meets the preset corresponding character string format requirement of the field value cell.
And the third data filling module is used for filling the character recognition result of the data table area image corresponding to the field value cell at the last time into the field value cell of the table template to obtain the data table recognition result corresponding to the data table image if the total times are larger than the error threshold.
Further, the text error correction module includes:
and processing the character recognition result by using a preset text error correction algorithm, and determining the character to be corrected in the character recognition result and N candidate replacement characters with the maximum replacement probability corresponding to the character to be corrected, wherein N is a natural number.
And carrying out character image segmentation on the data table area image, and determining a character image corresponding to the character to be corrected.
And carrying out font structure analysis on the character recognition result of the character image to determine font structure information corresponding to the character image.
And replacing the corresponding character to be corrected by using the candidate replacement character with the largest matching degree between the font structure information in the N candidate replacement characters and the font structure information of the character image and the largest replacement probability, so as to obtain the character recognition result after the text error correction processing.
The process of implementing the respective functions of each module in the data table identifying apparatus provided in the embodiment of the present invention may refer to the description of the first embodiment shown in fig. 1, and will not be repeated here.
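The behavior of the region image recognition module 63 and the first data filling module 64 can be sketched as follows. This is a sketch under stated assumptions rather than the described implementation: the character string format requirement is modeled as a regular expression, and character string matching against the candidate character string library is approximated with the Python standard library function `difflib.get_close_matches`. The function name `fill_from_library` and the cutoff value are hypothetical.

```python
import difflib
import re


def fill_from_library(ocr_text, pattern, candidates, cutoff=0.6):
    """Return the library string to fill into the field value cell, or None.

    pattern: regular expression standing in for the character string format requirement.
    candidates: the candidate character string library for this field value cell.
    """
    # Module 63: check the recognition result against the format requirement.
    if not re.fullmatch(pattern, ocr_text):
        return None
    # Module 64: pick the closest library entry (similarity measure is an assumption).
    matches = difflib.get_close_matches(ocr_text, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Under these assumptions, a recognition result such as "Mele" for a gender field with the library `["Male", "Female"]` would be filled in as "Male", while a result containing a digit would fail the format check before any matching is attempted.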
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation process of the embodiments of the present invention.
It will also be understood that, although the terms "first," "second," etc. may be used herein in some embodiments of the invention to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first contact may be named a second contact, and similarly, a second contact may be named a first contact without departing from the scope of the various described embodiments. The first contact and the second contact are both contacts, but they are not the same contact.
Fig. 7 is a schematic diagram of a data table identification terminal device according to an embodiment of the present invention. As shown in fig. 7, the data table identification terminal device 7 of this embodiment includes: a processor 70, a memory 71, said memory 71 having stored therein a computer program 72 executable on said processor 70. The processor 70, when executing the computer program 72, implements the steps of the various data table identification method embodiments described above, such as steps 101 through 105 shown in fig. 1. Alternatively, the processor 70, when executing the computer program 72, performs the functions of the modules/units of the apparatus embodiments described above, such as the functions of the modules 61 to 65 shown in fig. 6.
The data table identification terminal device 7 may be a computing device such as a desktop computer, a notebook computer, a palm computer, or a cloud server. The data table identification terminal device may include, but is not limited to, a processor 70 and a memory 71. It will be appreciated by those skilled in the art that fig. 7 is merely an example of the data table identification terminal device 7 and does not constitute a limitation of it; the device may include more or fewer components than illustrated, may combine certain components, or may use different components. For example, the data table identification terminal device may further include an input transmission device, a network access device, a bus, and the like.
The processor 70 may be a central processing unit (Central Processing Unit, CPU), or may be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 71 may be an internal storage unit of the data table identification terminal device 7, for example a hard disk or a memory of the data table identification terminal device 7. The memory 71 may also be an external storage device of the data table identification terminal device 7, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the data table identification terminal device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the data table identification terminal device 7. The memory 71 is used for storing the computer program and other programs and data required by the data table identification terminal device. The memory 71 may also be used for temporarily storing data that has been transmitted or is to be transmitted.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by means of a computer program instructing related hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately added to or reduced according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (9)

1. A method of data table identification, comprising:
performing character recognition on the data table image, determining the table names of the data tables in the data table image, and selecting a table template matched with the table names from a preset table template library, wherein the table template comprises the table names, filled field name cells and blank field value cells;
dividing the data table image into at least one data table area image according to field value cells, wherein each data table area image corresponds to one field value cell and/or one field name cell;
aligning the data table image according to the overall table frame of the table template, and sequentially determining the data table area image corresponding to each cell after alignment, thereby determining the data table area image corresponding to each field value cell one by one; or matching the field names in the filled field name cells of the table template with the results of character recognition on the data table area images, and, after the data table area image corresponding to each field name cell has been determined, determining the data table area image corresponding to each field value cell according to the positional relationship between the field name cells and the field value cells in the table template;
performing character recognition on any data table area image of the data table images, and judging whether the obtained character recognition result meets the character string format requirement corresponding to the field value cell;
if the character recognition result meets the character string format requirement, carrying out character string matching on the character recognition result and a candidate character string library corresponding to the field value cell, and filling character string data matched from the character string library into the field value cell of the form template to obtain the recognition result of the data table region image;
and obtaining a data table identification result corresponding to the data table image according to the identification result of the at least one data table area image.
2. The data sheet recognition method as claimed in claim 1, further comprising, before said performing character recognition on any data table area image of the data table image:
performing character recognition on the data table area image, and matching an area character string corresponding to the data table area image obtained by character recognition with a field name character string contained in the field name cell in the table template to determine the data table area image corresponding to the field name cell in the table template;
and determining the data table area image corresponding to the field value cell in the table template based on the position relation between the field name cell and the field value cell in the table template and the data table area image corresponding to the field name cell in the table template.
3. The data sheet recognition method of claim 1, wherein if the character recognition result satisfies the character string format requirement, further comprising:
if the field value cell does not have a corresponding candidate character string library and the number of characters of the character recognition result is larger than 1, performing text error correction processing on the character recognition result;
filling the character recognition result after the text error correction processing into the field value cell of the form template to obtain a data sheet recognition result corresponding to the data sheet image.
4. A data sheet identification method as claimed in any one of claims 1 to 3, further comprising:
if the character recognition result does not meet the character string format requirement, updating the total number of times that the character recognition result of the data table area image corresponding to the field value cell has failed to meet the character string format requirement;
if the total number of times is less than or equal to a preset error threshold, returning to the operation of performing character recognition on the data table area image corresponding to the field value cell, and judging whether the obtained character recognition result meets the preset character string format requirement corresponding to the field value cell;
and if the total number of times is greater than the error threshold, filling the most recent character recognition result of the data table area image corresponding to the field value cell into the field value cell of the table template to obtain the data table recognition result corresponding to the data table image.
5. The data sheet recognition method of claim 3, wherein said performing text error correction processing on said character recognition result comprises:
processing the character recognition result by using a preset text error correction algorithm, and determining the character to be corrected in the character recognition result and N candidate replacement characters with the maximum replacement probability corresponding to the character to be corrected, wherein N is a natural number;
performing character image segmentation on the data table area image, and determining a character image corresponding to the character to be corrected;
carrying out font structure analysis on the character recognition result of the character image to determine font structure information corresponding to the character image;
and replacing the corresponding character to be corrected by using the candidate replacement character with the largest matching degree between the font structure information in the N candidate replacement characters and the font structure information of the character image and the largest replacement probability, so as to obtain the character recognition result after the text error correction processing.
6. A data sheet identification apparatus, comprising:
the template matching module is used for carrying out character recognition on the data table image, determining the table name of the data table in the data table image, and selecting a table template matched with the table name from a preset table template library, wherein the table template comprises the table name, filled field name cells and blank field value cells;
the region image segmentation module is used for segmenting the data table image into at least one data table region image according to field value cells, wherein each data table region image corresponds to one field value cell and/or one field name cell; and for aligning the data table image according to the overall table frame of the table template, and sequentially determining the data table area image corresponding to each cell after alignment, thereby determining the data table area image corresponding to each field value cell one by one; or matching the field names in the filled field name cells of the table template with the results of character recognition on the data table area images, and, after the data table area image corresponding to each field name cell has been determined, determining the data table area image corresponding to each field value cell according to the positional relationship between the field name cells and the field value cells in the table template;
the region image recognition module is used for carrying out character recognition on the data sheet region image aiming at any data sheet region image of the data sheet image, and judging whether the obtained character recognition result meets the character string format requirement corresponding to the field value cell;
the first data filling module is used for carrying out character string matching on the character recognition result and a candidate character string library corresponding to the field value cell if the character recognition result meets the character string format requirement, filling the character string data matched from the character string library into the field value cell of the form template, and obtaining the recognition result of the data table region image;
and the result generation module is configured to obtain a data table identification result corresponding to the data table image according to the recognition result of the at least one data table region image.
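The second segmentation strategy of claim 6 (find each field name cell by matching OCR text against the template, then step to the field value cell through the template's positional relationship) can be sketched as follows. This is only an illustration under stated assumptions: the `Region` type, the grid-offset representation of the positional relationship, the `locate_value_regions` helper, and the 0.6 similarity threshold are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Region:
    """One segmented region image, reduced to its grid position and OCR text."""
    row: int
    col: int
    text: str


def similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1]; tolerates small OCR errors in field names."""
    return SequenceMatcher(None, a, b).ratio()


def locate_value_regions(regions, template_offsets, min_ratio=0.6):
    """Map each template field name to the region holding its field value.

    template_offsets is a hypothetical encoding of the template's positional
    relationship: field name -> (row offset, column offset) from the field
    name cell to its field value cell.
    """
    by_position = {(r.row, r.col): r for r in regions}
    located = {}
    for field_name, (d_row, d_col) in template_offsets.items():
        # Step 1: find the region whose OCR text best matches the field name.
        name_region = max(
            regions, key=lambda r: similarity(field_name, r.text), default=None
        )
        if name_region is None or similarity(field_name, name_region.text) < min_ratio:
            continue  # field name cell not found in this image
        # Step 2: follow the template offset to reach the field value cell.
        value_region = by_position.get(
            (name_region.row + d_row, name_region.col + d_col)
        )
        if value_region is not None:
            located[field_name] = value_region
    return located
```

Fuzzy rather than exact matching is used here because the field name text itself comes from character recognition and may contain errors; a real system would work with pixel coordinates rather than grid indices.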
7. The data table identification device of claim 6, further comprising:
the text error correction module is configured to perform text error correction on the character recognition result if the field value cell has no corresponding candidate character string library and the character recognition result contains more than one character;
and the second data filling module is configured to fill the error-corrected character recognition result into the field value cell of the table template to obtain the data table identification result corresponding to the data table image.
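Read together, claims 6 and 7 describe a per-cell decision: check the string format, then either match against the candidate character string library or, when no library exists and the result has more than one character, apply text error correction. A minimal sketch of that flow is given below. The function names, the use of a regular expression as the format requirement, the `difflib` fuzzy matching, and the whitespace-normalizing stand-in corrector are all assumptions made for illustration; the claims do not specify any of them.

```python
import re
from difflib import get_close_matches


def normalize_spacing(text: str) -> str:
    """Stand-in for text error correction (assumption): collapse whitespace."""
    return " ".join(text.split())


def fill_field_value(ocr_text, format_pattern, candidates=None,
                     corrector=normalize_spacing, cutoff=0.6):
    """Return the string to fill into a field value cell, or None.

    format_pattern: regex standing in for the cell's string format requirement.
    candidates: the cell's candidate string library, or None if it has none.
    """
    # Format check: the claims in this section only cover the passing case,
    # so a failed check simply yields None here (assumption).
    if re.fullmatch(format_pattern, ocr_text) is None:
        return None
    if candidates:
        # Claim 6: fill the cell with the string matched from the library.
        matches = get_close_matches(ocr_text, candidates, n=1, cutoff=cutoff)
        return matches[0] if matches else None
    if len(ocr_text) > 1:
        # Claim 7: no library and more than one character, so correct the text.
        return corrector(ocr_text)
    # Single character with no library: returned unchanged (assumption).
    return ocr_text
```

For example, `fill_field_value("Shenzen", r"[A-Za-z ]+", ["Shenzhen", "Shanghai", "Beijing"])` snaps the misrecognized text to the library entry `"Shenzhen"`, while a free-text cell without a library goes through the corrector instead.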
8. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the data table identification method according to any one of claims 1 to 5 when executing the computer program.
9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
CN201810963099.3A 2018-08-22 2018-08-22 Data table identification method and device and terminal equipment Active CN109344831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810963099.3A CN109344831B (en) 2018-08-22 2018-08-22 Data table identification method and device and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810963099.3A CN109344831B (en) 2018-08-22 2018-08-22 Data table identification method and device and terminal equipment

Publications (2)

Publication Number Publication Date
CN109344831A CN109344831A (en) 2019-02-15
CN109344831B CN109344831B (en) 2024-04-05

Family

ID=65291974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810963099.3A Active CN109344831B (en) 2018-08-22 2018-08-22 Data table identification method and device and terminal equipment

Country Status (1)

Country Link
CN (1) CN109344831B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993112B (en) * 2019-03-29 2021-04-09 杭州睿琪软件有限公司 Method and device for identifying table in picture
CN110083815B (en) * 2019-05-07 2023-05-23 中冶赛迪信息技术(重庆)有限公司 Synonymous variable identification method and system
CN110297833A * 2019-07-05 2019-10-01 税安科技(杭州)有限公司 Bordereau error correction method
CN110532273A * 2019-08-30 2019-12-03 北京明略软件系统有限公司 Data table processing method and device, storage medium, and electronic device
KR20210094483A (en) * 2020-01-21 2021-07-29 캐논 가부시끼가이샤 Image processing system that computerizes document, control method thereof, and storage medium
CN112016424A (en) * 2020-03-31 2020-12-01 北京来也网络科技有限公司 Image data processing method and electronic equipment combining RPA and AI
CN111966794A (en) * 2020-03-31 2020-11-20 复旦大学附属中山医院 Diagnosis and treatment data identification method, system and device
CN111898606B (en) * 2020-05-19 2023-04-07 武汉东智科技股份有限公司 Night imaging identification method for superimposing transparent time characters in video image
CN111768565B (en) * 2020-05-21 2022-03-18 程功勋 Method for identifying and post-processing invoice codes in value-added tax invoices
CN111767818B (en) * 2020-06-23 2024-04-26 北京思特奇信息技术股份有限公司 Method and device for intelligently accepting business
CN111683285B (en) * 2020-08-11 2021-01-26 腾讯科技(深圳)有限公司 File content identification method and device, computer equipment and storage medium
CN112149506A (en) * 2020-08-25 2020-12-29 北京来也网络科技有限公司 Table generation method, apparatus and storage medium in image combining RPA and AI
CN112149399B (en) * 2020-09-25 2024-06-04 北京来也网络科技有限公司 Table information extraction method, device, equipment and medium based on RPA and AI
CN112528832A (en) * 2020-12-07 2021-03-19 国网青海省电力公司电力科学研究院 Method and system for processing PDF-format relay protection fixed value list
CN112926587B (en) * 2021-02-19 2024-03-29 北京大米未来科技有限公司 Text recognition method and device, readable storage medium and electronic equipment
CN112801232A (en) * 2021-04-09 2021-05-14 苏州艾隆科技股份有限公司 Scanning identification method and system applied to prescription entry
CN112995572A (en) * 2021-04-23 2021-06-18 深圳市黑金工业制造有限公司 Remote conference system and physical display method in remote conference
CN113128504B (en) * 2021-04-25 2023-06-20 福州符号信息科技有限公司 OCR recognition result error correction method and device based on verification rule
CN114255840B (en) * 2022-02-25 2022-06-24 广州科犁医学研究有限公司 Intelligent data processing system based on clinical research data
CN114937272A (en) * 2022-05-26 2022-08-23 中国平安人寿保险股份有限公司 Recognition result detection method, device, equipment and medium based on character recognition
CN115964989B (en) * 2023-02-23 2023-09-08 天津联想协同科技有限公司 Information display method, device and storage medium of electronic form
CN116861865A (en) * 2023-06-26 2023-10-10 江苏常熟农村商业银行股份有限公司 EXCEL data processing method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105373791A (en) * 2015-11-12 2016-03-02 中国建设银行股份有限公司 Information processing method and information processing device
CN105654072A (en) * 2016-03-24 2016-06-08 哈尔滨工业大学 Automatic character extraction and recognition system and method for low-resolution medical bill image
CN107622263A (en) * 2017-02-20 2018-01-23 平安科技(深圳)有限公司 Character recognition method and device for document images
CN107862303A (en) * 2017-11-30 2018-03-30 平安科技(深圳)有限公司 Information recognition method for form images, electronic device, and readable storage medium
CN108121966A (en) * 2017-12-21 2018-06-05 欧浦智网股份有限公司 OCR-based automatic form entry method, electronic device, and storage medium
CN108345581A (en) * 2017-01-24 2018-07-31 北京搜狗科技发展有限公司 Information recognition method, device, and terminal device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105373791A (en) * 2015-11-12 2016-03-02 中国建设银行股份有限公司 Information processing method and information processing device
CN105654072A (en) * 2016-03-24 2016-06-08 哈尔滨工业大学 Automatic character extraction and recognition system and method for low-resolution medical bill image
CN108345581A (en) * 2017-01-24 2018-07-31 北京搜狗科技发展有限公司 Information recognition method, device, and terminal device
CN107622263A (en) * 2017-02-20 2018-01-23 平安科技(深圳)有限公司 Character recognition method and device for document images
CN107862303A (en) * 2017-11-30 2018-03-30 平安科技(深圳)有限公司 Information recognition method for form images, electronic device, and readable storage medium
CN108121966A (en) * 2017-12-21 2018-06-05 欧浦智网股份有限公司 OCR-based automatic form entry method, electronic device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
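Zhang Yan et al. Automatic recognition system for table-type documents and its application (表格型文档自动识别系统及其应用). Journal of System Simulation (《系统仿真学报》), 2009, 21(10), pp. 2916-2920. *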

Also Published As

Publication number Publication date
CN109344831A (en) 2019-02-15

Similar Documents

Publication Publication Date Title
CN109344831B (en) Data table identification method and device and terminal equipment
KR102609341B1 (en) Table recognition method, device, equipment, medium and computer program
US10339428B2 (en) Intelligent scoring method and system for text objective question
WO2022156066A1 (en) Character recognition method and apparatus, electronic device and storage medium
US20210295114A1 (en) Method and apparatus for extracting structured data from image, and device
CN109635305B (en) Voice translation method and device, equipment and storage medium
US11495014B2 (en) Systems and methods for automated document image orientation correction
CN111353501A (en) Book point-reading method and system based on deep learning
RU2571396C2 (en) Method and system for verification during reading
CN110728272A (en) Method for inputting certificate information based on OCR and related device
US11086913B2 (en) Named entity recognition from short unstructured text
CN111340640A (en) Insurance claim settlement material auditing method, device and equipment
US10242277B1 (en) Validating digital content rendering
CN112417899A (en) Character translation method, device, computer equipment and storage medium
CN113205047A (en) Drug name identification method and device, computer equipment and storage medium
US11106908B2 (en) Techniques to determine document recognition errors
CN111104400A (en) Data normalization method and device, electronic equipment and storage medium
CN112149680B (en) Method and device for detecting and identifying wrong words, electronic equipment and storage medium
CN111339910B (en) Text processing and text classification model training method and device
CN110929514B (en) Text collation method, text collation apparatus, computer-readable storage medium, and electronic device
CN116860747A (en) Training sample generation method and device, electronic equipment and storage medium
CN114387602B (en) Medical OCR data optimization model training method, optimization method and equipment
JP2020087112A (en) Document processing apparatus and document processing method
US20210019554A1 (en) Information processing device and information processing method
CN110827261B (en) Image quality detection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant